The University of Chicago

COLLABORATIVE RESEARCH: ADVANCED STATISTICAL METHODS FOR SINGLE CELL RNA SEQUENCING STUDIES


Overview
Abstract
Single cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics and has been used in a wide variety of applications, providing unprecedented insights into many basic biological questions that were previously difficult to address. However, analyzing scRNAseq data poses important statistical and computational challenges that require the development of new computational and statistical methods. Key challenges include: (1) lack of robust statistical methods that can control for hidden confounding effects in a range of settings; (2) lack of accurate cell subpopulation clustering methods that are tailored to scRNAseq studies; and (3) difficulty in identifying functional genetic variations with scRNAseq alone and difficulty in integrating scRNAseq with other genetic studies, including genome-wide association studies (GWASs). Our proposed methods will address these challenges and are innovative in the following aspects: (1) our method for controlling for hidden confounding effects bridges two existing classes of statistical methods for removing confounding effects and is thus expected to perform robustly across a range of scenarios; (2) our method for clustering cell subpopulations extracts clustering information from a low-dimensional representation of scRNAseq data and is thus expected to produce accurate results even when the original high-dimensional gene expression matrix is noisy; and (3) our method for identifying allele-specific/biased expression using scRNAseq data alone represents the first such attempt, as does our method for integrating scRNAseq with GWASs. All our proposed methods are tailored to scRNAseq data and will cope with the complexities and unique features of scRNAseq data, including, but not limited to, low coverage, count nature, and drop-out events. We will develop, distribute, and support user-friendly open-source software implementing our methods to benefit the genomics and statistics communities.
The statistical methods developed here will pave the way for developing similar methods for other sequencing studies, including bisulfite sequencing and ATAC-seq studies. The proposed methods are essential for understanding the heterogeneity of tissue compositions and the genetic architecture of complex traits and diseases, both of which are questions of central importance to human health.
Sponsor award ID
1R01GM126553-01

Time
Start date
2017-08-01
End date
2022-05-31