Decoding the regulatory architecture of the human genome across cell types, individuals and disease

Overview

abstract

PROJECT DESCRIPTION While accurate annotations of protein-coding regions in the human genome have been available for many years, annotation and interpretation of regulatory sequences have lagged far behind. This is because, in contrast to protein-coding sequences, the "rules" that link genome sequence to regulatory function are fuzzy, complex, and highly context-specific. Our limited understanding of regulatory regions presents a fundamental challenge for the identification and interpretation of disease variation, especially in the context of personal genome interpretation. ENCODE and other groups have started to close this gap, both experimentally, through high-resolution maps of regulatory sites in a variety of cell types, and through modeling of the cell-type-specific mappings from genome sequence to regulatory function. In this project we will develop a suite of new tools that use these diverse new data sets to tackle these problems. We will implement and apply powerful new machine learning methods, based on deep learning, to interpret the context-specific encoding of regulatory information in the genome and to identify genetic variants that alter the encoded information. We will build models using data from a variety of sources, including ENCODE, Roadmap Epigenomics, GTEx, regulatory variation in the HapMap cell lines, and disease cohorts. Validation experiments will be performed using a new high-complexity CRISPR/Cas9 system developed by our team. We will develop software tools and analytical results that can be widely used for genome interpretation, especially in the analysis of personal genomes.
By the end of this study we expect to have: (1) developed powerful new computational models for predicting regulatory function in a wide variety of cell types at unprecedented resolution; (2) implemented novel validation screens in native chromatin at extremely high throughput; and (3) developed new tools for interpreting common and rare regulatory variation, with particular focus on identifying high-impact regulatory mutations in personal genomes. We are committed to the timely release of software, data and analyses, and to working with the ENCODE Consortium to increase the impact of data and analyses from all study sites.

sponsor award id

U01HG009431

Biography

contributor

Principal Investigator

Time

start date

2017-02-01

end date

2022-01-31