Reference-quality Drosophila genome assemblies for evolutionary analysis of previously inaccessible genomic regions
Project Summary
Several compelling questions in genome sequence analysis have been hampered by errors and gaps in the available genome assemblies. A telomere-to-telomere, platinum-quality human genome sequence would open doors to investigating many problems associated with genetic disease, and developing platinum-quality genomes for model organisms will allow early exploration of the most efficient ways to pursue these questions. To demonstrate the utility of multiple reference-quality genomes, we have formulated several questions about genome evolution that make use of the Drosophila model system. These questions include (1) identifying new genes that originated within the Drosophila-specific clade, (2) estimating the rates of new gene evolution and examining the variation and constancy of those rates among Drosophila lineages, (3) quantifying rates and patterns of divergence of piRNA clusters, which are critical to host regulation of transposable elements, (4) analyzing sequence divergence in heterochromatic repeats, which are known to play key roles in centromere and telomere function and in modulating chromatin states, and (5) analyzing Y chromosome gene gain and loss across the pan-Y chromosome. We will obtain and annotate reference-quality genome sequences of 19 Drosophila species spanning 40-60 million years of evolutionary history. To do so, we will use an efficient scheme that combines deep long-read (PacBio) assembly with targeted sequencing of bacterial artificial chromosomes. The result will be a resource that paves the way for the Drosophila community to tackle pressing hypothesis-driven questions in areas including embryonic development, neurobiology, and aging, all within a phylogenomic perspective.