Data-driven, evolution-based design of proteins

Overview

abstract

Project Summary: Evolution builds proteins with a remarkable combination of characteristics. They can fold spontaneously and carry out difficult chemical reactions, but are also robust to perturbation and able to adapt as conditions of fitness fluctuate. In recent years, sequence-based statistical inference has provided specific models for how all these properties are encoded in the amino acid sequences of proteins. We propose a data-driven, evolution-based design (EBD) process that, with the developments outlined here, can address several basic problems in protein mechanism and evolution. We will unify and optimize approaches for EBD and then apply them (1) to quantify the functional sequence space of a protein family, (2) to parse the constraints on paralogs and orthologs of a protein family, and (3) to understand how substrate specificity in an enzyme can adapt through a process of stepwise variation and selection. The work is extensively supported by preliminary data and is enabled by new technologies for statistical inference, gene synthesis, and high-throughput functional assays, both in vitro and in vivo. The outcomes will be a unified computational framework for sequence-based statistical inference and a serious test of the power of emerging evolution-based protein design approaches to understand and engineer protein molecules.

sponsor award id

R01GM141697

Biography

contributor

Principal Investigator

Time

start date

2021-08-01

end date

2025-05-31