Abstract
Proteins are macromolecules that mediate a significant fraction of the cellular processes that underlie life. An important task in bioengineering is designing proteins with specific 3D structures and chemical properties that enable targeted functions. To this end, we introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches. The model is learned entirely from experimental data and conditions its generation on a compact specification of protein topology to produce a full-atom backbone configuration as well as sequence and side-chain predictions. We demonstrate the quality of the model via qualitative and quantitative analysis of its samples. Videos of sampling trajectories are available at https://nanand2.github.io/proteins.
Publication Info
- Year: 2022
- Type: preprint
- Citations: 99
- Access: Closed
Identifiers
- DOI: 10.48550/arxiv.2205.15019