Abstract

Proteins are macromolecules that mediate a significant fraction of the cellular processes that underlie life. An important task in bioengineering is designing proteins with specific 3D structures and chemical properties that enable targeted functions. To this end, we introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches. The model is learned entirely from experimental data and conditions its generation on a compact specification of protein topology to produce a full-atom backbone configuration as well as sequence and side-chain predictions. We demonstrate the quality of the model via qualitative and quantitative analysis of its samples. Videos of sampling trajectories are available at https://nanand2.github.io/proteins.

Keywords

Sequence (biology), Computer science, Probabilistic logic, Generative model, Generative grammar, Task (project management), Biological system, Topology (electrical circuits), Computational biology, Algorithm, Artificial intelligence, Biology, Mathematics, Engineering, Genetics

Publication Info

Year: 2022
Type: preprint
Citations: 99
Access: Closed

Citation Metrics

99 (OpenAlex)

Cite This

Namrata Anand, Tudor Achim (2022). Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2205.15019

Identifiers

DOI: 10.48550/arxiv.2205.15019