Written by Rhea Rasquinha
Edited by Jacqueline Cho
AlphaFold solved over 2x more protein structures than previous models, establishing a new database for understanding unknown proteins and their functions.
Imagine a task that would take longer than the age of the known universe (nearly 14 billion years) to complete. Then, along comes something able to do that task in a few days. That’s exactly what AlphaFold, a new computer model published this past summer by researchers at DeepMind, can do.
Proteins, the “building blocks of life,” are made of various combinations and sequences of 20 different amino acids, and a protein’s function is determined by its 3D structure. However, computing a protein’s 3D structure is extremely difficult. In fact, Cyrus Levinthal, an American molecular biologist who introduced the protein folding (Levinthal’s) paradox, estimated in 1969 that it would take longer than the age of the known universe to calculate every possible structure (and evaluate its feasibility) for the estimated 10^300 possible structures of a single protein. AlphaFold can predict a protein’s structure in a few days.
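To get a feel for the scale of Levinthal’s estimate, here is a rough back-of-the-envelope sketch in Python. The number of possible structures and the age of the universe come from the text above; the checking speed (`CHECKS_PER_SECOND`) is purely an assumption chosen for illustration and is not a figure from Levinthal or DeepMind.

```python
# Back-of-the-envelope look at Levinthal's estimate.
NUM_STRUCTURES = 10**300             # possible structures for one protein (from the article)
AGE_OF_UNIVERSE_YEARS = 14 * 10**9   # "nearly 14 billion years" (from the article)
CHECKS_PER_SECOND = 10**13           # ASSUMPTION: hypothetical speed of a brute-force search
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_enumerate(num_structures, checks_per_second):
    """Whole years needed to check every structure one at a time."""
    return num_structures // (checks_per_second * SECONDS_PER_YEAR)


years = years_to_enumerate(NUM_STRUCTURES, CHECKS_PER_SECOND)
universe_lifetimes = years // AGE_OF_UNIVERSE_YEARS

# Python integers have unlimited size, so the digit count gives the order of magnitude.
print(f"Years needed: roughly 10^{len(str(years)) - 1}")
print(f"Ages of the universe: roughly 10^{len(str(universe_lifetimes)) - 1}")
```

Even if the assumed speed were a trillion times faster, the answer would still be vastly larger than the age of the universe, which is why checking every structure one by one is not a workable approach.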
A key area of research for over 50 years has been the “protein folding problem”: predicting the 3D structure of a protein based solely on its amino acid sequence. Hand in hand with this is the more complex task of identifying which proteins interact with one another and how structures change during these interactions. AlphaFold allows scientists to visualize these interactions more quickly and clearly. The program predicts protein structures to near-experimental accuracy in most cases, making it the first computational model to do so.
As of July 2021, the newest version of AlphaFold had reportedly produced 350,000 protein structure predictions, more than twice the number previously solved by experimental methods. This represents almost 44% of all human proteins, and the group’s goal is to model all of the approximately 110 million currently catalogued proteins. Because the program produces structure predictions so quickly, covering the entire human proteome is well within reach. AlphaFold represents an exciting step forward in the scientific discovery of molecules that are vital to life but invisible to the eye.
At its core, AlphaFold is a neural network-based model that uses new machine learning algorithms to model proteins. Neural networks are made of a series of algorithms that analyze data in a way loosely modeled on the human brain. The program is built to combine what is currently known about the biology and physics of protein structure using multiple sequence alignments and pairwise features. Multiple sequence alignments (MSAs) are alignments of three or more protein sequences of similar length. These are used to infer homology (common ancestry) and evolutionary relationships between the sequences. Pairwise features, also known as pairwise sequence alignments, use tools to identify similar regions on protein sequences that may indicate functional, structural and/or evolutionary relationships. AlphaFold combines data from these two sets of sequences to create accurate end-to-end predictions of protein structure.
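To make the idea of sequence alignments more concrete, here is a minimal Python sketch of two simple measurements that an alignment makes possible: how similar two aligned sequences are, and how conserved each column of a multiple sequence alignment is. This is not AlphaFold’s method. The three sequences are invented for illustration, and the function names `pairwise_identity` and `column_conservation` are hypothetical helpers rather than part of any library.

```python
from collections import Counter

# Toy multiple sequence alignment: made-up sequences, one letter per amino acid.
# '-' marks a gap; every row has the same aligned length.
msa = [
    "MKTAYIAK",
    "MKSAYLAK",
    "MRTA-IAK",
]


def pairwise_identity(a, b):
    """Fraction of aligned positions (gaps skipped) where two sequences match."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def column_conservation(alignment):
    """For each column, the share of rows that hold the most common amino acid."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        top_count = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top_count / len(column))
    return scores


print(pairwise_identity(msa[0], msa[1]))
print([round(score, 2) for score in column_conservation(msa)])
```

Columns that score close to 1 are positions that stayed the same across the sequences, which is the kind of evolutionary signal an alignment is meant to reveal.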
AlphaFold’s machine learning algorithm “trains” to predict protein structures using data from the Protein Data Bank (PDB), a collection of experimentally determined structures. What sets AlphaFold apart from other models is that its components learn directly from PDB data and then take this complex structural information into account when building new structures. Structural accuracy is very important for molecules as intricate as proteins, and AlphaFold provides the framework to generate accurate models.
Beyond the intricacies of the mechanism behind AlphaFold, its public accessibility holds promise for several exciting applications. DeepMind has made a database of new AlphaFold protein predictions freely available online, which can speed up research and improve our understanding of how unknown proteins function. AlphaFold has already facilitated the development of enzymes that break down plastics more quickly in the environment, and it can potentially be used in drug development. Other environmental uses include the development of biofuels to replace fossil fuels. The program also allows scientists to engineer more efficient proteins; one way to do so is by altering binding affinity, which is how strongly a protein binds to other molecules.
From a health standpoint, AlphaFold can help predict how viruses and other disease-causing molecules or cells are impacted by existing medications [1, 6]. Scientists are then able to generate antibodies for specific targets or engineer other protein-based drugs, which is especially important for diseases with no pre-existing treatments or where conventional treatments are no longer effective [1, 6].
This program also has the potential to make molecular replacement analyses and the interpretation of cryogenic electron microscopy maps much easier. Molecular replacement is the most common method used to create a map of electron density (essentially where electrons are located), which can then be used to model molecules. The program has also helped scientists who use X-ray crystallography and cryo-electron microscopy to better determine protein structures and fill in gaps caused by difficulties in interpreting data from these types of experiments. A group from the University of California, San Francisco, used AlphaFold2 with cryo-electron microscopy maps to model a full-length Nsp2 protein. Nsp2 is a SARS-CoV-2 protein with suggested functions in many viral processes, so understanding its structure and function can teach us more about the current pandemic.
Beyond analysis of currently existing proteins, AlphaFold can be used to model proteins from the past. Through evolutionary protein analysis, models can be built for proteins whose structures are difficult to determine, such as proteins from ancient organisms. With a greater understanding of protein history and tools to create proteins for challenges of the future, scientists are better equipped to understand, create, and apply these complex structures.
Proteins are a unique category of molecules due to their variety of functions, and understanding these functions opens up countless possibilities for their use. AlphaFold enables the accurate modelling of protein structures, which in turn allows analysis of molecule-specific functions and interactions. With applications in research, eco-friendly alternatives, healthcare, and more, AlphaFold is an important development in the growing field of using computing power to address real-world issues.
1. Dhar P. AlphaFold proves that AI can crack fundamental scientific problems. IEEE Spectrum 2020 Dec 7. Available from: https://spectrum.ieee.org/alphafold-proves-that-ai-can-crack-fundamental-scientific-problems.
2. Service RF. New public database of AI-predicted protein structures could transform biology. Science 2021 Jul 22. Available from: https://www.science.org/news/2021/07/new-public-database-ai-predicted-protein-structures-could-transform-biology.
3. Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021 Jul 15;596(7873):583-589. Available from: https://www.nature.com/articles/s41586-021-03819-2. DOI: 10.1038/s41586-021-03819-2.
4. Chen J. Neural network. Investopedia 2020 Dec 22. Available from: https://www.investopedia.com/terms/n/neuralnetwork.asp.
5. Multiple sequence alignment. EMBL-EBI. Available from: https://www.ebi.ac.uk/Tools/msa/.
6. Corvalan C. Why does predicting a protein’s 3D structure matter? The biological impact of AI system AlphaFold. PI IP Law 2020 Dec 10. Available from: https://piip.co.kr/en/blog/alphafold-protein-structure-prediction-importance-1.
7. Millán C et al. Assessing the utility of CASP14 models for molecular replacement. Proteins 2021 Aug 12; online ahead of print. Available from: https://onlinelibrary.wiley.com/doi/10.1002/prot.26214. DOI: 10.1002/prot.26214.
8. Gupta M et al. CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes. bioRxiv 2021 May 10; online ahead of print. Available from: https://www.biorxiv.org/content/10.1101/2021.05.10.443524v1. DOI: 10.1101/2021.05.10.443524.