Protein folding solved by AI: A breakthrough for biomedicine

EUGE...QVav
10 Jul 2023

Protein folding is one of the most fundamental problems in biology. It is also one of the most challenging ones to solve. But thanks to a breakthrough by DeepMind, a leading artificial intelligence company, we may be closer than ever to cracking this puzzle and unlocking new possibilities for biomedicine.


In this article, we will explore what protein folding is and how the field got started, how DeepMind’s AI system works and what sets it apart, why it matters, what its future prospects are, and which challenges it still faces.

What protein folding is and how it started


Proteins are large molecules that perform various functions in living organisms, such as catalyzing chemical reactions, transporting substances, fighting infections, and regulating gene expression. Proteins are made of smaller units called amino acids, which are linked together in a long chain. The order of amino acids in a protein is determined by the genetic code in DNA.
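To make the link between DNA and amino acid order concrete, here is a minimal Python sketch that translates a short DNA coding sequence using the standard genetic code. The function name translate and the example sequence are illustrative choices, not taken from any particular library, and the sketch ignores details such as reading frames and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads codons starting at the first base and stops at the first stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical example sequence: ATG GCC TGG TAA
print(translate("ATGGCCTGGTAA"))  # -> MAW
```

Each group of three DNA letters (a codon) maps to one amino acid, so the order of codons fixes the order of amino acids in the chain.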

However, the function of a protein does not depend only on its sequence of amino acids, but also on its three-dimensional shape. This shape is determined by how the protein chain folds itself into a complex structure that can interact with other molecules. The process of protein folding is driven by physical forces and chemical interactions among the amino acids and the surrounding environment.

Protein folding is essential for life, but it is also extremely difficult to predict or control. Even when we know the amino acid sequence of a protein, working out how it folds into its functional shape is hard, because a chain can in principle fold in an astronomical number of ways and finding the most stable one is a daunting computational task. By one classic back-of-the-envelope estimate, a protein with 100 amino acids could adopt on the order of 10^143 configurations, far more than the roughly 10^80 atoms estimated to exist in the observable universe.
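One way to arrive at a number of this size is sketched below. It rests on a toy assumption that is not spelled out in this article: each amino acid contributes three rotatable angles, and each angle can sit in one of three stable states, independently of the others. Real proteins are far more constrained, so treat this as an illustration of scale only. The figure of 10^80 atoms is a commonly cited order-of-magnitude estimate for the observable universe.

```python
import math


def log10_conformations(residues: int,
                        angles_per_residue: int = 3,
                        states_per_angle: int = 3) -> float:
    """Base-10 logarithm of the conformation count under the toy model."""
    return residues * angles_per_residue * math.log10(states_per_angle)


# Commonly cited order of magnitude for atoms in the observable universe.
LOG10_ATOMS_IN_UNIVERSE = 80

exponent = log10_conformations(100)  # 3**300 conformations
print(f"about 10^{exponent:.0f} conformations")  # -> about 10^143 conformations
print(exponent > LOG10_ATOMS_IN_UNIVERSE)        # -> True
```

Under these assumptions the count is 3^300, which is roughly 10^143.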

The problem of protein folding has fascinated scientists for decades. In 1972, Christian Anfinsen shared the Nobel Prize in Chemistry for work showing that a protein’s amino acid sequence determines its shape. In 1994, John Moult and colleagues launched CASP (Critical Assessment of protein Structure Prediction), a biennial competition that challenges researchers to predict the structures of proteins whose shapes have been determined experimentally but not yet published. In 2008, David Baker’s group and collaborators at the University of Washington released Foldit, an online game that lets anyone take part in protein folding using intuition and creativity.

How DeepMind’s AI system works and what makes it different


DeepMind is a company that specializes in developing artificial intelligence systems that can learn from data and solve complex problems. It is best known for creating AlphaGo, the AI program that in 2016 defeated Lee Sedol, one of the world’s strongest players of Go, a board game that requires strategic thinking and intuition.

In 2018, DeepMind entered CASP for the first time with AlphaFold, an AI system that uses deep learning to predict protein structures. Deep learning is a branch of machine learning that uses artificial neural networks to learn from large amounts of data and extract patterns and features from them. AlphaFold was trained on thousands of known protein structures from public databases, as well as other sources of information such as evolutionary relationships among proteins.

AlphaFold performed well in CASP13, achieving a median score of 58.9 out of 100 on the hardest (free-modelling) targets and producing the best prediction for 25 of the 43 proteins in that category. The score, called the Global Distance Test (GDT), measures how close the predicted structure is to the actual one determined by experimental methods such as X-ray crystallography or nuclear magnetic resonance spectroscopy. A score of around 90 is informally considered competitive with experimental accuracy.
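CASP’s headline accuracy score is GDT_TS (Global Distance Test, total score). The sketch below shows the core idea in simplified form: for distance cutoffs of 1, 2, 4 and 8 angstroms, compute the percentage of residues whose predicted position lies within that cutoff of the experimental position, then average the four percentages. It assumes the two structures are already superimposed and that each residue is represented by a single C-alpha coordinate; the full procedure also searches over many superpositions, which is omitted here. The function name gdt_ts and the coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def gdt_ts(predicted: list[Coord], experimental: list[Coord],
           cutoffs: tuple[float, ...] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS for two pre-superimposed C-alpha traces (0 to 100)."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example: the predicted atoms are displaced from the
# experimental ones by 0.5, 1.5, 3.0 and 10.0 angstroms respectively.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, experimental))  # -> 56.25
```

A perfect prediction scores 100, and a single badly placed residue lowers the score without dominating it, which is one reason this kind of measure is preferred over a plain average distance.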

However, DeepMind did not stop there. In 2020, they entered CASP14 with AlphaFold2, a redesigned version of their AI system built around attention-based neural networks. Attention-based neural networks can focus on the parts of the input data that are most relevant to the output task, such as predicting the distance or angle between two amino acids in a protein chain.
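The general idea of attention can be illustrated with scaled dot-product attention, the generic building block used across attention-based networks. This is a minimal sketch in plain Python with toy vectors; it is not AlphaFold2’s actual architecture, which is considerably more elaborate, and the function names are illustrative.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> tuple[list[float], list[float]]:
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights and the weighted average of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(dim)]
    return weights, output


# Toy example: the query resembles the first key, so the first value
# receives most of the weight.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

weights, output = attention(query, keys, values)
print(weights)
print(output)
```

The weights express how strongly each input position is attended to; in a protein setting, the positions would correspond to amino acids or pairs of amino acids rather than toy vectors.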

AlphaFold2 achieved remarkable results in CASP14, reaching a median GDT score of 92.4 across all targets, roughly 100 proteins. It also helped resolve the structures of several proteins that had eluded scientists for years, such as membrane proteins and viral proteins. AlphaFold2 was able to predict protein structures in a matter of hours or days, compared to the months or years that experimental methods can take.

AlphaFold2 differs from earlier methods of protein structure prediction in several ways. First, it does not depend on having a solved structure of a closely related protein to copy from, as classic template-based (homology) modelling does. It still draws on evolutionary information, in the form of alignments of related sequences, and can use templates when they exist, but it can produce accurate structures without them. Second, it is trained end to end: rather than relying on hand-crafted features and a separate optimization step, the network learns from data to output three-dimensional coordinates directly, and it feeds its own output back in for several rounds of refinement. Third, it does not just output a structure; it also reports how confident it is in each part of the prediction. This tells users which regions can be trusted and which, such as flexible or disordered segments, should be treated with caution.

Why it matters and what its implications are


The ability to predict protein structures accurately and efficiently has enormous implications for biomedicine and beyond. Proteins are involved in almost every aspect of life, from metabolism to immunity to cognition. Understanding their shapes and functions can help us discover new drugs, design better vaccines, diagnose diseases, engineer enzymes, create biomaterials, and more.

For example, knowing the structure of a protein can help us identify its active sites, which are regions that bind to other molecules and mediate biological reactions. By designing molecules that can interact with these sites, we can modulate the activity of the protein and influence its function. This is the basis of drug discovery and development, which aims to find molecules that can treat or prevent diseases by targeting specific proteins.

Another example is that knowing the structure of a protein can help us understand how it interacts with other proteins or molecules in a complex system, such as a cell or an organism. By mapping these interactions, we can reveal the molecular mechanisms that underlie biological processes and pathways. This is the basis of systems biology and network medicine, which aim to understand how the components of a system work together and how they are affected by perturbations.

A third example is that knowing the structure of a protein can help us design new proteins with desired properties or functions. By changing the sequence of amino acids, for instance by introducing mutations, we can alter the shape and behavior of the protein and create novel variants. This is the basis of protein engineering and synthetic biology, which aim to create new biological systems or devices with specific applications.

Its future and prospects


The success of AlphaFold2 in CASP14 has been hailed as a milestone in computational biology and artificial intelligence. It has also sparked a wave of excitement and curiosity among scientists and the public alike. Many researchers have expressed their interest in using AlphaFold2 to solve their own protein structure problems or collaborate with DeepMind on new projects.

DeepMind released the AlphaFold2 source code to the scientific community in July 2021, alongside a paper in Nature describing the method. It also partnered with EMBL’s European Bioinformatics Institute (EMBL-EBI) to create a public database of protein structure predictions, the AlphaFold Protein Structure Database. Launched in 2021 and greatly expanded in 2022, the database now contains more than 200 million predicted structures, covering almost every catalogued protein known to science. It is freely accessible to anyone who wants to explore or use this information for research or education.

DeepMind has also stated that it will continue to improve AlphaFold2 and apply it to new challenges and domains. For instance, the AlphaFold-Multimer variant, released in 2021, extends the system to complexes of several protein chains, a step toward predicting the protein-protein interactions that are essential for understanding how proteins work together in complex systems. Other types of biological molecules, such as RNA and DNA, have their own folding problems and functions and are natural next targets.

The future of AlphaFold2 and protein folding is bright and promising. With this breakthrough technology, we may be able to unravel some of the mysteries of life and discover new ways to improve health and well-being.

The challenges it faces and how to address them


Despite its impressive achievements, AlphaFold2 is not perfect or complete. It still faces some challenges and limitations that need to be addressed or overcome.

One challenge is validating and verifying the accuracy and reliability of AlphaFold2’s predictions. Although AlphaFold2 has shown remarkable agreement with experimental data in CASP14, there may be cases where its predictions are incorrect or uncertain. For example, some proteins may have multiple stable shapes or undergo conformational changes under different conditions.

Some proteins may also have post-translational modifications or interactions with other molecules that affect their structures. Therefore, it is important to compare AlphaFold2’s predictions with independent experimental methods or other computational methods to confirm their validity and quality.
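A practical first check is the confidence score that AlphaFold attaches to each residue, called pLDDT, which ranges from 0 to 100. In the PDB-format files that AlphaFold writes, this score is stored in the B-factor column. The sketch below reads it for each C-alpha atom and lists residues that fall under a cutoff. The cutoff of 70, the function names, and the three-residue sample are illustrative assumptions; the sample lines are made up and do not come from a real prediction.

```python
# Hypothetical sample in PDB fixed-column format. Columns 61-66 normally hold
# the B-factor; AlphaFold output stores the per-residue pLDDT there.
SAMPLE_PDB = "\n".join([
    "ATOM      1  CA  MET A   1      11.104  13.207   2.100  1.00 91.50",
    "ATOM      2  CA  ALA A   2      12.560  14.010   3.480  1.00 63.20",
    "ATOM      3  CA  GLY A   3      14.020  15.330   4.910  1.00 42.75",
])


def ca_plddt(pdb_text: str) -> dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor of C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16, residue number in columns 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence(scores: dict[int, float], threshold: float = 70.0) -> list[int]:
    """Residue numbers whose pLDDT is below the threshold, in ascending order."""
    return [res for res, score in sorted(scores.items()) if score < threshold]


scores = ca_plddt(SAMPLE_PDB)
print(low_confidence(scores))  # -> [2, 3]
```

Regions flagged this way are the ones most worth checking against experimental data or other computational methods before drawing conclusions from the predicted structure.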

Another challenge is understanding and explaining how AlphaFold2 works and why it makes certain predictions. Although its design is informed by physical and biological knowledge, AlphaFold2 learns its behavior from data, and its internal logic is not transparent or easily interpretable. It is a complex and sophisticated system that uses deep learning and attention mechanisms to learn from data and generate outputs.

It is not easy to understand how it processes the input sequence, how it represents the features and patterns, how it weighs the evidence and probabilities, and how it produces the output structure. Therefore, it is important to develop methods and tools to analyze and visualize AlphaFold2’s behavior and decisions, and to provide explanations and feedback to the users.

A third challenge is generalizing and transferring AlphaFold2’s capabilities to other domains and tasks. Although AlphaFold2 has demonstrated its ability to predict protein structures across different families and categories, there may be domains or tasks where it does not perform well or fails. For example, some proteins may have unusual or novel sequences or structures that are not well represented in the training data or the existing databases.

Some proteins may also have functions or interactions that are not captured by the structure alone, but require additional information or context. Therefore, it is important to test and evaluate AlphaFold2’s performance on different domains and tasks, and to adapt and improve its system to handle new challenges and scenarios.

Conclusion


Predicting how proteins fold is a fundamental problem in biology, and artificial intelligence has now largely solved it for individual proteins. DeepMind’s AlphaFold2 is a breakthrough system that can predict protein structures accurately and efficiently, starting from the sequence of amino acids. It achieved remarkable results in CASP14, a biennial competition that challenges researchers to solve this problem.

It has also opened new possibilities for biomedicine and beyond, as protein structures are key to understanding and influencing biological functions and processes.

AlphaFold2 is not only a scientific achievement, but also a technological marvel. It is a testament to the power and potential of artificial intelligence to solve complex problems and advance human knowledge. It is also a demonstration of the collaboration and innovation that can happen when different disciplines and fields come together and share their expertise and resources.

AlphaFold2 is not the end of the story, but the beginning of a new chapter. It still has room for improvement and expansion, as it faces some challenges and limitations that need to be addressed or overcome. It also has many opportunities and prospects, as it can be applied to new domains and tasks that can benefit from its capabilities.

I hope you enjoyed this article and learned something new about protein folding and artificial intelligence. If you did, please like, comment, share, or subscribe. I would love to hear your feedback and suggestions for future topics. Thank you for reading!
