In a study published in Nature Machine Intelligence, researchers from the University of Sheffield, in collaboration with AstraZeneca and the University of Southampton, have developed a machine learning framework that improves accuracy in inverse protein folding compared with existing methods.
Inverse protein folding involves identifying amino acid sequences that fold into a desired 3D protein structure. The process is essential in protein engineering, particularly in drug development, where proteins must bind to specific biological targets. Because protein folding is so complex, predicting how the amino acids in a sequence interact to form stable, functional structures remains a challenge.
Machine learning models trained on known protein sequences and structures have become critical tools for addressing this challenge. The new model, called MapDiff, was tested in simulated (in silico) environments and outperformed current state-of-the-art artificial intelligence approaches in prediction performance.
Haiping Lu, Professor of Machine Learning at the University of Sheffield and corresponding author of the study, said, “This work represents a significant step forward in using AI to design proteins with desired structures. By learning how to generate amino acid sequences that are likely to fold into specific 3D structures, our method opens new possibilities for designing new therapeutic proteins, which can be used in various therapeutic applications. It’s exciting to see AI helping us tackle such a fundamental challenge in biology.”
Peizhen Bai, Senior Machine Learning Scientist at AstraZeneca, developed MapDiff during his PhD at the University of Sheffield’s School of Computer Science. He said, “During my PhD, I was motivated by the potential of AI to accelerate biological discovery. I’m proud that our method, MapDiff, helps design protein sequences that are more likely to fold into desired 3D structures — a key step towards advancing next-generation therapeutics.”
The study builds on prior work between the University of Sheffield and AstraZeneca, including the development of DrugBAN, an AI model that predicts drug-target binding. That research also appeared in Nature Machine Intelligence and became one of its most cited papers in 2023.
The latest paper, titled "Mask-prior-guided denoising diffusion improves inverse protein folding", is now available in Nature Machine Intelligence.