Express Pharma

University of Glasgow scientists harness AI to decode the language of proteins

The study, ‘PLM-interact: extending protein language models to predict protein-protein interactions’, is published in Nature Communications.


Scientists at the University of Glasgow have developed a pioneering artificial intelligence model capable of translating the complex ‘language’ of proteins.

In a new study, published in Nature Communications, the cross-disciplinary team developed a large language model (LLM), called PLM-interact, to better understand protein interactions and even predict which mutations will affect how these crucial molecules ‘talk’ to one another.

Early tests of PLM-interact, a protein language model (PLM), show that it outperforms competing models in understanding and predicting how proteins interact with one another. The team’s research demonstrates PLM-interact could help us better understand key areas of medical science, including the development of diseases such as cancer and virus infection.

The research team – led by Dr Ke Yuan from the University’s School of Cancer Sciences and the Cancer Research UK Scotland Institute, Prof Craig Macdonald from the School of Computing Science and Prof David L Robertson from the MRC-University of Glasgow Centre for Virus Research (CVR) – is developing these types of AI model to add much-needed detail on how diseases arise.

PLM-interact could also provide new insight into how viruses interact with their host species. In the future, it is possible this approach could even be used to predict a virus’s pandemic potential and identify new drug targets.

Proteins are the main structural components of all cells and viruses and play a key role in biological processes by interacting with other proteins. Disruption of these protein-to-protein interactions (PPIs) is often linked with disease formation, including cancers and genetic diseases. Additionally, protein-to-protein interactions play an important role in viral infections, with viruses relying on the proteins in our cells to help them replicate and continue the infection process.

A better understanding of protein interactions would offer scientists vital new insights into disease and infections, potentially paving the way for the development of new therapies or vaccines. However, identifying protein-to-protein interactions experimentally is currently both costly and time-consuming, and new ways to speed up the process are required.

PLM-interact was first ‘trained’ on more than 421,000 human protein pairs and their interactions, with data-processing support from the UK’s DiRAC High Performance Super Computer facility. Specifically, Tursa, originally developed to help theoretical physicists simulate aspects of the workings of the universe, provided the team with access to a highly optimised GPU cluster that helped them more quickly build and fine-tune the model, which involves more than 650 million individual parameters.

Dr Ke Yuan, one of the paper’s corresponding authors, said: “It’s great to think that DiRAC, which was developed to help scientists understand the laws of nature from the smallest subatomic particles to the largest scales in the Universe, has helped us build this new model to explore the inner space of protein interactions instead.

“Colleagues from our School of Computing Science provided support with the language modelling aspects of creating PLM-interact, but in order to train the model itself, we needed access to vast amounts of computing power. Working with DiRAC to tap into their GPU computing resources, as well as their training, technical support and software engineering resources, helped us do that much more quickly and effectively.”

PLM-interact can predict protein interactions with between 16% and 28% greater accuracy than other state-of-the-art AI protein models. In addition, PLM-interact was able to accurately predict five key protein interactions that govern essential biological functions, including RNA polymerisation and protein transportation. Notably, other protein AI tools, including the Google DeepMind-powered AlphaFold3, were only able to predict one of the five protein-to-protein interactions.

Researchers were also able to show that PLM-interact could accurately identify the impact of mutations on protein interactions, both for mutations that cause negative consequences (including genetic diseases) and for mutations that inhibit essential protein-protein interactions, causing diseases such as cancers.

The research team also trained PLM-interact with a further 22,383 protein-to-protein interactions, this time from 5,882 human and 996 virus proteins. Once again, PLM-interact outperformed existing protein models in its ability to predict how human and virus proteins interact, demonstrating the model’s power as an accurate tool for predicting virus-host interactions.

Prof David L Robertson, head of CVR Bioinformatics at the University of Glasgow, is the paper’s other corresponding author. He said: “The urgency to understand virus-host interactions during the COVID-19 pandemic is a good illustration of why a tool like PLM-interact could be invaluable in the future. Being able to quickly and accurately gain insight into how viruses interact with our proteins could help us better understand virus emergence and disease risks, which in turn can help speed up the development of new treatments and therapies.

“Our results are a very promising contribution to developing a system capable of predicting protein interactions at an unprecedented scale and level of accuracy. This is Dan Liu’s, the paper’s first author, PhD work and is a remarkably strong platform to build on for the future. We’re already looking at expanding our team to help us explore the full potential of PLM-interact for a wide range of applications in the future.”

The study, ‘PLM-interact: extending protein language models to predict protein-protein interactions’, is published in Nature Communications. The work was funded by the European Union’s Horizon 2020 research and innovation program and the Medical Research Council, with support from Cancer Research UK, Prostate Cancer UK and the Biotechnology and Biological Sciences Research Council.

 
