Research Spotlight: AutoPeptideML: Enhancing the reliability of peptide bioactivity prediction models

 

New research from the Shields group is making sophisticated machine learning tools more accessible to a wider audience, particularly those researchers interested in discovering new bioactive peptides with potential therapeutic applications.

Peptides are simple, short chains of amino acids that can often serve as signalling molecules (e.g., hormones like insulin), antimicrobial agents, or building blocks of proteins.

The way peptides interact with living systems to influence biological functions is referred to as peptide bioactivity. Bioactive peptides have potential uses in healthcare, agriculture, and research.

Raul Fernández-Díaz is an IBM Research employee who is currently pursuing his PhD at UCD Conway Institute working within the group led by Professor Denis Shields. 

Raul has developed AutoPeptideML, a novel system for predicting peptide bioactivity that aims to democratise the use of AI and machine learning. The system provides an accessible platform for creating a predictive model that can forecast the bioactivity of peptides whose functions are unknown.


Professor Denis Shields (left) and Raul Fernández-Díaz, PhD student.

Using a user-friendly web server, researchers or practitioners can input their known peptides and automatically generate a machine learning model, which can then be used to predict the biological activities of other, uncharacterised peptides.

Fernández-Díaz's approach integrates state-of-the-art protein language models. These are advanced machine learning models specifically designed to handle and understand protein sequences, much as natural language processing models work with human language.

These protein language models allow the system to capture the complex relationships and patterns within peptide sequences that relate to biological function.

“One of the key features of AutoPeptideML ensures that the model can distinguish between active and inactive peptides more accurately. This is done through proposing robust strategies for defining ‘negative controls’ - peptides with no bioactivity”, explains Raul Fernández-Díaz.

His system includes a novel approach to testing the model’s performance: it intentionally builds a test dataset that is only distantly related to the training set.

“This simulates real-world scenarios where researchers are often working with peptides that are structurally or functionally different from known bioactive peptides”, Raul added.

This strategy helps to provide more reliable and realistic estimates of the model’s predictive power, ensuring that the tool remains effective even when applied to peptides with limited similarity to those seen during training.
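To make the idea of a distantly related test set concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from AutoPeptideML: the function names (`similarity`, `cluster_by_similarity`, `distant_split`), the thresholds, and the example peptides are all hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for the sequence-alignment tools a real pipeline would more likely use. The sketch groups similar peptides into clusters and then assigns whole clusters to either the training or the test set, so that no test peptide has a close relative in training.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_similarity(peptides, threshold=0.5):
    """Single-linkage clustering: any pair at or above the threshold shares a cluster."""
    parent = list(range(len(peptides)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(peptides)):
        for j in range(i + 1, len(peptides)):
            if similarity(peptides[i], peptides[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, peptide in enumerate(peptides):
        clusters.setdefault(find(i), []).append(peptide)
    return list(clusters.values())


def distant_split(peptides, test_fraction=0.2, threshold=0.5):
    """Assign whole clusters to train or test so the two sets share no close relatives."""
    clusters = sorted(cluster_by_similarity(peptides, threshold), key=len)
    target = test_fraction * len(peptides)
    train, test = [], []
    for cluster in clusters:
        # Smallest clusters go to the test set until it reaches the target size.
        if len(test) < target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


# Hypothetical example sequences, made up for illustration.
peptides = [
    "GLFDIVKKVV", "GLFDIVKKVL", "GLFDIIKKVV",
    "KWKLFKKIGA", "KWKLFKKIGI",
    "RRRRRRRR",
    "ACDEFGHIKL", "ACDEFGHIKM",
]
train, test = distant_split(peptides, test_fraction=0.25, threshold=0.7)
print("train:", train)
print("test:", test)
```

Because clusters are never divided between the two sets, every test peptide scores below the chosen threshold against every training peptide. A purely random split gives no such guarantee, which is why it can overestimate how well a model will perform on genuinely new peptides.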

Professor Denis Shields said, “Raul’s work presents a significant advancement in peptide bioactivity prediction using accessible, sophisticated machine learning tools. The platform will enhance the accuracy and reliability of predictions and expand the possibilities for discovering new potential therapeutic biologics”.

Journal Citation
AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors. 
Raúl Fernández-Díaz, Rodrigo Cossio-Pérez, Clement Agoni, Hoang Thanh Lam, Vanessa Lopez, Denis C Shields. Bioinformatics, Volume 40, Issue 9, September 2024. doi: 10.1093/bioinformatics/btae555