Congratulations to UCD School of Medicine’s Dr Olivier Dennler and Dr Colm Ryan on their recently published research in NAR Genomics and Bioinformatics, titled ‘Evaluating sequence and structural similarity metrics for predicting shared paralog functions’. The paper was selected as an "Editor's Choice" by the journal.
Their research starts from the observation that gene duplication can create new genes, often resulting in pairs of similar genes within a species, called paralogs. Paralogs can diverge over time but often retain related functions. Traditionally, scientists have used sequence identity (how similar two protein sequences are) to predict whether paralogs share functions. However, new AI-based methods such as AlphaFold, which predicts protein structures, and protein language models (PLMs), which learn patterns in protein sequences, offer new ways to represent, and thus compare, proteins. But do metrics that rely on these new representations work better? In this study, the authors tested these approaches in yeast and humans, using several different definitions of shared function. They found that the new methods sometimes outperformed sequence identity, and that combining them with sequence identity further improved predictions. Adding information about related proteins within and across species improved the predictions even more. Overall, these new similarity measures provide valuable insights beyond what sequence identity alone can offer.