Poster Presentation The 48th Lorne Conference on Protein Structure and Function 2023

Deciphering the relationship between protein sequence and stability (#141)

Vladimir Morozov 1, Carlos H.M. Rodrigues 1, David B. Ascher 1
  1. School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia

The function of a protein depends on its ability to fold into a specific three-dimensional structure. While the folded state is often considered the most thermodynamically stable, increasing experimental evidence suggests that most proteins are metastable, readily unfolding when exposed to a range of stresses, including chemical compounds, radiation or heat. Protein stability is often a limiting factor in repurposing these molecular machines for applications ranging from drug design in the biopharmaceutical sector to the optimisation of enzymes for biotechnology. Despite this, the molecular basis of protein stability remains poorly understood, and extensive experimental optimisation is often required.

To facilitate rational protein engineering, a number of computational approaches have been developed to predict the change in Gibbs free energy (ΔΔG) between the wild-type (ΔG_WT) and mutant (ΔG_MUT) structures. These approaches have been limited in the number and type of mutations they can handle, generally rely on knowledge of the protein 3D structure, and have provided limited biological insight into protein stability.
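
For concreteness, the quantity these predictors estimate can be written as below. This is a sketch under one common sign convention, which is an assumption: the abstract does not state which convention it uses, and whether a positive value means stabilising or destabilising depends on whether ΔG denotes the free energy of folding or of unfolding.

```latex
\Delta\Delta G = \Delta G_{\mathrm{MUT}} - \Delta G_{\mathrm{WT}}
```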

In this work we have asked a different question: what is the minimal information necessary to predict the thermodynamic stability (ΔG) of any given protein?

To address this, we have adapted BERT, a deep learning language model that was originally designed to process natural language but has since been applied to "biological" language, in which residues act as words and protein sequences act as sentences. This revealed that protein sequence alone provides significant insight into the overall stability of a protein, with predictive models achieving a Pearson correlation of up to 0.5 when estimating protein stability.
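
As a rough illustration of the "residues as words" idea and of the evaluation metric, the sketch below converts a protein sequence into one token per residue and computes a Pearson correlation between predicted and measured stabilities. It is not the authors' pipeline. The vocabulary, the special tokens, the function names `tokenize` and `pearson`, and the example values are all hypothetical, chosen only to make the idea concrete.

```python
import math

# Hypothetical vocabulary: four BERT-style special tokens followed by the
# 20 standard amino acids. The actual vocabulary used in the work is not
# described in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def tokenize(sequence):
    """Map a protein sequence to token ids, one token per residue.

    The sequence is wrapped in [CLS] ... [SEP]; residues outside the
    20 standard amino acids are mapped to [UNK].
    """
    ids = [VOCAB["[CLS]"]]
    for residue in sequence.upper():
        ids.append(VOCAB.get(residue, VOCAB["[UNK]"]))
    ids.append(VOCAB["[SEP]"])
    return ids


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up sequence and made-up stability values, for illustration only.
    print(tokenize("MKTAYIAK"))
    measured = [1.2, 3.4, 2.1, 5.0, 0.7]
    predicted = [2.0, 2.9, 2.5, 3.8, 1.9]
    print(round(pearson(measured, predicted), 3))
```

In a real setting the token ids would be fed to the language model, and the correlation would be computed on a held-out set of proteins with experimentally measured stabilities.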

We anticipate that interpreting this model in combination with the most recent advances in protein structure prediction will provide further insight into the relationship between protein sequence, structure and function, and will open new perspectives for large-scale analyses of protein stability, which are of considerable interest for protein engineering.