Changes in protein sequence can have dramatic effects on how proteins fold, and on their stability and dynamics. Over the last 20 years, pioneering methods have been developed to estimate the effects of missense mutations on protein thermodynamic stability, leveraging the growing availability of protein 3D structures. These methods, however, have been developed and validated using experimentally derived structures and biophysical measurements. Many protein structures remain to be experimentally elucidated and, while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have, therefore, systematically investigated the performance and robustness of ten widely used structural methods when presented with conventional homology models and the recent AlphaFold2 predicted structures. The performance of sequence-based tools was included as a baseline for comparison.
Homology models were built using templates at a range of sequence identity levels (from 15% to 95%) to control input uncertainty for the structure-based tools, while AlphaFold2 models were generated mainly based on CASP14. Mutations were further divided based on structural features, such as solvent exposure and secondary structure type, and sequence properties, such as mutation type and volume change. We found that the performance of the machine learning-based predictors does indeed deteriorate on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent-exposed residues and for stabilising mutations. The predictive performance of the structural approaches on AlphaFold2 models is highly consistent with the results obtained using experimental structures. As structure prediction tools improve, the reliability of these predictors is expected to follow, but we strongly suggest that these factors be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.
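As an illustration of the kind of stratified comparison described above, the following minimal sketch groups mutations by the sequence identity of the template used to build the model and computes a correlation between experimental and predicted stability changes within each group. It is a sketch under stated assumptions, not the evaluation pipeline used in this work: the function names, the record layout, and the choice of the Pearson correlation as the metric are all hypothetical, and only the 40% threshold is taken from the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / sqrt(var_x * var_y)


def performance_by_identity(records, threshold=40.0):
    """Split records at a template sequence identity threshold (in %)
    and return the Pearson correlation for each group.

    Each record is a hypothetical tuple:
    (template_identity_percent, experimental_ddg, predicted_ddg).
    """
    groups = {"below": ([], []), "at_or_above": ([], [])}
    for identity, experimental, predicted in records:
        key = "below" if identity < threshold else "at_or_above"
        groups[key][0].append(experimental)
        groups[key][1].append(predicted)
    # Groups with fewer than two mutations cannot yield a correlation.
    return {
        key: pearson(exp, pred) if len(exp) >= 2 else float("nan")
        for key, (exp, pred) in groups.items()
    }
```

The same grouping could be applied to any of the other stratifications mentioned above (solvent exposure, secondary structure type, mutation type) by replacing the identity test with the relevant label.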