Poster Presentation The 48th Lorne Conference on Protein Structure and Function 2023

Structure-informed annotation of phage protein function with deep learning (#427)

Jiawei Wang 1, Trevor Lithgow 2
  1. European Bioinformatics Institute (EMBL-EBI) and the University of Cambridge, Cambridge, United Kingdom
  2. Centre to Impact AMR, Monash University, Melbourne, Victoria, Australia

Antimicrobial resistance (AMR) continues to evolve as a major threat to human health, and new strategies are required for the treatment of AMR infections. Bacteriophages (phages), viruses that kill bacterial pathogens, are being collected for use in phage therapies, with the intention of applying these bactericidal viruses directly to infection sites in bespoke phage cocktails. Using such a biological agent for infection control requires a deep understanding of the phage. Yet, despite the great unsampled diversity of phages available for this purpose, a critical issue hampering the rollout of phage therapy is the poor-quality functional annotation of the majority of phages.

To this end, we have formulated a pipeline that combines machine learning-based algorithms capturing structural information (“features”) with experimental validation to predict key types of phage proteins [1-3]. Most recently, we developed Fungtion, which uses protein language models to annotate phage proteins with over 15 core functions, from the prevalent capsid and tail proteins to the rare but critical anti-CRISPR and depolymerase proteins [4]. Because protein language models learn patterns from millions of protein sequences across all domains of life, Fungtion can capture the key characteristics that distinguish phage proteins with different functions. Fungtion has been extensively validated on benchmarking tests and case studies, and wet-lab experimental validation is ongoing. I will present how our work pinpoints proteins central to phage biology, paving the way for downstream applications from gene editing technologies and protein engineering to phage therapy.

  1. Wang, J.*, Dai, W., Li, J., Li, Q., Xie, R., Zhang, Y., Stubenrauch, C. & Lithgow, T.* AcrHub: an integrative hub for investigating, predicting and mapping anti-CRISPR proteins. Nucleic Acids Research 49.D1 (2021): D630-D638.
  2. Thung, T.Y.+, White, M.+, Dai, W.+, Wilksch, J.J., Bamert, R.S., Rocker, A., Stubenrauch, C., Williams, D., Huang, C., Schittenhelm, R., Barr, J.J., Jameson, E., McGowan, S., Zhang, Y., Wang, J.*, Dunstan, R.A.* & Lithgow, T.* The component parts of bacteriophage virions accurately defined by a machine-learning approach built on evolutionary features. mSystems 6.3 (2021): e00242-21.
  3. Wang, J., Dai, W., Li, J., Xie, R., Dunstan, R.A., Stubenrauch, C., Zhang, Y. & Lithgow, T.* PaCRISPR: A server for predicting and visualizing anti-CRISPR proteins. Nucleic Acids Research 48.W1 (2020): W348-W357.
  4. Yang, X., Li, J., Ren, J., Dai, W.+, Stubenrauch, C., Lithgow, T.* & Wang, J.* Annotating phage protein functions with protein language models. To be published.