Antimicrobial resistance (AMR) continues to evolve as a major threat to human health, and new strategies are required to treat AMR infections. Bacteriophages (phages), viruses that kill bacterial pathogens, are being collected for use in phage therapy, with the aim of applying these bactericidal viruses directly to infection sites as bespoke phage cocktails. Using such a biological agent for infection control requires a deep understanding of the phage. Thus, despite the vast and largely unsampled phage diversity available for this purpose, a critical issue hampering the roll-out of phage therapy is the poor functional annotation of most phages.
To address this, we have developed a pipeline that combines machine learning-based algorithms, which capture structural information (“features”), with experimental validation to predict key types of phage proteins [1-3]. Most recently, we developed Fungtion, a protein language model-based tool that annotates phage proteins across more than 15 core functions, from the abundant capsid and tail proteins to the rare but critical anti-CRISPR and depolymerase proteins [4]. Because protein language models learn patterns from millions of protein sequences spanning all domains of life, Fungtion can capture the key characteristics that distinguish phage proteins with different functions. Fungtion has been extensively validated on benchmarking tests and case studies, and wet-lab experimental validation is ongoing. In this talk, I will present how our work pinpoints proteins central to phage biology, paving the way for downstream applications ranging from gene-editing technologies and protein engineering to phage therapy.