Tuesday, 14 October 2025

New paper: Machine learning and statistical inference in microbial population genomics

We have published a new review article in Genome Biology contrasting machine learning and statistics in microbial genomics. This is joint work with Sam Sheppard, Nick Arning and David Eyre.

The availability of large genome datasets has changed the microbiology research landscape. Analyzing such data is computationally demanding, and new approaches have come from different data analysis philosophies. Machine learning and statistical inference have overlapping aims and approaches to knowledge discovery.

In this review, we highlight how machine learning focuses on optimizing prediction, whereas statistical inference focuses on understanding the processes relating variables. We outline the different aims, assumptions, and resulting methodologies, with examples from microbial genomics. These approaches are essentially complementary, and we argue that exploiting both machine learning and statistics, selecting the right tool for the job, has the greatest potential for advancing pathogen research in the big data era.