Friday 20 March 2020

New paper: GenomegaMap for dN/dS in over 10,000 genomes

Published this week in Molecular Biology and Evolution is a new paper, joint with the CRyPTIC Consortium: "GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes".

The dN/dS ratio is a popular statistic in evolutionary genetics that quantifies the relative rates of protein-altering and non-protein-altering mutations. The ratio is adjusted so that under neutral evolution (i.e. when all variants confer the same survival and reproductive advantage) it equals 1. Typically, dN/dS is observed to be less than 1, meaning that new protein-altering mutations tend to be disfavoured, implying they are harmful to survival or reproduction. Occasionally, dN/dS is observed to be greater than 1, meaning that new protein-altering mutations are favoured, implying they provide some survival or reproductive advantage. The aim of estimating dN/dS is usually to identify mutations that provide an advantage.
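To make the idea of "adjusting" the ratio concrete, here is a minimal sketch of the simplest counting version of dN/dS: the proportion of protein-altering (nonsynonymous) differences per nonsynonymous site, divided by the proportion of non-protein-altering (synonymous) differences per synonymous site. Dividing by the number of sites of each kind is what makes the ratio come out at 1 when both kinds of mutation behave alike. This is only the textbook counting idea, not the genomegaMap method described in the paper; the function name `dn_ds` and the site and difference counts in the example are hypothetical, chosen purely for illustration.

```python
def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Naive dN/dS from counts (illustrative sketch, not genomegaMap).

    nonsyn_diffs / syn_diffs: observed protein-altering / silent differences.
    nonsyn_sites / syn_sites: number of sites at which each kind of
    mutation could occur. No correction for multiple hits is applied.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_diffs == 0:
        raise ValueError("no synonymous differences: ratio is undefined")
    p_n = nonsyn_diffs / nonsyn_sites  # protein-altering differences per site
    p_s = syn_diffs / syn_sites        # silent differences per site
    return p_n / p_s


if __name__ == "__main__":
    # Hypothetical gene with 750 nonsynonymous and 250 synonymous sites.
    print(dn_ds(6, 750, 10, 250))   # below 1: protein changes disfavoured
    print(dn_ds(30, 750, 10, 250))  # equal per-site rates: ratio of 1
    print(dn_ds(30, 750, 2, 250))   # above 1: protein changes favoured
```

With these made-up counts the three calls return roughly 0.2, 1 and 5, matching the three situations described above. Real estimators, including the one in the paper, also have to account for multiple substitutions at the same site and for uncertainty in the counts, which this sketch ignores.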

Theoreticians are often critical of dN/dS because it is more of a descriptive statistic than a process-driven model of evolution. This criticism overlooks the problem that currently available models make simplifying assumptions, such as minimal interference between adjacent mutations within genes. These assumptions are not obviously appropriate in many species that exchange genetic material infrequently, including many infectious micro-organisms.

There are many methods for measuring dN/dS. The method in this new paper overcomes two common problems with them:
  • It is fast no matter how many genomes are analysed together.
  • It is robust whether there is frequent genetic exchange (which causes phylogenetic methods to report spurious signals of advantageous mutation) or infrequent genetic exchange.
The paper includes detailed simulations that establish the validity of the approach, and it goes on to demonstrate how genomegaMap can detect advantageous mutations in 10,209 genomes of Mycobacterium tuberculosis, the bacterium that causes tuberculosis. The method reproduces known signals of advantageous mutations that make the bacteria resistant to antibiotics, and it discovers a new signal of advantageous mutations in a cold-shock protein called deaD or csdA.

Software that implements genomegaMap is available on Docker Hub, and the source code and documentation are available on GitHub.

As the number of genome sequences rises steadily, analysing the data becomes increasingly challenging even with modern computers. It is hoped that this new method provides a useful way to exploit the opportunities in such large datasets to gain new insights into evolution.

Monday 16 March 2020

Postdoc Available in Statistical Genetics

The closing date for applications for this post is noon on Wednesday 15th April 2020.

We are seeking an exceptional researcher with a track record in methods development for Statistical Genomics and an interest in Infectious Disease to join our group at the Big Data Institute. Our research focuses on Bacterial Genomics, Genome-Wide Association Studies and Population Genetics. The aim of the post is to conduct innovative research within the group's range of interests and to make use of the opportunities afforded by our outstanding collaborators. We welcome candidates who wish to use the opportunity as a stepping stone to independent funding.

The Oxford University Big Data Institute (BDI) is an interdisciplinary research centre aiming to develop, evaluate and deploy efficient methods for acquiring and analysing biomedical data at scale and for exploiting the opportunities arising from such studies. The Nuffield Department of Population Health, a partner in the BDI, contains world-renowned population health research groups and is an excellent environment for multi-disciplinary teaching and research.

The Postdoctoral Researcher in Statistical Genomics will join our team which has expertise in microbiology, genomics, evolution, population genetics and statistical inference. Responsibilities include planning a research project and milestones with help and guidance from the group, preparing manuscripts for publication, keeping records of results and methods and tracking milestones, and disseminating results.

To be considered, you need to hold, or be close to completing, a PhD/DPhil involving statistical methods development. You also need experience of large-scale statistical data analysis, evidence of originating and executing your own academic research ideas, excellent interpersonal skills, and the ability to work closely with others in a team.

For informal enquiries, please contact me.

Further details, including how to apply, are here: