## Tuesday, 7 December 2021

### Two new positions: Senior Statistical Geneticist and Bioinformatician

Two new positions are available in my Infectious Disease Genomics group at the Big Data Institute, University of Oxford.

A Senior Postdoctoral Statistical Geneticist to jointly lead the implementation, design and application of new statistical tools for genome-wide association studies, lead the biological interpretation of key findings, develop methodologies and supervise junior group members. This post would suit a candidate with a PhD and relevant post-doctoral experience including direct experience in statistical genetics. Candidates without post-doctoral experience may be considered for a less senior appointment.

A Bioinformatician to provide expertise for computationally intensive analyses including genome-wide association studies and RNAseq studies of differential gene expression, as well as contributing to informatics projects as part of a wider collaboration with national biomedical cohorts. This post would suit a candidate with either a post-graduate degree related to Bioinformatics, Statistics or Computing, or equivalent experience in industry.

The application deadline for both posts is Noon GMT on Friday 7th January 2022.

### New paper: Machine learning to predict the source of campylobacteriosis using whole genome data

This study, published in October in PLOS Genetics, brings together machine learning, large bacterial isolate collections and whole genome sequencing to address the general problem of how to trace the source of human infections.

Specifically, we investigated campylobacteriosis, a common infection of animal origin causing ~1.5 million cases of gastroenteritis and 10,000 hospitalizations every year in the United States alone. We show that our combined machine learning/genomics analyses:

• Improve the accuracy with which infections can be traced back to farm reservoirs.
• Identify evolutionary shifts in bacterial affinity for livestock host species.
• Detect changes in human infection capability within related strains.

These results will improve understanding not only of Campylobacter but of bacterial pathogens more generally, as these technologies can readily be applied to other important species.

This paper builds on previous work published by the group, including our well-cited paper "Tracing the source of campylobacteriosis" (Wilson et al 2008, PLOS Genetics 4:e1000203). The use of these methods for tracing infection has influenced public health policy and contributed to reducing disease burden.

This work demonstrates the potential for modern genomics and artificial intelligence approaches to address common and serious problems that affect our everyday lives. The awareness of the importance of infection to society has rarely been higher than in 2021, and while the current pandemic imposes an acute global problem, other infections continue to present long-term threats to health and productivity.

This work was led by Nicolas Arning, in collaboration with David Clifton and Sam Sheppard.

### New paper: Antimicrobial resistance determinants are associated with Staphylococcus aureus bacteraemia and adaptation to the healthcare environment

Staphylococcus aureus is a leading cause of infectious disease deaths in all countries, with bloodstream infection leading to sepsis a major concern. This new study, published in November in Microbial Genomics, reports genes and genetic variants in Staph. aureus associated with severe disease vs asymptomatic carriage, and with healthcare vs community carriage.

Our genome-wide association study of 2000 bacterial genomes showed that antibiotic resistance in Staph. aureus is associated with severe disease and the hospital environment:

• A mutation conferring trimethoprim resistance (dfrB F99Y) and the presence of a gene conferring methicillin resistance (mecA) were both associated with bloodstream infection vs asymptomatic nose carriage.
• Separately, we demonstrated that a mutation conferring fluoroquinolone resistance (gyrA L84S) and variation in a gene involved in resistance to multiple antibiotics (prsA) were preferentially associated with healthcare-associated carriage vs community-acquired carriage.

The implication – that antibiotic resistance genes may provide survival advantages which mechanistically contribute to the development of disease – is important in the face of the continued global rise of antibiotic resistance.

We were also able to shed light on a controversy as to whether different strains of Staph. aureus differ in their propensity to cause severe disease. Interest in this question dates back decades in the literature, and contradictory studies, often based on modest sample sizes, have reached different conclusions. Our comparatively large study, using a whole-genome method that we previously published in Nature Microbiology, found that all strains of Staph. aureus are equally likely to cause severe disease vs asymptomatic carriage.

### New paper: Genome-wide association studies reveal the role of polymorphisms affecting factor H binding protein expression in host invasion by Neisseria meningitidis

In this paper, published in October in PLOS Pathogens, we discovered a novel genetic association between life-threatening invasive meningococcal disease (IMD) and bacterial genetic variation in factor H binding protein (fHbp) through two bacterial genome-wide association studies (GWAS), which we validated experimentally. This was a collaboration with the groups of Chris Tang and Martin Maiden, with the work in my group led by Sarah Earle.

fHbp is an important component of meningococcal vaccines that directly interacts with human complement factor H (CFH). Intriguingly, our discovery that bacterial genetic variation in fHbp associates with increased virulence mirrors an earlier discovery that human genetic variation in CFH associates with increased susceptibility to IMD (Nature Genetics 42: 772).

Our experiments showed that the fHbp risk allele increased expression. Interestingly, increased susceptibility to IMD has been previously associated with elevated CFH expression. Therefore over-expression of either fHbp by the bacterium or CFH by the host appears to increase the risk of IMD. Since complement evasion is necessary for pathogenesis, these insights offer new leads for improving treatment.

Key results from the paper:

• A GWAS for IMD in 261 meningococci from the Czech Republic highlighted a highly polygenic architecture of meningococcal virulence (see Figure), including capsule biosynthesis genes, the meningococcal disease association island and the new signal near the fba and fHbp genes.
• A replication GWAS for IMD in 1295 meningococcal genomes belonging to strain ST41/44 downloaded from pubMLST.org validated the novel signal of association near fba and fHbp.
• SHAPE reactivity analyses revealed that IMD-associated variation in the regulatory region of fHbp disrupted the ability of the cell machinery to commence gene expression.
• Flow cytometry assays of newly constructed genetically engineered strains, at different temperatures and in the presence and absence of human serum, attributed changes in gene expression to a non-synonymous candidate mutation in the fHbp gene.

In this study, our GWAS relied exclusively on publicly available genome sequences and metadata, highlighting the untapped potential of large-scale open source databases like pubMLST.org, and the value of big data for improving our understanding of disease.

## Tuesday, 13 April 2021

### New positions: Data Scientist in Public Health Epidemiology and Postdoc in Statistical Methods

I am looking to fill two positions at the Big Data Institute, Nuffield Department of Population Health, University of Oxford: a Data Scientist in Public Health Epidemiology and a Postdoctoral Researcher in Statistical Methods.

The Big Data Institute (BDI) is an interdisciplinary research centre that develops, evaluates and deploys efficient methods for acquiring and analysing biomedical data at scale and for exploiting the opportunities arising from such studies. The Nuffield Department of Population Health (NDPH), a key partner in the BDI, contains world-renowned population health research groups and is an excellent environment for multi-disciplinary teaching and research.

The role of the Data Scientist in Public Health Epidemiology is to help pilot a project developing systems for continuous record linkage between a large Public Health England (PHE) data source and other population health records, with the aim of facilitating research into infectious diseases.

The post holder will manage and develop record linkage algorithms comparing records with relational databases containing health records via appropriate anonymization protocols, and manage and develop systems for identifying incoming records of interest, for near-real time updating of SQL databases, and for issuing email and SMS alerts in response to these events. The responsibilities will also include contributing to large-scale statistical studies using public health records to investigate disease epidemiology, and analysing and interpreting results, reviewing and refining working hypotheses, writing reports and presenting findings to colleagues.

To be considered, applicants will hold a degree in Computer Science, Data Science, Statistics, or another relevant subject with a strong quantitative component, or have equivalent experience. They will also need an understanding of relational database construction and SQL queries, experience coding in at least one common programming language (e.g. C#, Java, Python) and good interpersonal skills with the ability to work closely with others as part of a team, while taking personal responsibility for assigned tasks.

The role of the Postdoctoral Researcher in Statistical Methods is to develop statistical methods based on the harmonic mean p-value (HMP) approach. The HMP bridges classical and Bayesian approaches to model-averaged hypothesis testing, with applications to very large-scale data analysis problems in biomedical science.

The post holder will join a team with expertise in statistical inference, population genetics, genomics, evolution, epidemiology and infectious disease. The responsibilities will include developing statistical methods based on the HMP, undertaking research under the direction of the principal investigator, helping with supervision within the project as required, driving forward manuscripts for publication in collaboration with group members and disseminating results through other means such as academic conferences.

To be considered, applicants will hold, or be close to completion of, a PhD/DPhil involving statistical methods development, and have a track record of publication-quality work in statistical theory or methods development. The ability to work independently in pursuing the goals of an agreed research plan, excellent interpersonal skills and the ability to work closely with others as a team are also essential.

The closing date for both positions is noon on 5th May 2021. Only applications received through the online system will be considered.

### Presentation: Genome-wide association studies of COVID-19

This is an updated version of the talk, given at the Nuffield Department of Population Health's annual symposium 2021.

## Monday, 7 September 2020

### Postdoc position available in Statistical Genomics

I am seeking someone with a track record in methods development for Statistical Genomics and an interest in Infectious Disease to join the group. The aim of the post is to conduct innovative research within the group's range of interests and to make use of the opportunities afforded by our outstanding collaborators. I would welcome candidates who wish to use the opportunity as a stepping stone to independent funding.

The postdoc will join a team with expertise in microbiology, genomics, evolution, population genetics and statistical inference. Responsibilities will include planning a research project and milestones with help and guidance from the group, preparing manuscripts for publication, keeping records of results and methods and tracking milestones, and disseminating results, including through academic conferences.

We will consider applicants who hold, or are close to completion of, a PhD/DPhil involving statistical methods development, and who have experience of large-scale statistical data analysis, evidence of originating and executing independent academic research ideas, excellent interpersonal skills and the ability to work closely with others in a team.

The position is advertised to 31 December 2021. The application deadline is noon on Thursday 1st October 2020. Visit the University recruitment page to apply.

## Friday, 21 August 2020

### Presentation: Genome-wide Association Studies of COVID-19

This is an online recording of my talk about Genome-wide Association Studies of COVID-19 at the UK Biobank 2020 meeting on 23 June 2020. The full conference is also online.

### The group's research response to COVID-19

This is an update on the group's research response to the COVID-19 pandemic. As an infectious disease group we have been keen to contribute to the international research effort where we could be useful, while recognising the need to continue our research on other important infections where possible.

• Bugbank. Thanks to a pre-existing collaboration between our group, Public Health England and UK Biobank, we were in a position to help rapidly facilitate COVID-19 research via SARS-CoV-2 PCR-based swab test results. Beginning mid-March, we worked to provide regular (usually weekly) updates of test results, which were made available to all UK Biobank researchers beginning April 17th. This is one of several resources on COVID-19 linked to UK Biobank. Beginning in May we provided feeds to other cohorts. We provide updates on this work through the project website www.bugbank.uk. We have published a paper describing the dynamic data linkage in Microbial Genomics (press release). Key collaborators in this project are Jacob Armstrong (Big Data Institute), Naomi Allen (UK Biobank), and David Wyllie and Anne Marie O'Connell (Public Health England).

• Epidemiological risk factors for COVID-19. Graduate student Nicolas Arning and I are developing an approach to quantify the effects of lifestyle and medical risk factors for COVID-19 in the UK Biobank that accounts for inherent uncertainty in which risk factors to consider. The new method employs the harmonic mean p-value, a model-averaging approach for big data that we published previously. We are in the process of evaluating the performance of the approach, comparing it to machine learning, and interpreting the results.

• Antibody testing for the UK Government. Postdoc Justine Rudkin has been working in the lab with Derrick Crook, Sir John Bell and others to measure the efficacy of antibody tests for the UK Government. They have tested many hundreds of kits to establish the sensitivity and specificity of the tests to help evaluate the utility of a national testing programme. This work was crucial in demonstrating the limitations of early blood-spot based tests, and the credibility of subsequent generations of antibody tests. The work has been published in Wellcome Open Research.

Work on other infections has continued during the lockdown. Postdoc Sarah Earle continues research into pathogen genetic risk factors for diseases including tuberculosis and meningococcal meningitis, while Steven Lin has continued to pursue work on hepatitis C virus genetics and epidemiology. Many of our close collaborators are infection doctors and they have of course been recalled to clinical duties. Laboratory work in the group has been severely disrupted, particularly several of Justine's Staphylococcus aureus projects. We are keen to pick up those projects where we left off when the chance arrives.

### Teaching: Online lectures and practical on Phylogenetics in Practice

On March 16th, we were in the interesting position of running an infectious disease course at the Big Data Institute on the day the national lockdown was announced in response to the COVID-19 pandemic. As a result, we were among the first in the university to do remote teaching, something Katrina Lythgoe and the rest of us had prepared for a week earlier, in anticipation of a lockdown that did not happen at that point.

These are the two online lectures in the Health Data Sciences CDT that I gave called Phylogenetics in Practice.

The online practical, which applies phylogenetics approaches to understand the Zika virus epidemic, is implemented as a Docker container, and available here.

### Presentation on identifying COVID-19 inpatients from Public Health England data

This is a presentation I gave at the COVID-19 Host Genetics Initiative meeting on 2nd July 2020 about using Public Health England's Second Generation Surveillance System to identify COVID-19 inpatients among SARS-CoV-2 positive individuals in England.

For further information, please see this bugbank blog post comparing inpatients identified using SGSS and Hospital Episode Statistics.

## Monday, 20 July 2020

### Royal Society Summer Science Exhibition 2020

This year the Royal Society's Summer Science Exhibition was online, and included highlights and updates from previous exhibitors, among them ours from 2018. This video was posted on Tuesday's session:

## Friday, 20 March 2020

### New paper: GenomegaMap for dN/dS in over 10,000 genomes

Published this week in Molecular Biology and Evolution is a new paper, joint with the CRyPTIC Consortium: "GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes".

The dN/dS ratio is a popular statistic in evolutionary genetics that quantifies the relative rates of protein-altering and non-protein-altering mutations. The ratio is adjusted so that under neutral evolution, i.e. when the survival and reproductive advantage of all variants is the same, it equals 1. Typically, dN/dS is observed to be less than 1, meaning that new mutations tend to be disfavoured, implying they are harmful to survival or reproduction. Occasionally, dN/dS is observed to be greater than 1, meaning that new mutations are favoured, implying they provide some survival or reproductive advantage. The aim of estimating dN/dS is usually to identify mutations that provide an advantage.
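
To make the definition concrete, here is a minimal R sketch of a crude counting version of the ratio. It is not the method implemented in genomegaMap, it ignores complications such as multiple changes at the same site, and all counts and variable names are hypothetical values chosen purely for illustration.

```r
# Hypothetical counts for one gene (illustrative assumptions only)
n_nonsyn_changes <- 12    # observed protein-altering differences
n_syn_changes    <- 20    # observed non-protein-altering differences
n_nonsyn_sites   <- 750   # sites at which a change would alter the protein
n_syn_sites      <- 250   # sites at which a change would not alter the protein

# Rates per available site, so that equal per-site rates give a ratio of 1
dN <- n_nonsyn_changes / n_nonsyn_sites   # 0.016
dS <- n_syn_changes / n_syn_sites         # 0.08

dN_dS <- dN / dS                          # 0.2
```

With these made-up counts the ratio is below 1, which would be read as protein-altering changes tending to be disfavoured.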

Theoreticians are often critical of dN/dS because it is more of a descriptive statistic than a process-driven model of evolution. This overlooks the problem that currently available models make simplifying assumptions such as minimal interference between adjacent mutations within genes. These assumptions are not obviously appropriate in many species, including infectious micro-organisms, that exchange genetic material infrequently.

There are many methods for measuring dN/dS. This new paper overcomes two common problems:
• It is fast no matter how many genomes are analysed together.
• It is robust whether there is frequent genetic exchange (which causes phylogenetic methods to report spurious signals of advantageous mutation) or infrequent genetic exchange.
The paper includes detailed simulations that establish the validity of the approach, and it goes on to demonstrate how genomegaMap can detect advantageous mutations in 10,209 genomes of Mycobacterium tuberculosis, the bacterium that causes tuberculosis. The method reproduces known signals of advantageous mutations that make the bacteria resistant to antibiotics, and it discovers a new signal of advantageous mutations in a cold-shock protein called deaD or csdA.

Software that implements genomegaMap is available on Docker Hub and the source code and documentation are available on GitHub.

With the steady rise of more and more genome sequences, the analysis of data becomes an increasing challenge even with modern computers, so it is hoped that this new method provides a useful way to exploit the opportunities in such large datasets to gain new insights into evolution.

## Monday, 16 March 2020

### Postdoc Available in Statistical Genetics

The closing date for applications for this post is noon on Wednesday 15th April 2020.

We are seeking an exceptional researcher with a track record in methods development for Statistical Genomics and an interest in Infectious Disease to join our group at the Big Data Institute. Our research focuses on Bacterial Genomics, Genome-Wide Association Studies and Population Genetics. The aim of the post is to conduct innovative research within the group's range of interests and to make use of the opportunities afforded by our outstanding collaborators. We welcome candidates who wish to use the opportunity as a stepping stone to independent funding.

The Oxford University Big Data Institute (BDI) is an interdisciplinary research centre aiming to develop, evaluate and deploy efficient methods for acquiring and analysing biomedical data at scale and for exploiting the opportunities arising from such studies. The Nuffield Department of Population Health, a partner in the BDI, contains world-renowned population health research groups and is an excellent environment for multi-disciplinary teaching and research.

The Postdoctoral Researcher in Statistical Genomics will join our team which has expertise in microbiology, genomics, evolution, population genetics and statistical inference. Responsibilities include planning a research project and milestones with help and guidance from the group, preparing manuscripts for publication, keeping records of results and methods and tracking milestones, and disseminating results.

To be considered, you need to hold, or be close to completion of, a PhD/DPhil involving statistical methods development. You also need experience of large-scale statistical data analysis, evidence of originating and executing your own academic research ideas and excellent interpersonal skills and the ability to work closely with others in a team.

Further details, including how to apply are here: https://my.corehr.com/pls/uoxrecruit/erq_jobspec_details_form.jobspec?p_id=145506

## Tuesday, 22 October 2019

### Correction published and R package on GitHub

The correction to "The harmonic mean p-value for combining dependent tests" has been published (www.pnas.org/content/early/2019/10/02/1914128116) and the main article has been corrected online (www.pnas.org/content/116/4/1195).

I have posted the source code for the harmonicmeanp R package on GitHub. This means there is now a development version with the latest updates, and instructions for installing it: github.com/danny-wilson/harmonicmeanp/tree/dev.

## Monday, 19 August 2019

### Updated correction: The harmonic mean p-value for combining dependent tests

Important: Please update to version 3.0 of the harmonicmeanp R package to address critical errors in the main function p.hmp.

I would like to update the correction I issued on July 3, 2019 to cover a second error I discovered that affects the main function of the R package, p.hmp. There are two errors in the original paper:
1. The paper (Wilson 2019 PNAS 116: 1195-1200) erroneously stated that the test $$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{|\mathcal{R}|}\,w_\mathcal{R}$$ controls the strong-sense family-wise error rate asymptotically when it should read $$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$.
2. The paper incorrectly stated that one can produce adjusted p-values that are asymptotically exact, as intended in the original Figure 1, by transforming the harmonic mean p-value with Equation 4 before adjusting by a factor $$1/w_{\mathcal{R}}$$. In fact the harmonic mean p-value must be multiplied by $$1/w_{\mathcal{R}}$$ before transforming with Equation 4.
The p.hmp function prior to version 3.0 of the R package was affected by these errors.

In the above,
• L is the total number of individual p-values.
• $$\mathcal{R}$$ represents any subset of those p-values.
• $$\overset{\circ}{p}_\mathcal{R} = \left(\sum_{i\in\mathcal{R}} w_i\right)/\left(\sum_{i\in\mathcal{R}} w_i/p_i\right)$$ is the HMP for subset $$\mathcal{R}$$.
• $$w_i$$ is the weight for the ith p-value. The weights must sum to one: $$\sum_{i=1}^L w_i=1$$. For equal weights, $$w_i=1/L$$.
• $$w_\mathcal{R}=\sum_{i\in\mathcal{R}}w_i$$ is the sum of weights for subset $$\mathcal{R}$$.
• $$|\mathcal{R}|$$ gives the number of p-values in subset $$\mathcal{R}$$.
• $$\alpha_{|\mathcal{R}|}$$ and $$\alpha_{L}$$ are significance thresholds provided by the Landau distribution (Table 1).
Version 2.0 (released July 2019) of the harmonicmeanp R package only addressed the first error, which is why I am now releasing version 3.0 to address both errors. Compared to version 1.0, the main function p.hmp has been updated to take an additional argument, L, which sets the total number of p-values. If argument L is omitted, a warning is issued and L is assumed to equal the length of the first argument, p, preserving earlier behaviour. Please update the R package to version 3.0.
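
As an illustration of the corrected test, the following R sketch computes the HMP for a subset directly from the definitions above and compares it with $$\alpha_L\,w_\mathcal{R}$$. It deliberately does not call the harmonicmeanp package: the helper names `hmp_subset` and `hmp_test` are hypothetical, equal weights are assumed, the p-values are invented, and the threshold $$\alpha_L=0.008$$ is the Table 1 value for $$L=10^9$$ and $$\alpha=0.01$$, which would need to be looked up separately for other settings.

```r
# Weighted harmonic mean p-value (HMP) for a subset R of p-values.
# p_R: the p-values in the subset; w_R: their weights, taken from a full
# set of weights that sum to one across all L tests.
hmp_subset <- function(p_R, w_R) {
  sum(w_R) / sum(w_R / p_R)
}

# Corrected test: significant when HMP_R <= alpha_L * w_R, where alpha_L
# is the Table 1 threshold for the TOTAL number of tests L.
hmp_test <- function(p_R, w_R, alpha_L) {
  hmp_subset(p_R, w_R) <= alpha_L * sum(w_R)
}

# Illustrative values (assumptions, not results from the paper)
L       <- 1e9                       # total number of p-values
alpha_L <- 0.008                     # Table 1 value for L = 1e9, alpha = 0.01
p_R     <- c(1e-12, 5e-12, 2e-11)    # three invented p-values forming subset R
w_R     <- rep(1 / L, length(p_R))   # equal weights w_i = 1/L

hmp_R       <- hmp_subset(p_R, w_R)  # 3 / (1e12 + 2e11 + 5e10) = 2.4e-12
threshold   <- alpha_L * sum(w_R)    # 0.008 * 3e-9 = 2.4e-11
significant <- hmp_test(p_R, w_R, alpha_L)
```

In this made-up case the subset HMP falls below the threshold, so the subset would be declared significant. For real analyses the package function p.hmp with the argument L should be used rather than this sketch.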

The tutorial, available as a vignette in the R package and online, is affected quantitatively by both errors, and has been extensively updated for version 3.0.

The second error affects only one line of the corrected paper (issued July 2019). I have updated it to address the second error and two typos in Figure legends 1 and 2: http://www.danielwilson.me.uk/files/wilson_2019_annotated_corrections.v2.pdf. You will need Adobe Reader to properly view the annotations and the embedded corrections to Figures 1 and 2.

I would like to deeply apologise to users for the inconvenience the two errors have caused.

### Why does this matter?

Controlling the family-wise error rate (FWER) limits the probability of falsely rejecting any null hypotheses, or groups of null hypotheses, when they are true. Control of the strong-sense FWER (ssFWER) holds even when some null hypotheses are false, thereby offering control across much broader and more relevant scenarios than the weak-sense FWER.

The ssFWER is not controlled at the expected rate if:

1. The more lenient threshold $$\alpha_{|\mathcal{R}|}$$ is used rather than the corrected threshold $$\alpha_L$$, both derived via Table 1 of the paper from the desired ssFWER $$\alpha$$.
2. Raw p-values are transformed with Equation 4 before adjusting by a factor $$w_{\mathcal{R}}^{-1}$$, rather than adjusting the raw p-values by a factor $$w_{\mathcal{R}}^{-1}$$ before transforming with Equation 4.
The p.hmp function of the R package suffers both issues in version 1.0, and the second issue in version 2.0. Please update to version 3.0.

Tests that are marginally significant or non-significant at $$\alpha=0.05$$ are far more likely to be affected in practice.

Regarding error 1, individual p-values need to be assessed against the threshold $$\alpha_{L}/L$$ when the HMP is used, not the more lenient $$\alpha_{1}/L$$ nor the still more lenient $$\alpha/L$$ (assuming equal weights). This shows that there is a cost to using the HMP compared to Bonferroni correction in the evaluation of individual p-values (and indeed small groups of p-values). For one billion tests $$\left(L=10^9\right)$$ and a desired ssFWER of $$\alpha=0.01$$, the fold difference in thresholds from Table 1 would be $$\alpha/\alpha_L=0.01/0.008=1.25$$.
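
To make the arithmetic concrete, here is a minimal R sketch assuming equal weights and taking $$\alpha_L=0.008$$ as the Table 1 value quoted above. The variable names are illustrative only.

```r
L       <- 1e9     # total number of tests
alpha   <- 0.01    # desired ssFWER
alpha_L <- 0.008   # Table 1 value for L = 1e9 and alpha = 0.01

# Per-test thresholds for an individual p-value (equal weights, w_i = 1/L)
bonferroni_threshold <- alpha / L      # 1e-11
hmp_threshold        <- alpha_L / L    # 8e-12

# Fold difference between the two thresholds
fold_difference <- bonferroni_threshold / hmp_threshold   # 1.25
```

An individual p-value therefore has to be 1.25 times smaller to be significant under the HMP procedure than under Bonferroni correction in this setting.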

However, it remains the case that HMP is more powerful than Bonferroni for assessing the significance of large groups of hypotheses. This is the motivation for using the HMP, and combined tests in general, because the power to find significant groups of hypotheses will be higher than the power to detect significant individual hypotheses when the total number of tests (L) is large and the aim is to control the ssFWER.

### How does it affect the paper?

I have submitted a request to correct the paper to PNAS. It is up to the editors whether to agree to this request. A copy of the published paper, annotated with the requested corrections, is available here: http://www.danielwilson.me.uk/files/wilson_2019_annotated_corrections.v2.pdf. Please use Adobe Reader to properly view the annotations and the embedded corrections to Figures 1 and 2.

### Where did the errors come from?

Regarding the first error, page 11 of the supplementary information gave a correct version of the full closed testing procedure that controls the ssFWER (Equation 37). However, it went on to erroneously claim that "one can apply weighted Bonferroni correction to make a simple adjustment to Equation 6 by substituting $$\alpha_{|\mathcal{R}|}$$ for $$\alpha$$." This reasoning would only be valid if the subsets of p-values to be combined were pre-selected and did not overlap. However, this would no longer constitute a flexible multilevel test in which every combination of p-values can be tested while controlling the ssFWER. The examples in Figures 1 and 2 pursued multilevel testing, in which the same p-values were assessed multiple times in subsets of different sizes, and in partially overlapping subsets of equal sizes. For the multilevel test, a formal shortcut to Equation 37, which makes it computationally practicable to control the ssFWER, is required. The simplest such shortcut procedure is the corrected test
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$
One can show this is a valid multilevel test because if $$\overset{\circ}{p}_\mathcal{R}\leq\alpha_L\,w_\mathcal{R}$$ then $$\overset{\circ}{p}=\left(w_\mathcal{R}\,\overset{\circ}{p}^{-1}_\mathcal{R}+w_{\mathcal{R}^\prime}\,\overset{\circ}{p}^{-1}_{\mathcal{R}^\prime}\right)^{-1} \leq w^{-1}_\mathcal{R}\,\overset{\circ}{p}_\mathcal{R}\leq\alpha_L$$, an argument that mirrors the logic of Equation 7 for direct interpretation of the HMP (an approximate procedure), which is not affected by this correction.
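
The key inequality in this argument, that the headline HMP is never larger than $$w^{-1}_\mathcal{R}\,\overset{\circ}{p}_\mathcal{R}$$, can be checked numerically. The R sketch below does so on simulated p-values; the helper `hmp`, the choice of 1000 uniform p-values with equal weights and the random subset of 50 are all assumptions made for illustration, and this is a sanity check rather than a proof.

```r
set.seed(1)

# Weighted harmonic mean p-value for p-values p with weights w
hmp <- function(p, w) sum(w) / sum(w / p)

L <- 1000
p <- runif(L)        # simulated p-values (illustrative)
w <- rep(1 / L, L)   # equal weights summing to one

# Headline HMP combining all L p-values (w_R = 1)
headline <- hmp(p, w)

# An arbitrary subset R and its complement R'
R       <- sample(L, 50)
R_prime <- setdiff(seq_len(L), R)
w_R     <- sum(w[R])

# Decomposition: 1/headline = w_R / HMP_R + w_R' / HMP_R'
decomposed <- 1 / (w_R / hmp(p[R], w[R]) +
                   sum(w[R_prime]) / hmp(p[R_prime], w[R_prime]))

# Upper bound obtained by dropping the non-negative R' term
bound <- hmp(p[R], w[R]) / w_R

stopifnot(isTRUE(all.equal(headline, decomposed)), headline <= bound)
```

Because the complement term is non-negative, dropping it can only increase the right-hand side, which is why a subset passing the corrected test implies the headline HMP is at most $$\alpha_L$$.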

The second error, which was also caused by carelessness on my part, occurred in the main text in the statement "(Equivalently, one can compare the exact p-value from Eq. 4 with $$\alpha\,w_{\mathcal{R}}$$.)" I did not identify it sooner because the corrected version of the paper no longer uses Equation 4 to transform p-values in Figure 1.

### How do I update the R package?

The R package is maintained at https://cran.r-project.org/package=harmonicmeanp.

In R, test whether you have version 3.0 of the package installed as follows:
packageVersion("harmonicmeanp")

The online binaries take a few days or weeks to update, so to ensure you install the most recent version of the package install from source by typing:
install.packages("harmonicmeanp", dependencies=TRUE, type="source")

You can additionally specify the CRAN repository for example:
install.packages("harmonicmeanp", dependencies=TRUE, type="source", repos="https://cran.wu.ac.at")

After installation, check again the version number:
stopifnot(packageVersion("harmonicmeanp") >= "3.0")

### What if I have already reported results?

I am very sorry for the inconvenience caused in this case.

As long as the 'headline' test was significant with p.hmp under R package versions 1.0 or 2.0, then the weak-sense FWER can be considered to have been controlled. The 'headline' test is the test in which all p-values are included in a single combined test. The headline test is not affected by either error, because $$|\mathcal{R}|=L$$ and $$w_{\mathcal{R}}=1$$. The headline test controls the weak-sense FWER, and therefore so does a two-step procedure in which subsets are only deemed significant when the headline test is significant (Hochberg and Tamhane, 1987, Multiple Comparison Procedures, p. 3, Wiley).

If the headline test was not significant, re-running the analysis with version 3.0 will not produce significant results either because the stringency is greater for controlling the strong-sense FWER. If the headline test was significant, you may wish to reanalyse the data with version 3.0 to obtain strong-sense FWER control, because this was the criterion the HMP procedure was intended to control.

If some results that were significant under version 1.0 or 2.0 of the R package are no longer significant, you may conclude they are not significant or you may report them as significant subject to making clear that only the weak-sense FWER was controlled.

## Saturday, 6 July 2019

### Correction: The harmonic mean p-value for combining dependent tests

Important: this announcement has been superseded. Please see the updated correction.

I would like to issue the following correction to users of the harmonic mean p-value (HMP), with apologies: The paper (Wilson 2019 PNAS 116: 1195-1200) erroneously states that the following asymptotically exact test controls the strong-sense family-wise error rate for any subset of p-values $$\mathcal{R}$$:
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{|\mathcal{R}|}\,w_\mathcal{R}$$
when the test should instead read
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$
where:
• L is the total number of individual p-values.
• $$\mathcal{R}$$ represents any subset of those p-values.
• $$\overset{\circ}{p}_\mathcal{R} = \left(\sum_{i\in\mathcal{R}} w_i\right)/\left(\sum_{i\in\mathcal{R}} w_i/p_i\right)$$ is the HMP for subset $$\mathcal{R}$$.
• $$w_i$$ is the weight for the ith p-value. The weights must sum to one: $$\sum_{i=1}^L w_i=1$$. For equal weights, $$w_i=1/L$$.
• $$w_\mathcal{R}=\sum_{i\in\mathcal{R}}w_i$$ is the sum of weights for subset $$\mathcal{R}$$.
• $$|\mathcal{R}|$$ gives the number of p-values in subset $$\mathcal{R}$$.
• $$\alpha_{|\mathcal{R}|}$$ and $$\alpha_{L}$$ are significance thresholds provided by the Landau distribution (Table 1).
In version 2.0 of the harmonicmeanp R package, the main function p.hmp is updated to take an additional argument, L, which sets the total number of p-values. If argument L is omitted, a warning is issued and L is assumed to equal the length of the first argument, p, preserving previous behaviour. Please update the R package.
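
The corrected test defined above can be sketched in a few lines of base R. This is only an illustration of the formulas, not the package's implementation: the helper names `hmp_subset` and `passes_corrected_test` are hypothetical, and the value of `alpha_L` has to be looked up in Table 1 of the paper for the chosen $$\alpha$$ and L.

```r
# Hypothetical helpers illustrating the corrected test; not part of harmonicmeanp.
# p: all L p-values; w: weights summing to one; R: indices of the subset.
hmp_subset <- function(p, w, R) {
  sum(w[R]) / sum(w[R] / p[R])
}

# alpha_L must be taken from Table 1 of the paper for the desired alpha and L.
passes_corrected_test <- function(p, w, R, alpha_L) {
  w_R <- sum(w[R])
  hmp_subset(p, w, R) <= alpha_L * w_R
}

# Toy input: L = 4 equally weighted p-values (made-up numbers).
p <- c(0.001, 0.002, 0.5, 0.8)
w <- rep(1 / length(p), length(p))
hmp_first_two <- hmp_subset(p, w, R = 1:2)  # weighted harmonic mean of 0.001 and 0.002
```

In practice the `p.hmp` function in the package should be preferred; the sketch is only meant to show how the subset weight $$w_\mathcal{R}$$ enters both the HMP and the threshold.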

An updated tutorial is available as a vignette in the R package and online here: http://www.danielwilson.me.uk/harmonicmeanp/hmpTutorial.html

### Why does this matter?

The family-wise error rate (FWER) controls the probability of falsely rejecting any null hypotheses, or groups of null hypotheses, when they are true. The strong-sense FWER maintains control even when some null hypotheses are false, thereby offering control across much broader and more relevant scenarios.

Using the more lenient threshold $$\alpha_{|\mathcal{R}|}$$ rather than the corrected threshold $$\alpha_L$$, both derived via Table 1 of the paper from the desired ssFWER $$\alpha$$, means the ssFWER is not controlled at the expected rate.

Tests with small numbers of p-values are far more likely to be affected in practice. In particular, individual p-values should be assessed against the threshold $$\alpha_{L}/L$$ when the HMP is used, not the more lenient $$\alpha_{1}/L$$ nor the still more lenient $$\alpha/L$$ (assuming equal weights). This shows that there is a cost to using the HMP compared to Bonferroni correction in the evaluation of individual p-values. For one billion tests $$\left(L=10^9\right)$$ and a desired ssFWER of $$\alpha=0.01$$, the fold difference in thresholds from Table 1 would be $$\alpha/\alpha_L=0.01/0.008=1.25$$.
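
To make the arithmetic explicit under the assumptions just stated (equal weights, $$L=10^9$$, $$\alpha=0.01$$ and $$\alpha_L=0.008$$ as quoted from Table 1), the per-test thresholds work out as
$$\alpha_L/L = 0.008/10^9 = 8\times10^{-12} \quad\text{versus}\quad \alpha/L = 0.01/10^9 = 10^{-11}$$
so an individual p-value must be 1.25 times smaller to be significant under the HMP than under Bonferroni correction.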

However, it remains the case that HMP is much more powerful than Bonferroni for assessing the significance of groups of hypotheses. This is the motivation for using the HMP, and combined tests in general, because the power to find significant groups of hypotheses will be much higher than the power to detect significant individual hypotheses when the total number of tests (L) is large and the aim is to control the ssFWER.
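
A small base R sketch may help to show the gain for groups. The values of L and `alpha_L` are the ones quoted above for $$\alpha=0.01$$; the group size and the p-values are invented purely for illustration, and equal weights are assumed.

```r
# Illustrative numbers only: L and alpha_L are those quoted above for alpha = 0.01;
# the group size and the p-values are made up for the example.
L <- 1e9
alpha_L <- 0.008
n_group <- 1000                 # hypothetical group of tests to combine
p_group <- rep(5e-9, n_group)   # every p-value in the group is only modestly small
w_group <- rep(1 / L, n_group)  # equal weights, w_i = 1/L

# No individual test passes the per-test threshold alpha_L / L = 8e-12.
any_individual <- any(p_group <= alpha_L / L)

# The group HMP is compared against alpha_L * w_R, roughly 0.008 * 1e-6 = 8e-9.
w_R <- sum(w_group)
hmp_group <- w_R / sum(w_group / p_group)
group_significant <- hmp_group <= alpha_L * w_R
```

Here the group HMP equals the common p-value of 5e-9, which is below the group threshold even though every individual p-value is far above the individual threshold.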

### How does it affect the paper?

I have submitted a request to correct the paper to PNAS. It is up to the editors whether to agree to this request. A copy of the published paper, annotated with the requested corrections, is available here: http://www.danielwilson.me.uk/files/wilson_2019_annotated_corrections.pdf. Please use Adobe Reader to properly view the annotations and the embedded corrections to Figures 1 and 2.

### Where did the error come from?

Page 11 of the supplementary information gave a correct version of the full closed testing procedure that controls the ssFWER (Equation 37). However, it went on to erroneously claim that "one can apply weighted Bonferroni correction to make a simple adjustment to Equation 6 by substituting $$\alpha_{|\mathcal{R}|}$$ for $$\alpha$$." This reasoning would only be valid if the subsets of p-values to be combined were pre-selected and did not overlap. However, this would no longer constitute a flexible multilevel test in which every combination of p-values can be tested while controlling the ssFWER. The examples in Figures 1 and 2 pursued multilevel testing, in which the same p-values were assessed multiple times in subsets of different sizes, and in partially overlapping subsets of equal sizes. For the multilevel test, a formal shortcut to Equation 37, which makes it computationally practicable to control the ssFWER, is required. The simplest such shortcut procedure is the corrected test
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$
One can show this is a valid multilevel test because if $$\overset{\circ}{p}_\mathcal{R}\leq\alpha_L\,w_\mathcal{R}$$ then $$\overset{\circ}{p}=\left(w_\mathcal{R}\,\overset{\circ}{p}^{-1}_\mathcal{R}+w_{\mathcal{R}^\prime}\,\overset{\circ}{p}^{-1}_{\mathcal{R}^\prime}\right)^{-1} \leq w^{-1}_\mathcal{R}\,\overset{\circ}{p}_\mathcal{R}\leq\alpha_L$$, an argument that mirrors the logic of Equation 7 for direct interpretation of the HMP, which is not affected by this correction.
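
The middle inequality can be spelled out as follows, taking $$\mathcal{R}^\prime$$ to denote the p-values not in $$\mathcal{R}$$ (an assumption about the notation, which is not defined in this post). Both terms inside the bracket are positive, so dropping the second term can only increase the reciprocal:
$$\left(w_\mathcal{R}\,\overset{\circ}{p}^{-1}_\mathcal{R}+w_{\mathcal{R}^\prime}\,\overset{\circ}{p}^{-1}_{\mathcal{R}^\prime}\right)^{-1} \leq \left(w_\mathcal{R}\,\overset{\circ}{p}^{-1}_\mathcal{R}\right)^{-1} = w^{-1}_\mathcal{R}\,\overset{\circ}{p}_\mathcal{R}$$
The final bound by $$\alpha_L$$ then follows by dividing the premise $$\overset{\circ}{p}_\mathcal{R}\leq\alpha_L\,w_\mathcal{R}$$ through by $$w_\mathcal{R}$$.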

## Monday, 25 February 2019

### New paper: PVL toxin associated with pyomyositis

In a new collaborative study published this week in eLife, we report a strong association between Staphylococcus aureus that carry the PVL toxin and pyomyositis, a muscle infection often afflicting children in the tropics.

Catrin Moore and colleagues at the Angkor Children's Hospital in Siem Reap, Cambodia, spent more than a decade collecting S. aureus bacteria from pyomyositis infections in young children, and built a comparable control group of S. aureus carried asymptomatically in children of similar age and location.

When Bernadette Young in our group compared the genomes of cases and controls using statistical tools we developed, she found some strong signals:

• Most, but not all, pyomyositis was caused by the CC-121 strain, common in Cambodia.
• The association with CC-121 was driven by the PVL toxin which it carries.
The ability to pinpoint the association to PVL came about because (i) a sub-group of CC-121 that lacked PVL caused no pyomyositis and (ii) pyomyositis-causing S. aureus from backgrounds that rarely caused pyomyositis were unusual in also possessing PVL.

The PVL-pyomyositis association was extraordinarily strong, so strong that PVL appeared all but necessary for disease. Moreover, disease appeared to be monogenic, with no other genes involved elsewhere in the bacterial genome. To discover an apparently monogenic disease mechanism for a common disease is very unusual nowadays.

The discovery has immediate practical implications because it draws parallels between pyomyositis and toxin-driven bacterial diseases like tetanus and diphtheria that have proven amenable to immunization. The fact that anti-PVL vaccines have already been developed in other contexts offers hope for the future treatment of this debilitating tropical infection.

Our study throws much-needed light on a topic that has been the subject of heated debate in recent years. Many bacterial toxins, PVL included, have been implicated in diverse S. aureus disease manifestations, often without sound evidence. Because PVL is known to contribute to angry, pus-filled skin infections, and has been observed in bacteria causing rare and severe S. aureus infections, some authors have implicated it in dangerous diseases including necrotizing pneumonia, septic arthritis and pyomyositis, but detailed meta-analyses have dismissed these claims as not substantiated. Our GWAS approach offers unprecedented robustness over previous generations of candidate gene studies by accounting for bacterial genetic variation across the entire genome.

If you are interested, please take a closer look at the paper.

## Monday, 7 January 2019

### New paper in PNAS: harmonic mean p-value

Published on Friday in Proceedings of the National Academy of Sciences USA, "The harmonic mean p-value for combining dependent tests" reports a new method for performing combined tests. A revised R package with detailed examples is now available online as the harmonicmeanp package on CRAN.

The method has two stages:
• Compute a test statistic: the harmonic mean of the p-values (HMP) of the tests to be combined. Remarkably, this HMP is itself a valid p-value for small values (e.g. below 0.05).
• Calculate an asymptotically exact p-value from the test statistic using the generalized central limit theorem. The distribution is a type of Stable distribution first described by Lev Landau.
The method, which controls the strong-sense family-wise error rate (ssFWER), has several advantages over existing alternatives for combining p-values:
• Combining p-values allows information to be aggregated over multiple tests and requires less stringent significance thresholds.
• The HMP procedure is robust to positive dependence between the p-values, making it more widely applicable than Fisher's method which assumes independence.
• The HMP procedure is more powerful than the Bonferroni and Simes procedures.
• The HMP procedure is more powerful than the Benjamini-Hochberg (BH) procedure, even though BH only controls the weaker false discovery rate (FDR) and weak-sense family-wise error rate (wsFWER) in the sense that whenever the BH procedure detects one or more significant p-values, the HMP procedure will detect one or more significant p-values or groups of significant p-values.
The ssFWER can be considered gold-standard control of false positives because it aims to control the probability of one or more false positives even in the presence of true positives. The HMP is inspired by Bayesian model averaging and approximates a model-averaged Bayes factor under certain conditions.

In researching and revising the paper, I looked high and low for previous uses of the harmonic-mean p-value because most ideas have usually been had already. Although there is a class of methods that use different types of average p-value (without compelling motivation), I did not find a precedent. Until today, a few days too late, so I may as well get in there and declare it before anyone else. I. J. Good published a paper in 1958 that mysteriously appeared when I googled the new publication on what he called the "harmonic mean rule-of-thumb", effectively for model-averaging. Undeniably, I did not do my homework thoroughly enough. Still, I would be interested if others know more about the history of this rule-of-thumb.

Good's paper, available on Jstor, proposes that the HMP "should be regarded as an approximate tail-area probability" [i.e. p-value], although he did not propose the asymptotically exact test (Eq. 4) or the multilevel test procedure (Eq. 6) that are important to my approach. His presentation is amusingly apologetic, e.g. "an approximate rule of thumb is tentatively proposed in the hope of provoking discussion", "this rule of thumb should not be used if the statistician can think of anything better to do" and "The 'harmonic-mean rule of thumb' is presented with some misgivings, because, like many other statistical techniques, it is liable to be used thoughtlessly". Perhaps this is why the method (as far as I could tell) had disappeared from the literature. Hopefully the aspects new to my paper will shake off these misgivings and provide users with confidence that the procedure is interpretable and well-motivated on theoretical as well as empirical grounds. Please give it a read!

Work cited
• R. A. Fisher (1934) Statistical Methods for Research Workers (Oliver and Boyd, Edinburgh), 5th Ed.
• L. D. Landau (1944) On the energy loss of fast particles by ionization. Journal of Physics U.S.S.R. 8: 201-205.
• I. J. Good (1958) Significance tests in parallel and in series. Journal of the American Statistical Association 53: 799-813. (Jstor)
• R. J. Simes (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73: 751-754.
• Y. Benjamini and Y. Hochberg (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57: 289-300.
• D. J. Wilson (2019) The harmonic mean p-value for combining dependent tests. Proceedings of the National Academy of Sciences U.S.A. published ahead of print January 4, 2019. (PNAS)

## Wednesday, 18 July 2018

### Bacterial Doubling Times in the Wild

How fast do bacteria grow outside the laboratory? This simple question is very difficult to address directly, because it is near-impossible to track a lineage of bacterial cells, ancestor-to-descendant, inside an infected patient or through a river. Now in new work published in Proceedings B, Beth Gibson, Ed Feil, Adam Eyre-Walker and I exploit genome sequencing to try to get a handle on the problem indirectly.

We have done it by comparing two known quantities and taking the ratio: the rate at which DNA mutates in bacteria per year, and the rate it mutates per replication. This tells us in theory how many replications there are per year.

The mutation rate per replication has long been studied in the laboratory, and is around once per billion letters. Meanwhile, the recent avalanche of genomic data has allowed microbiologists to quantify the rate at which bacteria evolve over short time scales such as a year, including during outbreaks and even within individual infected patients. Most bugs mutate about once per million letters per year, with ten-fold variation above and below this not uncommon among different species.
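
As a back-of-the-envelope sketch of the ratio, the round figures above can be plugged into a few lines of R. The variable names are hypothetical, and treating each replication as one doubling is a simplifying assumption; the paper's species-specific estimates use measured rates rather than these round numbers.

```r
# Round figures from the text; real estimates vary ten-fold or more between species.
mu_per_year <- 1e-6         # mutations per letter per year
mu_per_replication <- 1e-9  # mutations per letter per replication

# Ratio of the two rates gives replications per year (about 1000 here).
replications_per_year <- mu_per_year / mu_per_replication

# Convert to an average doubling time in hours (about 8.8 hours here).
hours_per_year <- 365 * 24
doubling_time_hours <- hours_per_year / replications_per_year
```

With these typical values the implied average doubling time is a little under nine hours, which sits within the range of the species-specific estimates reported in the paper.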

For five species both these quantities exist. The fastest bug we looked at causes cholera and we estimate it doubles once every hour on average (give or take 30 minutes). The slowest was Salmonella, which we estimate doubles once a day on average (give or take 8 hours). In between were Staph. aureus and Pseudomonas at about two hours each, and E. coli at 15 hours. These are averages over the very diverse and often hostile conditions that a bacterial cell may find itself in during the course of its natural lifecycle. To find out more about the work, please check out the paper.

## Friday, 29 June 2018

### PhD Studentship: Genomic prediction of antimicrobial resistance spread

This position is now closed
An opportunity has arisen for a D.Phil. (Ph.D.) place on the BBSRC-funded Oxford Interdisciplinary Bioscience Doctoral Training Partnership in the area of Artificial Intelligence, specifically Predicting the spread of antimicrobial resistance from genomics using machine learning.

If successful in a competitive application process, the candidate will join a cohort of students enrolled in the DTP’s one-year interdisciplinary training programme, before commencing the research project and joining my research group at the Big Data Institute.

This project addresses the BBSRC priority area “Combatting antimicrobial resistance” by using ML to predict the spread of antimicrobial resistance in human, animal and environmental bacteria exemplified by Escherichia coli. Understanding how quickly antimicrobial resistance (AMR) will spread helps plan effective prevention, improved biosecurity, and strategic investment into new measures. We will develop ML tools for large genomic datasets to predict the future spread of AMR in humans, animals and the environment. The project will create new methods based on award-winning probabilistic ML tools pioneered in my group (BASTA, SCOTTI) by training models using genomic and epidemiological data informative about past spread of AMR. We will apply the tools collaboratively to genomic studies of E. coli in Kenya, the UK and across Europe from humans, animals and the environment, Enterobacteriaceae in North-West England, and Campylobacter in Wales. Genomics has proven effective for asking “what went wrong” in the context of outbreak investigation and AMR spread; here we will address the greater challenge of repurposing such information using ML for forward prediction of future spread of AMR. Scrutiny will be intense because future predictions can and will be tested, raising the bar for the biological realism required while producing computationally efficient tools.

Attributes of suitable applicants: Understanding of genomics. Interest in infectious disease. Some numeracy, e.g. mathematics A-level, desirable. Experience of coding would help.

Funding notes: BBSRC eligibility criteria for studentship funding apply (https://www.ukri.org/files/legacy/news/training-grants-january-2018-pdf/). Successful students will receive a stipend of no less than the standard RCUK stipend rate, currently set at £14,777 per year.

How to apply: send me a CV and brief covering letter/email (no more than 1 page) explaining why you are interested and suitable by the Wednesday 11 July initial deadline. I will invite the best applicant/s to submit with me a formal application in time for the Friday 13 July second-stage deadline.

## Wednesday, 27 June 2018

### Royal Society Summer Science Exhibition Stall July 2-8

Next week researchers from the Modernising Medical Microbiology consortium, collaborating groups and I will exhibit the Resistance is Futile stall at the Royal Society Summer Science Exhibition. The exhibition is a free event in central London open to all visitors. Our stall is an opportunity to tell visitors about our research, and how advances in genetics are influencing day-to-day life. On show at the Resistance is Futile stall:

Oxford Nanopore Technology Demos
DNA sequencing in the NHS is shortening the time to diagnose antibiotic resistance in serious infections

Evolution Dance Mat
Resistance mutants arise spontaneously through chance copying errors during DNA replication

Antibiotic Resistance Coconut Shy
Antibiotic use gives resistance mutants a strong advantage so they rapidly increase in frequency.

During the exhibition we will be tweeting from @ResistanceIF

Our stall is generously supported by Oxford Nanopore Technology, the Nuffield Department of Medicine, and through public engagement research funding awarded to our research groups by the Wellcome Trust, the Royal Society, the National Institute for Health Research, the Oxford Biomedical Research Centre, the Natural Environment Research Council, the Medical Research Council, the Newton Fund and the Bill & Melinda Gates Foundation.

## Friday, 18 May 2018

### Postdoc positions in Data Science and Molecular Microbiology

These positions are now closed
As part of the move to the Big Data Institute, two new postdoctoral positions funded by the Robertson Foundation are available in Data Science and Molecular Microbiology.

The BDI is a new interdisciplinary research centre aiming to develop, evaluate and deploy efficient methods for acquiring and analysing biomedical data at scale and for exploiting the opportunities arising from such studies. The BDI is a joint venture between the renowned Nuffield Department of Population Health (NDPH) and NDM.

The Data Scientist role, split between the BDI and London, will be part of a team developing systems for continuous record linkage between Public Health England and other population health records. The aims are to design record linkage algorithms, manage front ends for viewing the data source, and analyse and interpret results. We're looking for a graduate, or someone with equivalent experience, in computer science, data science, statistics, or any other relevant subject with a strong quantitative component. Knowledge of databases such as SQL and of computer programming is needed.

The Molecular Microbiology role, based mainly at the John Radcliffe Hospital Microbiology Department, will be part of a team researching Staphylococcus aureus infection using RNA sequencing, genome wide association studies, and biochemical and immunological assays of bacterial behaviour. The aims include designing microbiological protocols, researching bacterial molecular genetics and data analysis. We're looking for someone with a PhD or equivalent experience in a relevant subject such as microbiology, immunology, genetics or biochemistry. Experience designing protocols and basic microbiological and immunological skills are required.

The deadline for the posts is Noon on 6 June 2018. Both are one year positions. For more details or to apply click here for the Data Scientist role and here for the Molecular Microbiologist role.

### The group has moved to the Big Data Institute, University of Oxford

From April we have moved to the Big Data Institute, Nuffield Department of Population Health at the University of Oxford. The group is maintaining its close links to the Modernising Medical Microbiology Consortium and the John Radcliffe Hospital, Oxford. I am grateful to the Robertson Foundation for funding. We're excited about joining new colleagues and benefiting from their expertise in epidemiology, health informatics, genetics and infection, while continuing to cultivate strong links with our existing collaborators in Oxford and around the world.

## Sunday, 31 December 2017

### New paper: Severe infections emerge from commensal bacteria by adaptive evolution

Published this month in eLife is our new paper on the evolution and adaptation of Staphylococcus aureus during infection.

This study shows that the emergence of life-threatening infections of the major pathogen Staphylococcus aureus from bacteria colonizing the nose is associated with repeatable adaptive evolution inside the human body.

First author Bernadette Young has summarized the paper's findings on the Modernising Medical Microbiology blog.

## Monday, 18 December 2017

### SCOTTI wins PLoS Computational Biology Research Prize

Work from our group has been recognised in the PLoS Computational Biology 2017 Research Prizes. SCOTTI, which infers transmission routes from genetic and epidemiological information, won the Breakthrough Advance/Innovation category. The citation reads:
Our Breakthrough Advance/Innovation winning article presents a new computational tool, called SCOTTI (Structured COalescent Transmission Tree Inference), developed by Nicola De Maio of the University of Oxford (UK), and colleagues. De Maio says, “SCOTTI represents a convenient tool to reconstruct who-infected-whom within outbreaks… [and] has been used in particular for the study of bacterial hospital outbreaks”. It combines epidemiological information about patient exposure with genetic information about the infectious agent itself.
Work is nominated and selected as described in the announcement:
The journal invited the community to nominate their favorite 2016 published Research Articles. From these nominations the PLOS Computational Biology Research Prize Committee, made up of Editorial Board members Dina Schneidman, Nicola Segata, Maricel Kann, Isidore Rigoutsos, Avner Schlessinger, Lilia Iakoucheva, Ilya Ioshikhes, Shi-Jie Chen, and Becca Asquith, selected the winners. To help support future work, the authors of each winning paper will receive award certificates and a \$2,000 (USD) prize.
You can read more about SCOTTI and the accompanying paper, written by Nicola De Maio, Jessie Wu and me, here.

## Monday, 11 September 2017

### Promiscuous bacteria have staying power

An insight article with Ruth Massey on John Lees' and Stephen Bentley's new paper was published in eLife on Friday:

Streptococcus pneumoniae is a notorious bacterial pathogen hiding in plain sight. A common resident of the nose and throat, between 68% and 84% of young infants will carry this species at any given time (Turner et al., 2012). In most cases it causes no harm, yet the presence of pneumococci – as the bacteria are known – can predispose a person to life-threatening infections like pneumonia or meningitis. Indeed, pneumococci are responsible for around 10% of all deaths in young children around the world (O'Brien et al., 2009), with the vast majority of cases being in developing countries.
Research into S. pneumoniae is complicated because the species is a patchwork of distinctive strains and some of these strains remain in the nose and throat for longer than others. Now, in eLife, John Lees and Stephen Bentley – both at the Wellcome Trust Sanger Institute – and colleagues report that strains rendered impotent by a virus do not linger for as long as other strains (Lees et al., 2017).

## Thursday, 3 August 2017

### New draft paper on combining p-values through the harmonic mean

In a preprint released today on bioRxiv I report a new method for improving the sensitivity to detect statistical signals by averaging over multiple alternative hypotheses using the harmonic mean p-value. The draft paper looks at example problems in genome-wide association studies (GWAS) in which signals of association may be apparent, but perhaps not sufficiently strong to meet the stringent threshold required to control for the millions of tests performed. Combining weak signals in arbitrary ways - for example across consecutive variants - can reveal signals sufficiently strong to meet the statistical significance threshold. This could be especially useful when looking for interactions, for example between host and pathogen genetics in their effect on infection, because it may be possible to conclude that a particular variant on the host side is involved, even if there is uncertainty over the specific pathogen variant it interacts with. Often such uncertainty arises because of the sheer number of possibilities. Similar ideas are beginning to gain traction in GWAS, and the ability to easily average over hypotheses is one of the strengths of Bayesian statistics. This new paper shows that the benefits of model averaging can be achieved easily in non-Bayesian statistics by taking the harmonic mean p-value from a range of tests. The test is very general and robust to a range of complexities including non-independence between the p-values.

## Thursday, 29 September 2016

### New paper: SCOTTI Efficient reconstruction of transmission within outbreaks with the structured coalescent

New paper published today in PLoS Computational Biology: Understanding how infectious disease spreads and where it originates is essential for devising policies to prevent and limit outbreaks. Whole genome sequencing of pathogens has proved an extremely promising tool for identifying transmission, particularly when combined with classical epidemiological data. Several statistical and computational approaches are available for exploiting genomics for epidemiological investigation. These methods have seen applications to dozens of outbreak studies. However, they have a number of serious drawbacks.

In this new paper Nicola De Maio, Jessie Wu and I introduce SCOTTI, a method for quickly and accurately inferring who-infected-whom from genomic and epidemiological data. SCOTTI addresses very widespread, but generally neglected problems in joint epidemiological and genomic inference, notably the presence of non-sampled and undetected intermediate cases and within-host pathogen variation caused by microevolution. Using real examples and simulations, we show that these problems cause strong misleading effects on existing popular inference methods. SCOTTI is based on BASTA, our recent breakthrough method for phylogeographic inference, and offers new standards of accuracy, calibration, and computational efficiency. SCOTTI is distributed as an open source package within BEAST2.

## Friday, 23 September 2016

### Prize PhD Studentships available

I am offering two PhD projects as part of the annual Nuffield Department of Medicine Prize Studentship competition:
These are fully-funded, four-year awards open to outstanding students of any nationality. Applicants nominate three projects, in order of preference, from the available pool. For how to apply, click here. Only applications submitted through the online system will be considered, but interested applicants are welcome to contact me informally. The deadline for applications is noon, 6th January 2017.

In addition to my projects, the Modernising Medical Microbiology project has announced the following PhD projects as part of the competition: