Thursday 3 August 2017

New draft paper on combining p-values through the harmonic mean

In a preprint released today on bioRxiv I report a new method for improving the sensitivity with which statistical signals are detected, by averaging over multiple alternative hypotheses using the harmonic mean p-value. The draft paper looks at example problems in genome-wide association studies (GWAS) in which signals of association may be apparent, but perhaps not strong enough to meet the stringent threshold required to control for the millions of tests performed. Combining weak signals in arbitrary ways - for example across consecutive variants - can reveal signals strong enough to meet the statistical significance threshold. This could be especially useful when looking for interactions, for example between host and pathogen genetics in their effect on infection, because it may be possible to conclude that a particular variant on the host side is involved even if there is uncertainty over the specific pathogen variant it interacts with. Often such uncertainty arises because of the sheer number of possibilities. Similar ideas are beginning to gain traction in GWAS, and the ability to average easily over hypotheses is one of the strengths of Bayesian statistics. This new paper shows that the benefits of model averaging can be achieved easily in non-Bayesian statistics by taking the harmonic mean of the p-values from a range of tests. The test is very general and robust to a range of complexities, including non-independence between the p-values.
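To make the idea of "taking the harmonic mean p-value from a range of tests" concrete, here is a minimal Python sketch of the arithmetic only. The function name harmonic_mean_p, the optional weights argument and the example p-values are all hypothetical illustrations, not taken from the preprint. The sketch also does not show how the combined value should be compared against a significance threshold; the preprint is the place to look for that.

```python
from typing import Optional, Sequence


def harmonic_mean_p(p_values: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Return the (optionally weighted) harmonic mean of a set of p-values.

    Hypothetical helper for illustration: with equal weights this is
    L / sum(1 / p_i) for L p-values. The weights argument is an assumption
    added here, not something described in the post.
    """
    if len(p_values) == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")

    if weights is None:
        # Equal weights reduce to the ordinary harmonic mean.
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative and sum to a positive value")

    total_weight = sum(weights)
    return total_weight / sum(w / p for w, p in zip(weights, p_values))


# Made-up p-values standing in for tests at four consecutive variants.
example_p_values = [0.004, 0.02, 0.001, 0.3]
print(harmonic_mean_p(example_p_values))
```

Because the harmonic mean is dominated by its smallest values, a few small p-values among the tests pull the combined value down much more than an arithmetic mean would, which is what allows weak but consistent signals to add up.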