Wednesday 3 January 2024

Introducing Doublethink: joint Bayesian-frequentist model-averaged hypothesis testing

This week Nick Arning, Helen Fryer and I released two related preprints describing a new method called Doublethink, and its application to identifying risk factors for COVID-19 hospitalization in UK Biobank:

Doublethink: Bayesian-frequentist model-averaged hypothesis testing

Doublethink enables joint Bayesian and frequentist hypothesis testing when there is model uncertainty by interconverting Bayesian posterior odds and classical (frequentist) p-values. It has broad implications because (i) it reveals connections between the Bayesian approach to model averaging and the classical approach to multiple testing, and (ii) it brings the benefits of Bayesian model averaging to classical statistics.
Doublethink addresses two fundamental problems in hypothesis testing:
  1. In classical tests, the statistical evidence that one variable directly affects an outcome generally depends on which other variables are assumed to directly affect it.
  2. In Bayesian tests, the statistical evidence that one variable directly affects an outcome depends on the prior assumptions.
These issues are addressed by computing p-values from Bayesian model-averaged posterior odds, which (1) account for model uncertainty and (2) are theoretically invariant to prior assumptions, assuming large sample sizes.
Doublethink simultaneously controls the frequentist family-wise error rate (FWER) and the Bayesian false discovery rate (FDR). It builds on Johnson's Bayesian tests based on likelihood ratio statistics, and Karamata's theory of regular variation.
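To make the idea of model-averaged posterior odds more concrete, here is a rough sketch. It is not the Doublethink procedure itself, which goes on to convert such odds into p-values using the theory in the preprint. The sketch enumerates every linear regression model on a small simulated data set, approximates each model's marginal likelihood with BIC, and averages over models to get the posterior odds that each variable has a direct effect. The BIC approximation, the simulated data, the prior inclusion probability of 0.5, and all function names are assumptions made for this illustration.

```python
import itertools
import numpy as np


def bic_linear(X, y):
    """BIC of a least-squares fit with intercept, assuming Gaussian errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def inclusion_posterior_odds(X, y, prior_inclusion=0.5):
    """Model-averaged posterior odds that each column of X is in the model.

    Assumption: exp(-BIC/2) stands in for the marginal likelihood, and each
    variable is included independently a priori with the same probability.
    """
    p = X.shape[1]
    models = np.array(list(itertools.product([0, 1], repeat=p)))
    log_w = np.empty(len(models))
    for i, m in enumerate(models):
        cols = np.flatnonzero(m)
        k = len(cols)
        log_prior = k * np.log(prior_inclusion) + (p - k) * np.log(1 - prior_inclusion)
        log_w[i] = -0.5 * bic_linear(X[:, cols], y) + log_prior
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()                      # posterior model probabilities
    mass_in = models.T @ w            # total probability of models including each variable
    mass_out = (1 - models).T @ w     # total probability of models excluding it
    return mass_in / mass_out


# Simulated example: x0 has a direct effect, x1 is merely correlated with x0,
# and x2 is unrelated noise.
rng = np.random.default_rng(0)
n = 500
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + 0.6 * rng.standard_normal(n)
x2 = rng.standard_normal(n)
X = np.column_stack([x0, x1, x2])
y = 0.5 * x0 + rng.standard_normal(n)

odds = inclusion_posterior_odds(X, y)
print(dict(zip(["x0", "x1", "x2"], odds)))
```

In a regression on x1 alone, x1 would look strongly associated with the outcome. Averaging over models that may or may not contain x0 is what lets the evidence concentrate on x0 instead.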

Identifying direct risk factors in UK Biobank with Doublethink

We applied Doublethink to identify direct risk factors for COVID-19 hospitalization in UK Biobank. This is a well-studied problem, but we took an 'exposome-wide' approach, evaluating whether each of 1,900 variables measured in UK Biobank directly affected the outcome. This approach is still under-utilized in epidemiology, which usually focuses on candidate risk factors.
Exposome-wide approaches have potential benefits over candidate risk factor approaches, including:
  • The ability to discover unexpected results.
  • Stringent control for multiple testing.
  • Avoidance of bias in choosing candidate risk factors or deciding to publish.
However, we only studied the direct effects of variables on the outcome. This means we cannot make statements about the total (direct and indirect) effects of a variable, e.g. smoking, on the outcome, which are needed in applications like assessing potential interventions.
We identified individual variables and groups of variables that were 'exposome-wide significant' at 9% FDR and 0.05% FWER, after accounting for the direct effects of all other variables.

Comparing our results to over 100 published studies of COVID-19 in UK Biobank, we:
  • Recapitulated several commonly reported direct risk factors, e.g. age, sex, and obesity.
  • Excluded others, e.g. diabetes, cardiovascular disease, and hypertension, whose effects might be mediated through other variables that measure general comorbidity.
  • Identified some infrequently reported direct risk factors, both individually, e.g. lung infection, and as groups, e.g. constipation/urinary tract infection, which might reflect underlying kidney disease.
The ability to test groups of variables, which increases sensitivity, was one of the benefits of Doublethink's model-averaging approach. Group testing is particularly helpful in large biobanks that measure thousands of variables, because correlation between variables is pervasive and can dilute the significance of individual variables that measure similar phenomena, like the numerous types of deprivation index. It also serves as a flexible alternative to pre-analysis variable filtering algorithms, while controlling the risk of false positives by pre-defining significance thresholds for all possible tests.
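The dilution effect, and how a group test recovers the signal, can be sketched with a toy calculation. The posterior model probabilities below are hypothetical numbers chosen for illustration, not results from the preprints, and the helper name is made up. The sketch assumes a group test asks whether at least one variable in the group is in the model.

```python
# Hypothetical posterior probabilities for models built from two
# near-duplicate variables, A and B (e.g. two similar deprivation indices).
# Each key lists the variables included in that model.
posterior = {
    (): 0.02,
    ("A",): 0.45,
    ("B",): 0.45,
    ("A", "B"): 0.08,
}


def posterior_odds_any(posterior, group):
    """Posterior odds that at least one variable in `group` is in the model."""
    group = set(group)
    mass_in = sum(p for model, p in posterior.items() if group & set(model))
    mass_out = sum(p for model, p in posterior.items() if not group & set(model))
    return mass_in / mass_out


odds_a = posterior_odds_any(posterior, ["A"])          # A on its own
odds_b = posterior_odds_any(posterior, ["B"])          # B on its own
odds_group = posterior_odds_any(posterior, ["A", "B"])  # A or B
print(odds_a, odds_b, odds_group)
```

With these made-up numbers, the data cannot tell A from B, so each variable's individual odds are close to even, while the odds that at least one of them matters are far higher.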
To read more, please check out the preprints here and here.
