Whole genome sequencing in near-to-real time is set to become a routine tool for outbreak detection by hospital and public health microbiology labs, following successful pilot studies in the UK last year. Typically, the bacteria are cultured from a clinical sample, and a single colony is picked for sequencing. Since a bacterial colony grows from a single cell, this procedure ensures that all the cells picked for sequencing are genetically identical, and this in turn helps piece the genome back together again following sequencing.
But it exposes the system to a flaw. What would happen if a patient sick with two strains transmitted one, but not the other to a second patient? Characterizing the genome of just one of the strains in the first patient risks missing the transmission event entirely, because the "wrong" strain might have been sequenced.
One safeguard would be to sequence multiple bacterial colonies per sample, three for example. But this would increase the cost of routine surveillance three-fold.
In a new paper published this month in PLoS Computational Biology, with David Eyre, Madeleine Cule, Sarah Walker and others, we have investigated an alternative solution, where by a large number of colonies gets sequenced all together. The cost is the same as that of sequencing a single colony. But the downstream bioinformatics analysis is complicated considerably by the presence of multiple strains. To cope with this, we developed a new computational method that reconstructs the identities of the multiple strains, using a panel of reference genomes to help where possible.
By applying the approach to 26 clinical samples of Clostridium difficile hospital infections with known epidemiological relationships, we detected four mixed strain infections, one of which revealed a previously undetected transmission event within the hospital. For full details, read the open access paper.