Microbial Metagenomics Mock Scenario-based Sample Simulation (M3S3) Academic Article


  • Objectives: Shotgun sequencing is increasingly applied in clinical microbiology for unbiased, culture-independent diagnosis. While software solutions for metagenomics proliferate, integrating metagenomics into clinical care requires method standardization and validation. Virtual metagenomics samples could underpin validation by substituting for real samples, and thus we sought to develop a novel solution for simulating metagenomics samples based on user-defined clinical scenarios.
    Methods: We designed the Microbial Metagenomics Mock Scenario-based Sample Simulation (M3S3) workflow, which allows users to generate virtual samples from raw reads or assemblies. The M3S3 output is a mock sample in FASTQ or FASTA format. M3S3 was tested by generating virtual samples for 10 challenging infectious disease scenarios, each involving a background matrix 'spiked' in silico with pathogens, including pathogen mixtures. Replicate samples (seven per scenario) were used to represent different compositional ratios. Virtual samples were analysed using Taxonomer and Kraken db.
    Results: The 10 challenge scenarios were successfully applied, generating 80 samples. For all tested scenarios, the virtual samples showed sequence compositions as predicted from the user input. Spiked pathogen sequences were identified in the majority of replicates, and most showed acceptable abundance estimates (assessed as the deviation between expected and observed abundance of spiked pathogens), with slight differences observed between software tools.
    Conclusions: Despite demonstrated proof-of-concept, integration of clinical metagenomics into routine microbiology remains a substantial challenge. M3S3 is capable of producing virtual samples on demand, simulating a spectrum of clinical diagnostic scenarios of varying complexity. The M3S3 tool can therefore support the development and validation of standardized metagenomics applications.
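
The in silico spiking idea described in the abstract can be illustrated with a minimal Python sketch. This is not the M3S3 implementation, which the abstract does not detail. It only assumes that a mock sample is built by drawing a target fraction of reads from each pathogen and filling the remainder from the background matrix. The names `mock_sample` and `abundance_deviation`, the in-memory read lists, and the sampling-with-replacement strategy are all hypothetical choices made for illustration.

```python
import random


def mock_sample(background, pathogens, fractions, total_reads, seed=0):
    """Build a labelled mock sample of `total_reads` reads (illustrative only).

    background: list of read strings representing the background matrix
    pathogens:  dict mapping pathogen name -> list of read strings
    fractions:  dict mapping pathogen name -> target fraction of the sample
    """
    if sum(fractions.values()) > 1:
        raise ValueError("pathogen fractions must sum to at most 1")

    rng = random.Random(seed)  # fixed seed so replicates are reproducible
    sample = []

    # Spike in each pathogen at its requested compositional ratio.
    for name, fraction in fractions.items():
        n_reads = round(total_reads * fraction)
        sample += [(name, rng.choice(pathogens[name])) for _ in range(n_reads)]

    if len(sample) > total_reads:
        raise ValueError("rounded pathogen read counts exceed total_reads")

    # Fill the remainder of the sample from the background matrix.
    n_background = total_reads - len(sample)
    sample += [("background", rng.choice(background)) for _ in range(n_background)]

    rng.shuffle(sample)
    return sample


def abundance_deviation(expected_fraction, observed_fraction):
    """Signed difference between observed and expected relative abundance."""
    return observed_fraction - expected_fraction


if __name__ == "__main__":
    sample = mock_sample(
        background=["ACGTACGT", "TTGACCAA"],      # placeholder reads
        pathogens={"pathogen_x": ["GGGCCCAA"]},   # hypothetical pathogen
        fractions={"pathogen_x": 0.10},
        total_reads=1000,
    )
    observed = sum(1 for label, _ in sample if label == "pathogen_x") / len(sample)
    print(abundance_deviation(0.10, observed))
```

In a real validation run, the observed fraction would come from a classifier's report rather than from the known labels used here, and the reads would be written out as FASTQ or FASTA records instead of being kept in memory.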

publication date

  • January 1, 2017