MetaQuery

About

Documentation on metagenomes, reference genes, and annotation pipeline

Overview:

Microbiome researchers frequently want to know the abundance of a particular microbial gene, pathway, or species across different human hosts and its association with disease. While there are now thousands of publicly available metagenomes from the human gut, computational barriers prevent most researchers from conducting such analyses. MetaQuery is a web application that enables rapid and quantitative analysis of specific genes, functions and taxa across 1,267 publicly available human gut metagenomes. These data span several continents (Europe, China, North America) and disease states (IBD, diabetes, obesity, rheumatoid arthritis, colorectal cancer, and liver cirrhosis). The speed and accessibility of MetaQuery are a step toward democratizing metagenomics research, which should allow many researchers to query the abundance and variation of specific genes in the human gut microbiome.

You can read more about MetaQuery here:
Nayfach S, Fischbach MA, Pollard KS. MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome. Bioinformatics 2015;31(14). doi:10.1093/bioinformatics/btv382

MetaQuery web development team at Gladstone Institutes:
Chunyu Zhao, Stephen Nayfach, Ayushi Agrawal, Andrew Davis, Alexander R. Pico, Katherine S. Pollard

How it works:

Align query genes to the microbiome gene catalog: Query genes specified by the user are aligned to the integrated catalog of reference genes in the human gut microbiome using BLAST. More details about the gene catalog are provided below.
Identify homologous microbiome genes: Gene sequences similar to any of the user's queries are identified in the gene catalog. Based on specified alignment parameters, the user can choose to target genes that are highly similar (e.g. >95% identity) or more distantly related (e.g. >30% identity).
Quantify abundance of homologs: The abundance of all 9.9 million microbiome genes is precomputed for 1,267 samples. This enables MetaQuery to rapidly look up gene abundances for identified homologs. Gene abundances are defined as the average gene copies per cell. Read below for more information about how gene abundances are estimated.

Flowchart for estimating the abundance of a query gene in the human gut.
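The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MetaQuery's implementation: the hit tuples, the abundance dictionary, and both function names are hypothetical, and summing abundances across homologs is an assumption made here (the text does not say how homologs are combined).

```python
# Hypothetical BLAST hits: (query_id, catalog_gene_id, percent_identity)
hits = [
    ("query1", "geneA", 98.2),
    ("query1", "geneB", 45.0),
    ("query1", "geneC", 28.5),
]

# Hypothetical precomputed table: catalog gene -> {sample: gene copies per cell}
abundance = {
    "geneA": {"s1": 0.50, "s2": 0.00},
    "geneB": {"s1": 0.25, "s2": 0.75},
    "geneC": {"s1": 1.00, "s2": 1.00},
}


def find_homologs(hits, min_identity):
    """Return catalog genes whose alignment identity exceeds min_identity."""
    return sorted({gene for _, gene, pid in hits if pid > min_identity})


def homolog_abundance(homologs, abundance):
    """Look up precomputed abundances and sum them per sample (an assumption)."""
    totals = {}
    for gene in homologs:
        for sample, value in abundance[gene].items():
            totals[sample] = totals.get(sample, 0.0) + value
    return totals


strict = find_homologs(hits, 95.0)  # highly similar genes only
loose = find_homologs(hits, 30.0)   # also includes more distant homologs
print(strict, homolog_abundance(strict, abundance))
print(loose, homolog_abundance(loose, abundance))
```

Because the per-gene abundances are precomputed, the only work done at query time in this sketch is the alignment filter and a dictionary lookup, which is what makes the approach fast.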

Microbiome gene catalog:

Number of genes in catalog	9,879,900
% complete genes	57.74
% from reference genomes	2.46
% annotated at phylum level	21.31
% annotated at genus level	16.3
% annotated by KEGG	42.1
% annotated by eggNOG	60.44

Statistics on 9.9M reference genes
For more information, see:

Li, J., et al. An integrated catalog of reference genes in the human gut microbiome. Nature biotechnology 2014;32(8):834-841. doi:10.1038/nbt.2942
Arumugam, M., et al. Supporting data for the paper: "An integrated catalog of reference genes in the human gut microbiome". GigaScience Database 2014 doi:10.5524/100064
Integrated reference catalog website: http://meta.genomics.cn/metagene/meta/home

Public metagenomes used:

Reference	Gigabases	Sequencing runs	Samples	Subjects
Human Microbiome Project Consortium. The framework for human microbiome research. Nature 2012;486(7402):215-221. doi:10.1038/nature11209	3771.27	1576	337	180
Le Chatelier, E., et al. Richness of human gut microbiome correlates with metabolic markers. Nature 2013;500(7464):541-546. doi:10.1038/nature12506	1641.29	595	292	292
Li, J., et al. An integrated catalog of reference genes in the human gut microbiome. Nature biotechnology 2014;32(8):834-841. doi:10.1038/nbt.2942	1641.29	595	292	292
Nielsen, H.B., et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nature biotechnology 2014;32(8):822-828. doi:10.1038/nbt.2939	1455.11	1704	396	318
Zhang, X., et al. The oral and gut microbiomes are perturbed in rheumatoid arthritis and partly normalized after treatment. Nature Medicine 2015;21(8):895-905. doi:10.1038/nm.3914	1454.22	232	232	232
Qin, J., et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 2012;490(7418):55-60. doi:10.1038/nature11450	1295.18	365	365	365
Qin, N., et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 2014;513(7516):59-64. doi:10.1038/nature13568	1207.97	314	237	237
Feng, Q., et al. Gut microbiome development along the colorectal adenoma-carcinoma sequence. Nat Commun. 2015;6:6528. doi:10.1038/ncomms7528	778.91	312	156	156
Karlsson, F.H., et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 2013;498(7452):99-103. doi:10.1038/nature12198	460.08	147	145	145
Qin, J., et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010;464(7285):59-65. doi:10.1038/nature08821	402.17	264	124	124
Total	13937.8	6088	2271	1962

Statistics on human gut metagenomes used by MetaQuery
All metagenomes were downloaded from the NCBI Sequence Read Archive.
Datasets were identified with the aid of SRAdb (doi:10.1186/1471-2105-14-19).
FastQC was used to filter out low-quality metagenomes, defined as those with read length <50 bp, average read quality <20, or >2% of reads with adaptor contamination.
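As a rough illustration of the three quality thresholds, the check below takes per-metagenome summary values and reports whether the dataset would be kept. The function name and its arguments are hypothetical; the sketch does not parse FastQC reports and assumes the summary values have already been extracted.

```python
def passes_quality_filter(read_length_bp, mean_read_quality, adapter_fraction):
    """Return True if a metagenome survives all three thresholds.

    adapter_fraction is the fraction (0 to 1) of reads with adaptor contamination.
    """
    if read_length_bp < 50:       # read length below 50 bp
        return False
    if mean_read_quality < 20:    # average read quality below 20
        return False
    if adapter_fraction > 0.02:   # more than 2% of reads contaminated
        return False
    return True


# Hypothetical summaries: (read length, mean quality, adaptor fraction)
runs = {
    "run_ok": (100, 35.0, 0.001),
    "run_short": (45, 35.0, 0.000),
    "run_adaptor": (100, 35.0, 0.050),
}
kept = [name for name, stats in runs.items() if passes_quality_filter(*stats)]
print(kept)
```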

Mapping reads and quantifying genes:

Reads from high-quality metagenomes were mapped to genes from the integrated gene catalog using Bowtie2 (doi:10.1038/nmeth.1923) with the setting --sensitive-local. Alignments with <70% nucleotide identity, or in which <80% of the read length was aligned, were discarded. The read-depth of each reference gene was quantified based on mapped reads, and these values were normalized by the median read-depth across a panel of 30 universal single-copy genes (doi:10.1371/journal.pone.0077033). The resulting statistic, average genomic copy number, is an estimate of the average number of copies of the gene per cell. We also estimated gene relative abundance, obtained by scaling abundances to sum to 1.0 across genes per sample. Gene abundances were aggregated across technical replicate datasets where applicable.
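The alignment filter and the two normalizations described above can be sketched as follows. This is a minimal example under stated assumptions: read-depth per gene is taken as a given input, all function names and numbers are hypothetical, and only three marker-gene depths are used for brevity instead of a 30-gene panel.

```python
from statistics import median


def keep_alignment(percent_identity, aligned_bases, read_length):
    """Keep alignments with >=70% identity that cover >=80% of the read."""
    return percent_identity >= 70.0 and aligned_bases / read_length >= 0.80


def copy_number(gene_depth, marker_depths):
    """Divide each gene's read-depth by the median marker-gene read-depth."""
    norm = median(marker_depths)
    return {gene: depth / norm for gene, depth in gene_depth.items()}


def relative_abundance(values):
    """Scale per-gene values so that they sum to 1.0 within one sample."""
    total = sum(values.values())
    return {gene: value / total for gene, value in values.items()}


# Hypothetical read-depths for one sample
gene_depth = {"geneA": 5.0, "geneB": 20.0, "geneC": 15.0}
marker_depths = [8.0, 10.0, 12.0]  # stand-in for the single-copy gene panel

copies = copy_number(gene_depth, marker_depths)
rel = relative_abundance(copies)
print(copies)  # {'geneA': 0.5, 'geneB': 2.0, 'geneC': 1.5}
print(rel)     # {'geneA': 0.125, 'geneB': 0.5, 'geneC': 0.375}
```

Since the marker-gene median is a single constant per sample, relative abundance comes out the same whether it is computed from raw read-depths or from copy numbers.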

Estimating the abundance of functional and taxonomic groups:

The abundances of taxonomic groups were estimated for all high-quality metagenomes using MetaPhlAn2 (doi:10.1038/nmeth.3589) and mOTU (doi:10.1038/nmeth.2693). The abundances of functional groups were estimated using the KEGG (doi:10.1093/nar/28.1.27) and eggNOG (doi:10.1093/nar/gkr1060) databases. Reference gene annotations were obtained from the GigaScience database (doi:10.5524/100064).

Statistical tests to identify biomarkers:

Wilcoxon rank-sum tests are used to identify genes, functions, and taxa that are differentially abundant between cases and controls. MetaQuery performs these tests on cohorts from the same country and excludes individuals with comorbidities or with known drug treatment (e.g. metformin). Gene abundances are averaged across samples from the same individual.
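To make the procedure concrete, the sketch below averages abundances per individual and then runs a two-sided Wilcoxon rank-sum test. It is a plain-Python illustration, not MetaQuery's code: the function names and data are hypothetical, and the p-value uses the normal approximation without tie or continuity correction, so it is only approximate for small cohorts.

```python
from collections import defaultdict
from math import erfc, sqrt
from statistics import mean


def average_per_individual(records):
    """records: iterable of (individual_id, abundance) -> {id: mean abundance}."""
    grouped = defaultdict(list)
    for individual, value in records:
        grouped[individual].append(value)
    return {individual: mean(values) for individual, values in grouped.items()}


def rank_sum_test(cases, controls):
    """Return (rank sum of cases, z score, two-sided p-value)."""
    pooled = sorted([(v, 0) for v in cases] + [(v, 1) for v in controls])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_cases = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(cases), len(controls)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_cases - expected) / sd
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return rank_sum_cases, z, p_value


# Hypothetical samples; individual "c1" was sampled twice
case_records = [("c1", 4.0), ("c1", 6.0), ("c2", 6.0), ("c3", 7.0), ("c4", 8.0)]
control_records = [("h1", 1.0), ("h2", 2.0), ("h3", 3.0), ("h4", 4.0)]

cases = list(average_per_individual(case_records).values())
controls = list(average_per_individual(control_records).values())
print(rank_sum_test(cases, controls))
```

Averaging first means each individual contributes exactly one value, so repeated sampling of the same person does not inflate the sample size seen by the test.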