Frequently Asked Questions

Q: How does MetaQuery work?

A: MetaQuery estimates the abundance of a query sequence across 1,267 publicly available fecal metagenomes from human subjects.

The workflow is as follows:

  1. The user enters one or more protein sequences in FASTA format. These sequences are searched against the integrated catalog of reference genes in the human gut microbiome (IGC) (Li, et al., 2014) using BLAST (Altschul, et al., 1990). The IGC contains 9.9 million genes drawn from microbial reference genomes of sequenced isolates and from metagenomic assemblies.
  2. Homologs of each query sequence are identified in the IGC from the BLAST alignments, using alignment parameters set by the user: a maximum E-value and a minimum percent identity (%ID). Because over 40% of IGC genes lack a start codon, a stop codon, or both (Li, et al., 2014), many alignments will not cover the full length of both the query and the target. We therefore require a minimum glocal alignment coverage of 70%, where coverage = max(Laln/Lquery, Laln/Ltarget), Laln is the alignment length, Lquery is the query length, and Ltarget is the target length.
  3. Next, we look up the relative abundances of the identified homologs in a precomputed abundance matrix built by Li, et al. (2014). This matrix contains the relative abundances of the 9.9 million IGC genes across the 1,267 metagenomic samples, scaled so that gene abundances sum to 1.0 within each sample. For each query, we then sum the relative abundances of all its homologs separately in each sample.
  4. Optionally, our software normalizes gene relative abundances using a panel of 30 universal single-copy genes (Nayfach and Pollard, 2015). The result of this normalization is a metric called Average Genomic Copy Number, which represents the estimated average copy number of a gene across microbial cells (Manor and Borenstein, 2015). Without normalization, the resulting metric is Relative Abundance, which is scaled to sum to 1.0 across all genes for a sample.
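The homolog filter in step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MetaQuery's actual implementation: the `Hit` record, its field names, and the example E-value and %ID cutoffs are hypothetical (they are not MetaQuery's defaults); only the 70% glocal coverage rule is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment between a query and an IGC gene (hypothetical record)."""
    query_id: str
    target_id: str
    pct_identity: float  # percent identity of the alignment, 0-100
    evalue: float
    aln_len: int         # Laln: alignment length
    query_len: int       # Lquery: query length
    target_len: int      # Ltarget: target length


def glocal_coverage(hit: Hit) -> float:
    """max(Laln/Lquery, Laln/Ltarget): the alignment must span most of at least one sequence."""
    return max(hit.aln_len / hit.query_len, hit.aln_len / hit.target_len)


def find_homologs(hits, max_evalue, min_pct_id, min_coverage=0.70):
    """Map each query ID to the set of IGC gene IDs passing all three thresholds."""
    homologs = {}
    for hit in hits:
        if (hit.evalue <= max_evalue
                and hit.pct_identity >= min_pct_id
                and glocal_coverage(hit) >= min_coverage):
            homologs.setdefault(hit.query_id, set()).add(hit.target_id)
    return homologs


hits = [
    # Truncated IGC gene: covers 50% of the query but 100% of the target, so it passes.
    Hit("q1", "igc_A", 92.0, 1e-50, 150, 300, 150),
    # Covers only 40% of both sequences, so it fails the coverage threshold.
    Hit("q1", "igc_B", 95.0, 1e-40, 120, 300, 300),
]
print(find_homologs(hits, max_evalue=1e-5, min_pct_id=90.0))  # {'q1': {'igc_A'}}
```

The first hit shows why the glocal definition matters: a truncated IGC gene aligned over its full length is kept even though it covers only half of the query.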
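The per-sample summation in step 3 might look like the sketch below. It assumes, hypothetically, that the abundance matrix is held in memory as a sparse dictionary keyed by gene ID and then by sample ID, so that a gene with no entry for a sample contributes zero; the function and variable names are illustrative only.

```python
def query_abundance(homolog_ids, abundance, samples):
    """Sum a query's homolog relative abundances separately within each sample.

    homolog_ids: IGC gene IDs identified as homologs of one query.
    abundance: hypothetical sparse matrix, gene ID -> {sample ID -> relative abundance};
               a missing gene or sample entry counts as zero.
    samples: sample IDs to report.
    """
    return {
        sample: sum(abundance.get(gene, {}).get(sample, 0.0) for gene in homolog_ids)
        for sample in samples
    }


abundance = {
    "igc_A": {"S1": 2e-6},
    "igc_C": {"S1": 1e-6, "S2": 5e-6},
}
print(query_abundance({"igc_A", "igc_C"}, abundance, ["S1", "S2"]))  # S1 about 3e-6, S2 5e-6
```

A sparse layout is one reasonable choice for a matrix of this size; a dense array indexed by gene and sample would give the same sums.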

Q: What are the outputs of MetaQuery?

A: MetaQuery outputs include figures and tables.

Q: Does MetaQuery save my input data?

A: No, MetaQuery does not save any user inputs. Outputs are retained for 24 hours so that users can download them, and are then deleted.

Q: What are the best alignment parameters to use?

A: This depends on whether you are interested in close or remote homologs of your query. For close homologs, use a high percent identity cutoff (e.g., 90%, 95%, or 98%) and/or a low E-value cutoff. For remote homologs, use a lower percent identity cutoff and/or a higher E-value cutoff. The default values may be too lenient for your application. You can also run MetaQuery with several cutoffs and compare the results.
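As a sketch of comparing several cutoffs, the hypothetical helper below counts how many distinct IGC targets survive each minimum %ID threshold. It assumes the hits have already passed the E-value and coverage filters and are given as (target ID, %ID) pairs; the identifiers and values are made up for illustration.

```python
def count_homologs_by_cutoff(hits, cutoffs):
    """Count distinct target genes whose %ID meets each minimum cutoff.

    hits: iterable of (target_id, pct_identity) pairs (hypothetical format).
    cutoffs: minimum %ID values to compare.
    """
    hits = list(hits)  # materialize so the hits can be scanned once per cutoff
    return {
        cutoff: len({target for target, pct_id in hits if pct_id >= cutoff})
        for cutoff in sorted(cutoffs)
    }


hits = [("igc_A", 99.1), ("igc_B", 96.0), ("igc_C", 91.5), ("igc_D", 62.0)]
print(count_homologs_by_cutoff(hits, [98, 95, 90]))  # {90: 3, 95: 2, 98: 1}
```

A large jump in the count as the cutoff is relaxed indicates that the looser setting is recruiting more distant homologs of the query.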

Q: What does "average copy number" mean, and how does MetaQuery estimate this?

A: This is an abundance metric for a gene or gene family. It indicates the average number of copies of the gene per cell in a microbial community. It is obtained by normalizing gene abundances by the abundance of a group of universal single-copy genes. So, a value of 1.0 indicates that a gene is present once per cell on average, and a value of 0.01 indicates that it is present once per 100 cells on average.
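A minimal sketch of this normalization for a single sample is shown below. The FAQ does not state which summary of the single-copy gene abundances is used as the denominator; this example assumes their median, and the function name and input values are hypothetical.

```python
from statistics import median


def average_copy_number(gene_abundance, single_copy_abundances):
    """Estimate average gene copies per cell in one sample.

    gene_abundance: summed relative abundance of a query's homologs in the sample.
    single_copy_abundances: relative abundances of the universal single-copy
    genes in the same sample. Using their median as the per-cell baseline is
    an assumption of this sketch.
    """
    per_cell = median(single_copy_abundances)
    if per_cell <= 0:
        raise ValueError("single-copy gene abundances must be positive")
    return gene_abundance / per_cell


# Hypothetical single-copy gene abundances for one sample.
single_copy = [1.8e-5, 2.0e-5, 2.2e-5]
print(average_copy_number(2.0e-5, single_copy))  # about 1.0: roughly one copy per cell
print(average_copy_number(2.0e-7, single_copy))  # about 0.01: roughly one copy per 100 cells
```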

Q: How do I cite MetaQuery?

A: If you use MetaQuery, please use the following citation:
Nayfach S, Fischbach MA, Pollard KS. MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome. Bioinformatics 2015;31(14). doi:10.1093/bioinformatics/btv382
Also, be sure to cite the various resources, studies, and tools utilized by MetaQuery. These references can be found on the About page.