Getting Summaries of Functional Content and OTUs for a Metagenomic Sample

(Note: to use the svr commands, you must have installed the myRAST app and set up your environment correctly; see this post for instructions.)

It is worth mentioning that two of the svr functions provide a means of getting quick summaries of content for a newly sequenced metagenomic sample.

svr_assign_to_dna_using_figfams < MG.sample

takes as input a set of DNA sequences in FASTA format. It outputs a 5-column, tab-separated table containing:

The ID of one of the input sequences
The number of Kmer hits against the sequence
The region identified as potentially supporting the function (in the form of a contig, begin coordinate, and end coordinate, separated by underscores)
The function associated with the region (which may just be "hypothetical protein")
A genome name representing the "operational taxonomic unit" (OTU) that appears to be the source of the hit
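
If you want to look at the raw table by hand before summarizing it, standard Unix tools are enough. The following is only a sketch: it assumes the table has been saved to a file, and the function names count_column, count_functions, and count_otus are made up for illustration. It relies only on the column layout described above (function in column 4, OTU in column 5) and makes no claim about how svr_summarize_MG_output itself does its counting.

```shell
# Count how often each value of one column occurs in a tab-separated
# table, most frequent first.
# Usage: count_column FILE COLUMN_NUMBER
count_column() {
    cut -f "$2" "$1" | sort | uniq -c | sort -k1,1nr
}

# Hypothetical convenience wrappers for the 5-column table described above.
count_functions() { count_column "$1" 4; }   # column 4: function
count_otus()      { count_column "$1" 5; }   # column 5: OTU genome name
```

For example, "count_otus MG.table | head" would list the ten genome names that account for the most rows, where MG.table is whatever file you saved the table to.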

This tab-separated table can be summarized using

svr_summarize_MG_output < table > function.summary 2> otu.summary

Normally, the two commands are simply piped together:

svr_assign_to_dna_using_figfams < MG.sample | svr_summarize_MG_output > function.summary 2> otu.summary

The pipeline will usually process roughly 6-8 megabases of data per minute.
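
To get a rough idea of how long a sample will take, you can count its bases and divide by that rate. The sketch below assumes a FASTA file in which every header line starts with ">" and takes 1 megabase to mean 1,000,000 bases; count_bases and estimate_minutes are hypothetical helper names, and the result is only as good as the 6-8 megabases per minute figure, which will vary with your hardware.

```shell
# Count the bases in a FASTA file. Header lines (starting with ">") are
# skipped, and whitespace and line endings are not counted.
count_bases() {
    grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
}

# Print an estimated run-time range in minutes for a given number of bases,
# assuming a throughput of 6 to 8 megabases per minute.
estimate_minutes() {
    awk -v n="$1" 'BEGIN { printf "%.1f-%.1f minutes\n", n / 8e6, n / 6e6 }'
}

# Example: estimate_minutes "$(count_bases MG.sample)"
```

The faster rate gives the lower bound and the slower rate the upper bound, so a 48-megabase sample would be estimated at 6.0-8.0 minutes.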

Finally, you can use

svr_metabolic_reconstruction < function.summary | cut -f 4,5 | sort -u

to get a quick metabolic reconstruction summarizing the active subsystems that could be determined (along with the appropriate variant code).