Services to Support Annotation of Genes

Identifying Genes

When building an annotation pipeline, one of the first steps is to identify the putative genes. Example 5 illustrates some basic functions that can be invoked via the servers to identify protein-encoding, rRNA-encoding, and tRNA-encoding genes. These services utilize tools made available by JCVI, Niels Larsen, Gary Olsen, and Sean Eddy. They offer reasonably accurate, easily invoked services to locate genes in prokaryotic genomes.



Assigning Functions to Encoded Proteins

Once genes have been identified, the next step is usually to make initial estimates of function for the products of the protein-encoding genes. Example 6 reads a FASTA file of protein sequences and generates initial estimates of function. Two levels of service are offered. The first is a very fast technique that can assign functions to most proteins that have been placed in FIGfams. The second, much slower technique invokes BLAT [PMID: 11932250] to seek an estimate of function against prokaryotic genes that have not been placed in FIGfams. Both techniques are far more rapid than BLAST [PMID: 2231712], but they also miss some similarities that BLAST detects.
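
The two-level strategy described above can be sketched in a few lines. This is only an illustration, not the servers' implementation: `read_fasta` and `assign_functions` are names introduced here, and the `fast` and `slow` arguments are hypothetical stand-ins for the FIGfam-based and BLAT-based services, supplied by the caller as plain functions that return a function string or `None`.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The identifier is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Hypothetical signature for an assignment service: sequence in,
# function string (or None when no assignment is made) out.
Assigner = Callable[[str], Optional[str]]


def assign_functions(
    records: Iterable[Tuple[str, str]],
    fast: Assigner,
    slow: Optional[Assigner] = None,
) -> Dict[str, Optional[str]]:
    """Try the fast method first; fall back to the slow one only if it fails."""
    assignments: Dict[str, Optional[str]] = {}
    for seq_id, sequence in records:
        function = fast(sequence)
        if function is None and slow is not None:
            function = slow(sequence)
        assignments[seq_id] = function
    return assignments
```

Calling `assign_functions(read_fasta(open(path)), fast)` without a `slow` argument corresponds to using only the first level of service; passing both applies the slower search only to proteins the fast method could not place.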

Creation of a Metabolic Reconstruction

Example 7 illustrates how to create an initial metabolic reconstruction from the assignments generated via a technique like those illustrated in Example 6. It uses functionality already demonstrated in Example 2. The program takes the detected functions, composes a set of functional roles (splitting multi-functional assignments into their atomic functional roles), and retrieves the inferred set of subsystems.
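
The role-splitting step can be illustrated with a short sketch. It rests on assumptions that the text above does not state: that multi-functional assignments join their roles with " / ", " @ ", or "; ", and that anything after a "#" is a comment rather than part of a role. The function names are introduced here for illustration only, and the subsystem lookup itself is left to the servers.

```python
import re
from typing import Dict, List, Optional, Set

# Assumed separators between roles in a multi-functional assignment:
# " / ", " @ " and "; ". Adjust if the assignments use other conventions.
_ROLE_SEPARATOR = re.compile(r"\s*;\s+|\s+[/@]\s+")


def roles_of_function(function: str) -> List[str]:
    """Split one function assignment into its atomic functional roles."""
    # Assumption: text following '#' is a comment, not part of any role.
    function = re.split(r"\s*#", function, maxsplit=1)[0]
    return [role.strip() for role in _ROLE_SEPARATOR.split(function) if role.strip()]


def functional_roles(assignments: Dict[str, Optional[str]]) -> Set[str]:
    """Collect the distinct roles found across all assigned proteins."""
    roles: Set[str] = set()
    for function in assignments.values():
        if function:  # skip proteins that received no assignment
            roles.update(roles_of_function(function))
    return roles
```

The resulting set of roles is what would be handed to the subsystem service; because a slash without surrounding spaces is not treated as a separator, names such as "NAD/NADP" stay intact under these assumptions.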