Recently in Coding Examples Category
Once you have a genome ID, there are numerous Sapling Server methods for processing the genes and features of the genome. In this article, we will show how to get all the protein sequences, features, and annotations of a single genome.
Identifying Genes
If one builds an annotation pipeline, one of the first steps is identifying the putative genes. Example 5 illustrates some basic functions that can be invoked via the servers to identify protein-encoding, rRNA-encoding, and tRNA-encoding genes. These services use tools made available by JCVI, Niels Larsen, Gary Olsen, and Sean Eddy. They offer reasonably accurate, easily invoked services for locating genes in prokaryotic genomes.
Example 4 accesses the SEED server that provides the data we use to compute co-occurrence scores. The program illustrates the potential for constructing custom tools: it goes through all of the protein-encoding genes in all of the complete prokaryotic genomes maintained within the SEED, looking for "hypothetical proteins" that tend to co-occur with genes encoding functions that can be connected to subsystems. The program constructs a table showing:
- the gene,
- the function of the gene,
- the ID of the genome containing the gene,
- the description of the genome,
- the non-hypothetical gene in a subsystem that appears to have the strongest co-occurrence score,
- the co-occurrence score, and
- the function assigned to the co-occurring gene contained in a subsystem.
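The table described above could be assembled along the following lines. This is only a minimal sketch in Python: it does not call the SEED servers, and every name in it (`build_table`, `is_hypothetical`, the dictionaries, and the sample gene and genome identifiers) is a hypothetical stand-in for data that Example 4 retrieves from the server. It assumes the functions, genome assignments, co-occurrence scores, and subsystem membership have already been fetched into ordinary dictionaries and sets.

```python
from typing import NamedTuple


class Row(NamedTuple):
    gene: str
    function: str
    genome_id: str
    genome_name: str
    partner: str
    score: float
    partner_function: str


def is_hypothetical(function: str) -> bool:
    """Assumption: a function text mentioning 'hypothetical protein' is unknown."""
    return "hypothetical protein" in function.lower()


def build_table(functions, genome_of, genome_names, cooccurrence, subsystem_genes):
    """Return one Row per hypothetical gene that has a usable co-occurring partner."""
    rows = []
    for gene, function in functions.items():
        if not is_hypothetical(function):
            continue
        # Keep only partners that are in a subsystem and have a real function.
        candidates = [
            (score, partner)
            for partner, score in cooccurrence.get(gene, [])
            if partner in subsystem_genes
            and not is_hypothetical(functions.get(partner, "hypothetical protein"))
        ]
        if not candidates:
            continue
        score, partner = max(candidates)  # strongest co-occurrence score
        genome_id = genome_of[gene]
        rows.append(Row(gene, function, genome_id, genome_names[genome_id],
                        partner, score, functions[partner]))
    return rows


# Made-up sample data, for illustration only.
functions = {
    "geneA": "hypothetical protein",
    "geneB": "example kinase",
    "geneC": "hypothetical protein",
    "geneD": "example transporter",
}
genome_of = {gene: "genome1" for gene in functions}
genome_names = {"genome1": "Example organism"}
cooccurrence = {"geneA": [("geneB", 12.0), ("geneD", 30.0), ("geneC", 50.0)]}
subsystem_genes = {"geneB", "geneC", "geneD"}

table = build_table(functions, genome_of, genome_names, cooccurrence, subsystem_genes)
for row in table:
    print("\t".join(str(value) for value in row))
```

In this sample, `geneC` has the highest raw score but is itself hypothetical, so it is skipped in favor of the best-scoring partner with a known function.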
Example 3 illustrates the functions required to determine the location of a SEED gene encoding a specific protein and to acquire the genes from a given region centered on that location. If you wrote a program that accepted arbitrary protein IDs, used the conversion capabilities demonstrated in Example 1, and displayed the regions around these genes graphically, you would have the core of a useful tool. If you shaded genes from the same subsystem (determined using the capabilities described in Example 2), you could enhance the tool further. Of course, you could also compute which genes can be connected to literature or structures and encode that data as well.
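The "genes from a region centered on a location" step can be illustrated without the server. The sketch below is in Python and is not the SEED API: the `Gene` record, `genes_in_region`, and the sample coordinates are all hypothetical. It assumes each gene is described by a contig plus begin and end coordinates, with begin greater than end for a gene on the minus strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    contig: str
    begin: int  # assumption: begin > end means the gene is on the minus strand
    end: int

    @property
    def left(self) -> int:
        return min(self.begin, self.end)

    @property
    def right(self) -> int:
        return max(self.begin, self.end)


def genes_in_region(genes, focus_id, half_width):
    """Return genes overlapping a window centered on the focus gene's midpoint."""
    focus = next((g for g in genes if g.gene_id == focus_id), None)
    if focus is None:
        raise ValueError(f"unknown gene: {focus_id}")
    center = (focus.left + focus.right) // 2
    lo, hi = center - half_width, center + half_width
    hits = [
        g for g in genes
        if g.contig == focus.contig and g.right >= lo and g.left <= hi
    ]
    # Order left to right along the contig, ready for drawing.
    return sorted(hits, key=lambda g: g.left)


# Made-up coordinates, for illustration only.
genes = [
    Gene("g3", "contig1", 3000, 2200),   # minus strand
    Gene("g1", "contig1", 100, 900),
    Gene("g2", "contig1", 1000, 2000),
    Gene("g4", "contig1", 9000, 9900),   # too far from the focus gene
    Gene("g5", "contig2", 1000, 2000),   # different contig
]

region = genes_in_region(genes, "g2", half_width=1000)
print([g.gene_id for g in region])
```

A display tool would then draw each returned gene as an arrow, using the direction implied by its coordinates, and could color it by subsystem membership.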