The SEED Servers: Coding Examples Archives

Some simple Sapling examples

By Robert Olson on November 16, 2010 4:14 PM

In a brief session at the last workshop, we built several simple applications using the Sapling API.

The first is a simple replica of the svr_all_genomes script:

The second retrieves all the proteins in the Vibrio genomes:

The third looks up the proteins in a given FASTA file and compares them to a file of annotations for the proteins in that genome:

compare_functions fasta-file function-file

The last performs a potentially dubious translation of the hits from a myRAST metagenomics run into proteins:

hits-to-protein dna-fasta hits-file genetic-code
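The core step of such a tool, translating a DNA hit into protein under a chosen genetic code, can be sketched as follows. This is a minimal illustration and not the workshop script: the function names, the 1-based inclusive coordinates, and the convention that begin > end marks the minus strand are all assumptions, since the myRAST hits-file format is not described here. The table shown is the codon assignment shared by NCBI genetic codes 1 and 11.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ...
# (codon assignments shared by NCBI genetic codes 1 and 11).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_hit(dna, begin, end, table=CODON_TABLE):
    """Translate one hit given hypothetical 1-based inclusive coordinates.

    begin > end is assumed to mean the hit lies on the minus strand.
    Codons missing from the table (for example, ones containing N)
    become 'X'; a trailing partial codon is ignored.
    """
    dna = dna.upper()
    if begin <= end:
        segment = dna[begin - 1:end]
    else:
        segment = reverse_complement(dna[end - 1:begin])
    usable = len(segment) - len(segment) % 3
    codons = (segment[i:i + 3] for i in range(0, usable, 3))
    return "".join(table.get(codon, "X") for codon in codons)
```

Calling `translate_hit("ATGGCCTAA", 1, 9)` translates the plus strand; swapping the coordinates on the reverse-complemented sequence yields the same protein. Supporting another genetic code would only require a different 64-character amino acid string.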

Retrieving Features and Functions for a Genome

By The SEED Team on June 21, 2010 12:52 PM

Once you have a genome ID, there are numerous Sapling Server methods for processing the genes and features of the genome. In this article, we will show how to get all the protein sequences, features, and annotations of a single genome.

Getting a List of Genomes and Their Taxonomies

By The SEED Team on June 21, 2010 10:20 AM

The Sapling Server has several methods for retrieving information about genomes. In this tutorial, we'll discuss how to get a list of all the genomes and pull out basic data and metrics.

Services to Support Annotation of Genes

By The SEED Team on June 3, 2010 2:39 PM

Identification of Genes

If one builds an annotation pipeline, one of the first steps involves identifying the putative genes. Example 5 illustrates some basic functions that can be invoked via the servers to identify protein-encoding, rRNA-encoding, and tRNA-encoding genes. These services use tools made available by JCVI, Niels Larsen, Gary Olsen, and Sean Eddy. They offer reasonably accurate, easily invoked services to locate genes in prokaryotic genomes.

Access to Functional Coupling (Conserved Contiguity) Data

By The SEED Team on June 3, 2010 2:38 PM

A great deal has been learned from studying genes that tend to occur close to one another in diverse genomes [PMID: 11471247, PMID: 9787636, PMID: 10077608, PMID: 11230160, PMID: 18712303].

Example 4 accesses the SEED server that provides the data we use to compute co-occurrence scores. The program illustrates the potential for constructing custom tools: it goes through all of the protein-encoding genes in all of the complete prokaryotic genomes maintained within the SEED, looking for "hypothetical proteins" that tend to co-occur with genes encoding functions that can be connected to subsystems. The program constructs a table showing

the gene,
the function of the gene,
the ID of the genome containing the gene,
the description of the genome,
the non-hypothetical gene in a subsystem that appears to have the strongest co-occurrence score,
the co-occurrence score, and
the function assigned to the co-occurring gene contained in a subsystem.
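The table-building step can be sketched without any server calls. This is a minimal sketch, not Example 4 itself: the input dictionaries (`functions`, `coupling`, `genome_names`), the `in_subsystem` set, and all function names are hypothetical stand-ins for data that would come from the SEED servers. The only SEED convention relied on is the feature ID form `fig|<genome>.peg.<n>`.

```python
def is_hypothetical(function):
    """Treat empty or 'hypothetical protein' functions as unannotated."""
    return not function or "hypothetical protein" in function.lower()


def genome_of(fid):
    """Extract the genome ID from a feature ID such as 'fig|83333.1.peg.4'."""
    return fid.split("|")[1].rsplit(".", 2)[0]


def coupling_table(functions, coupling, in_subsystem, genome_names):
    """Build one row per hypothetical gene that has a subsystem partner.

    functions:    feature ID -> assigned function (hypothetical input)
    coupling:     feature ID -> list of (partner ID, co-occurrence score)
    in_subsystem: set of feature IDs whose functions occur in a subsystem
    genome_names: genome ID -> genome description
    """
    rows = []
    for gene, function in functions.items():
        if not is_hypothetical(function):
            continue
        candidates = [
            (score, partner)
            for partner, score in coupling.get(gene, [])
            if partner in in_subsystem
            and not is_hypothetical(functions.get(partner, ""))
        ]
        if not candidates:
            continue
        score, partner = max(candidates)  # strongest co-occurrence wins
        genome = genome_of(gene)
        rows.append((gene, function, genome, genome_names.get(genome, ""),
                     partner, score, functions[partner]))
    return rows
```

Each row follows the column order listed above, so the result can be written out directly as a tab-separated table.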

We believe that many variations on this basic data-mining capability could be implemented on top of the co-occurrence data.

Creating Custom Interfaces

By The SEED Team on June 3, 2010 2:37 PM

Suppose that you had substantial expertise in graphical interfaces, understood the power of comparative analysis, and wished to support the ability to graphically display the chromosomal regions around a set of genes (normally from distinct genomes). The SEED offers one alternative for doing this (see the region displayed here for an example), but suppose that you did not like forcing users to find appropriate SEED IDs and you thought that you could develop a superior display.

Example 3 illustrates the functions required to determine the location of a SEED gene encoding a specific protein and to acquire the genes from a given region centered on that location. If you were to create a program that accepts arbitrary protein IDs, uses the conversion capabilities demonstrated in Example 1, and displays the regions around these genes graphically, you would have the core of a useful tool. If you shaded genes from the same subsystem (determined using the capabilities described in Example 2), you could enhance the supported functionality. Of course, you could also compute which genes could be connected to literature or structures and encode that data as well.
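Once gene locations are in hand, selecting the genes in a region centered on a focus gene is a simple interval test. The sketch below is an illustration under stated assumptions, not Example 3: the `Gene` record, the `genes_in_region` function, and the default half-width of 5000 bp are hypothetical, and a minus-strand gene is assumed to be stored with `begin > end`.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical location record; begin > end means the minus strand."""
    gene_id: str
    contig: str
    begin: int
    end: int


def genes_in_region(genes, focus_id, half_width=5000):
    """Return genes overlapping a window centered on the focus gene.

    The window spans half_width bases on either side of the midpoint of
    the focus gene. Only genes on the same contig are considered, and
    the result is ordered by leftmost coordinate.
    """
    focus = next(g for g in genes if g.gene_id == focus_id)
    center = (focus.begin + focus.end) // 2
    left, right = center - half_width, center + half_width
    hits = [
        g for g in genes
        if g.contig == focus.contig
        and min(g.begin, g.end) <= right
        and max(g.begin, g.end) >= left
    ]
    return sorted(hits, key=lambda g: min(g.begin, g.end))
```

A display layer could then draw each returned gene as an arrow, using the sign of `end - begin` for direction and a per-subsystem color for shading.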

Metabolic Reconstructions Provided for Complete Prokaryotic Genomes

By The SEED Team on June 3, 2010 2:35 PM

Given a set of functional roles, one often wishes to understand which subsystems can be inferred from the set. This example reads as input a set of functional roles and constructs a table of subsystems, along with their variation codes, that can be identified. The data displayed in this simple example could form the start of a research project to gather the functional roles not connected to subsystems, to determine whether they were not connected because a small set of functional roles was not present in the input, and to seek candidates for such "missing functional roles". The ability to easily map functional roles into subsystems will improve over time, as the SEED annotation effort improves its collection of encoded subsystems [PMID: 16214803].
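The mapping from roles to subsystems can be illustrated with a toy model. This sketch makes a strong simplifying assumption that is not taken from the SEED: each subsystem variant is modeled as a fixed set of required roles, and a variant is "identified" when all of its roles are present in the input. The `reconstruct` function and the `subsystem_variants` structure are hypothetical; the real server applies its own rules.

```python
def reconstruct(roles, subsystem_variants):
    """Match input roles against subsystem variants.

    roles:              iterable of functional role names
    subsystem_variants: subsystem name -> {variant code -> set of roles}
                        (hypothetical encoding of the subsystem data)

    Returns (found, unconnected): a list of (subsystem, variant code)
    pairs, choosing the largest fully satisfied variant per subsystem,
    and a sorted list of input roles not used by any chosen variant.
    """
    roles = set(roles)
    found = []
    used = set()
    for subsystem, variants in subsystem_variants.items():
        matching = [
            (len(required), code)
            for code, required in variants.items()
            if required <= roles
        ]
        if matching:
            _, code = max(matching)
            found.append((subsystem, code))
            used |= variants[code]
    return found, sorted(roles - used)
```

The second return value corresponds to the functional roles not connected to subsystems, which the paragraph above suggests as a starting point for finding missing roles.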

Conversion of Gene and Protein IDs

By The SEED Team on June 3, 2010 2:33 PM

Example 1 illustrates some basic capabilities that relate to determining the set of IDs attached to specific protein sequences. The program accepts a protein ID as input. The ID may be one of several that are maintained by the SEED, UniProt, RefSeq, KEGG, and other groups. The program first retrieves all IDs attached to identical protein sequences. This can be a fairly large set in cases in which many very similar genomes have been sequenced.
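The idea of collecting every ID attached to an identical protein sequence can be sketched locally by keying sequences on a digest. This is an illustration only: the `id_to_sequence` mapping and both function names are hypothetical, and the use of an MD5 digest of the upper-cased sequence as the grouping key is an assumption of this sketch, not a description of how the server stores its data.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence):
    """Digest of the upper-cased sequence, used as a grouping key."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def group_ids_by_sequence(id_to_sequence):
    """Map each sequence digest to the set of IDs sharing that sequence."""
    groups = defaultdict(set)
    for protein_id, sequence in id_to_sequence.items():
        groups[sequence_key(sequence)].add(protein_id)
    return groups


def equivalent_ids(protein_id, id_to_sequence):
    """Return all IDs (including the input) with an identical sequence."""
    groups = group_ids_by_sequence(id_to_sequence)
    return sorted(groups[sequence_key(id_to_sequence[protein_id])])
```

Grouping by digest keeps the lookup cheap even when many very similar genomes contribute identical proteins, because each sequence is hashed once and compared only by key.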
