2. Command Line Services

If you don't want to write Perl programs but would like to use the SEED servers to process your data, we supply a number of predefined command-line scripts that provide basic bioinformatics functions using the servers. These scripts all have names beginning with "svr_" and are found in the bin directory of the distribution. They are designed to read from stdin and write to stdout, so they can be piped together to build more complex processing pipelines.

On Mac or Linux, run these scripts from the command line in a terminal shell after adding the myRAST bin directory to your PATH. With the default Mac installation, for example:

export PATH=$PATH:/Applications/myRAST.app/bin
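If a command such as svr_all_genomes is reported as "not found", the bin directory is probably not on your PATH. The sketch below defines a hypothetical helper, check_svr_path (not part of the distribution), that uses the standard POSIX "command -v" builtin to report which of the scripts used in this section the shell can find. It assumes the default Mac location shown above; on Linux, or if myRAST is installed elsewhere, substitute your own myRAST bin directory.

```shell
# Default Mac location from the instructions above. This is an assumption
# for other platforms: replace it with wherever your myRAST bin directory is.
MYRAST_BIN=/Applications/myRAST.app/bin
export PATH="$PATH:$MYRAST_BIN"

# check_svr_path (hypothetical helper): print one line per svr_ script used
# in this section, saying whether the shell can find it on PATH.
check_svr_path() {
    for cmd in svr_all_genomes svr_all_features svr_function_of \
               svr_aliases_of svr_neighbors_of; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found at $(command -v "$cmd")"
        else
            echo "$cmd: not on PATH"
        fi
    done
}
```

To keep the setting in future sessions, the same export line can be added to your shell's startup file (for example ~/.bash_profile for bash or ~/.zshrc for zsh).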

If you are a Windows user, you must run the scripts from the myRAST shell, which is installed with myRAST.

The environment variable SAS_SERVER controls whether the svr scripts use the SEED or the PSEED. By default they use the SEED; to have your scripts access the PSEED instead, set SAS_SERVER to PSEED. In the bash shell:

export SAS_SERVER=PSEED

or, if you are a Windows user:

set SAS_SERVER=PSEED
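Because SAS_SERVER is an ordinary environment variable, standard shell rules apply: an assignment written in front of a single command affects only that command, and "unset" returns the session to the default. The sketch below shows both; with_pseed is a hypothetical convenience wrapper, not part of the distribution.

```shell
# One-off: only this command talks to the PSEED; the session default is unchanged.
#   SAS_SERVER=PSEED svr_all_genomes complete

# with_pseed (hypothetical helper): run any command with SAS_SERVER=PSEED
# placed in that command's environment only.
with_pseed() {
    SAS_SERVER=PSEED "$@"
}

# Example:
#   with_pseed svr_all_genomes complete
#
# After "export SAS_SERVER=PSEED", go back to the default (SEED) with:
#   unset SAS_SERVER
```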



As a short example, to get a list of all genomes, you could run this at the command line:

svr_all_genomes complete

This produces a two-column table of all genomes in the SEED or PSEED (only complete genomes if you give the "complete" argument). The first column is the genome name and the second is the genome ID, like this:

Berardius bairdii 48742.1
Simian immunodeficiency virus 11723.1
Erythrobacter litoralis HTCC2594 314225.3
Bacteriophage N15 40631.1
Bacillus cereus plasmid pPER272 1396.18
Cyanophage P-SSP7 268748.3
Enterococcus faecium plasmid pEF1 1352.12
Lactococcus lactis subsp. lactis Il1403 272623.1
Salmonella enterica subsp. enterica serovar Newport str. SL254 423368.6
Cotton leaf curl Rajasthan virus 223259.1
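The genome list is easy to filter with standard text tools. The sketch below defines a hypothetical helper, genome_ids_matching, that prints the IDs of genomes whose names match a pattern. It assumes the name and ID columns are separated by a tab, as in the other svr tables; the sample output above does not make the separator visible, so treat this as an assumption.

```shell
# genome_ids_matching PATTERN (hypothetical helper)
# Reads "genome name<TAB>genome id" lines on stdin (the column order
# described for svr_all_genomes) and prints the id of every genome whose
# name matches PATTERN, an awk regular expression compared case-insensitively.
genome_ids_matching() {
    awk -F '\t' -v pat="$1" 'tolower($1) ~ tolower(pat) { print $2 }'
}
```

For example, the following would list the IDs of the plasmid entries (in the sample above, 1396.18 and 1352.12):

svr_all_genomes complete | genome_ids_matching plasmid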



The following examples show basic functions that use the servers, can be run from the command line, and can be piped together to create small systems. They are meant to serve as models for anyone who wants to build custom bioinformatics systems on top of the servers.

Find All Features for a Genome

Retrieving all the features of a given genome is often the first step in an analysis. This command is issued at the command line and takes two arguments, a genome ID and a feature type (for example, peg for protein-encoding genes). The output is a single-column table of feature IDs, suitable for piping into subsequent commands.

svr_all_features genome_id feature_type

This script returns all features of the specified type for the specified genome.

Here is an example of running this command:

> svr_all_features 3702.1 peg
fig|3702.1.peg.1
fig|3702.1.peg.2
fig|3702.1.peg.3
fig|3702.1.peg.4
fig|3702.1.peg.5
fig|3702.1.peg.6
fig|3702.1.peg.7
fig|3702.1.peg.8
fig|3702.1.peg.9
fig|3702.1.peg.10
fig|3702.1.peg.11
fig|3702.1.peg.12
fig|3702.1.peg.13
fig|3702.1.peg.14
fig|3702.1.peg.15
.
.
.
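Unlike the other scripts in this section, svr_all_features takes its genome ID as a command-line argument rather than reading it from stdin, so processing several genomes takes a small loop. The sketch below defines a hypothetical helper, all_features_for_genomes, that reads genome IDs one per line and calls svr_all_features once per genome, passing the feature type through unchanged.

```shell
# all_features_for_genomes TYPE (hypothetical helper)
# Reads genome ids, one per line, on stdin and runs
# "svr_all_features <genome_id> TYPE" for each, so the feature ids of all
# the genomes come out as one single-column table on stdout.
all_features_for_genomes() {
    while IFS= read -r genome; do
        [ -n "$genome" ] || continue   # skip blank lines
        # </dev/null keeps the called command from consuming the id list
        svr_all_features "$genome" "$1" </dev/null
    done
}
```

Usage, with genome IDs taken from the examples in this section:

printf '3702.1\n314225.3\n' | all_features_for_genomes peg | svr_function_of

For a single genome, counting the lines gives the number of features of that type:

svr_all_features 3702.1 peg | wc -l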

Find Gene Function

Given a set of protein-encoding genes, a natural next step is to retrieve the function assigned to each gene. This command takes as input a single-column table of gene IDs and returns a tab-separated, two-column table of gene ID and function.

svr_function_of < table_of_gene_ids

Note that this script uses stdin and stdout and is designed to be part of a processing pipeline.

Here is an example of running this command:

> svr_all_features 3702.1 peg | svr_function_of
fig|3702.1.peg.1 photosystem II protein D1 (PsbA)
fig|3702.1.peg.2 maturase
fig|3702.1.peg.3 SSU ribosomal protein S16p, chloroplast
fig|3702.1.peg.4 Photosystem II protein PsbK
fig|3702.1.peg.5 Photosystem II protein PsbI
fig|3702.1.peg.6 ATP synthase alpha chain (EC 3.6.3.14)
fig|3702.1.peg.7 ATP synthase CF0 B chain
fig|3702.1.peg.8 ATP synthase C chain (EC 3.6.3.14)
fig|3702.1.peg.9 ATP synthase CF0 A chain
fig|3702.1.peg.10 SSU ribosomal protein S2p (SAe), chloroplast
fig|3702.1.peg.11 DNA-directed RNA polymerase delta (= beta'') subunit (EC 2.7.7.6), chloroplast
fig|3702.1.peg.12 DNA-directed RNA polymerase gamma subunit (EC 2.7.7.6), chloroplast
fig|3702.1.peg.13 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), chloroplast
fig|3702.1.peg.14 Cytochrome b6-f complex subunit VIII (PetN)
fig|3702.1.peg.15 Photosystem II protein PsbM
.
.
.
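Because the output is a tab-separated table, ordinary text tools can post-process it. As one example, the hypothetical helper below, ec_numbers, extracts the EC numbers embedded in function strings such as "ATP synthase alpha chain (EC 3.6.3.14)" and prints one gene/EC pair per line. It assumes EC numbers always appear in the "EC n.n.n.n" form seen above, with a dash allowed in place of a number.

```shell
# ec_numbers (hypothetical helper)
# Reads "gene_id<TAB>function" lines and prints "gene_id<TAB>EC" for every
# "EC n.n.n.n" found in the function text; genes without an EC number are
# skipped, and a function naming several EC numbers yields several lines.
ec_numbers() {
    awk -F '\t' -v OFS='\t' '{
        f = $2
        while (match(f, /EC [0-9]+\.[0-9-]+\.[0-9-]+\.[0-9-]+/)) {
            print $1, substr(f, RSTART + 3, RLENGTH - 3)   # drop the "EC " prefix
            f = substr(f, RSTART + RLENGTH)                 # continue after this match
        }
    }'
}
```

Usage:

svr_all_features 3702.1 peg | svr_function_of | ec_numbers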

Find Gene Aliases

Instead of functions, you may want to see all the aliases (other identifiers) under which a feature or set of features is known in the SEED. This command behaves just like svr_function_of, except that it returns a comma-separated list of aliases for each feature:

svr_aliases_of < table_of_gene_ids

Note that this script uses stdin and stdout and is designed to be part of a processing pipeline. 

Here is an example of running this command:

> svr_all_features 3702.1 peg | svr_aliases_of
fig|3702.1.peg.1 gi|112382048,gi|113200888,gi|114054364,gi|114107113,gi|114329726,gi|115531894,gi|134286292,gi|134286378,gi|134286553,gi|134286643,gi|134286733,gi|134286999,gi|139387232,gi|139389076,gi|139389398,gi|139389623,gi|139389781,gi|139389931,gi|156597939,gi|156598592,gi|157695865,gi|159792928,gi|159793098,gi|159895452,gi|159895537,gi|166344112,gi|167391785,gi|169142690,gi|169142840,gi|169142925,gi|169143011,gi|169794053,gi|6723714,sp|A4QJR4,sp|A4QJZ9,sp|A4QKH2,sp|A4QKR1,sp|A4QL00,sp|A4QLR3,sp|B0Z4K6,sp|B0Z4U0,sp|B0Z524,sp|B0Z5A8,sp|B1A915,sp|B1NWD0,sp|P83755,sp|P83755,sp|P83756,sp|P83756,sp|Q06FY1,sp|Q09G66,sp|Q0G9Y2,tr|A4QJZ9,tr|A4QKH2,tr|A4QL00,tr|A4QLR3,tr|A9QAZ4,tr|A9QAZ4,tr|A9QBW0,tr|A9QBW0,tr|Q06FY1,tr|Q09G66,tr|Q0G9Y2,fig|3702.1.peg.1,gi|515374,gi|5881674,gi|7525013,fig|85636.1.peg.1,gi|13518299
fig|3702.1.peg.2 gi|12002371,gi|12002415,gi|12002417,gi|12002419,gi|12002421,gi|12002423,gi|12002425,gi|12002427,gi|12002429,gi|12002431,gi|126022795,gi|5881675
fig|3702.1.peg.3 sp|P56806,sp|P56806,gi|5881676,gi|7525015
fig|3702.1.peg.4 sp|P56782,sp|P56782,fig|3702.1.peg.4,gi|5881677,gi|7525016
.
.
.
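The alias column mixes identifiers from several sources, distinguished by prefix (gi| for NCBI GI numbers, sp| for UniProtKB/Swiss-Prot, tr| for UniProtKB/TrEMBL, fig| for SEED IDs), and the example above shows that some aliases are listed twice. The hypothetical helper below, aliases_with_prefix, keeps only the aliases with a chosen prefix, one per line and without duplicates. It assumes the feature ID and the alias list are tab-separated, as in svr_function_of output.

```shell
# aliases_with_prefix PREFIX (hypothetical helper)
# Reads "feature_id<TAB>alias,alias,..." lines and prints one
# "feature_id<TAB>alias" line for each alias that starts with PREFIX,
# skipping repeats of the same feature/alias pair.
aliases_with_prefix() {
    awk -F '\t' -v OFS='\t' -v pre="$1" '{
        n = split($2, a, ",")
        for (i = 1; i <= n; i++)
            if (index(a[i], pre) == 1 && !seen[$1 "\t" a[i]]++)
                print $1, a[i]
    }'
}
```

Usage (the prefix must be quoted, since | is the shell's pipe character):

svr_all_features 3702.1 peg | svr_aliases_of | aliases_with_prefix 'sp|'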

Find Neighbors

Beyond finding aliases and functions, a more advanced analysis might require finding the PEGs in the neighborhood of a given PEG. This command reads a tab-separated table in which the last field of each line is the PEG whose neighbors are requested. Its argument, n, sets how many neighbors to find on each side (left and right). The output is the input table with an extra column appended at the end, containing a comma-separated list of the neighbors.

svr_neighbors_of n < table_of_gene_ids

Note that this script uses stdin and stdout and is designed to be part of a processing pipeline. 

Here is an example of running this command:

> svr_all_features 3702.1 peg | svr_neighbors_of 5
fig|3702.1.peg.1 fig|3702.1.peg.2,fig|3702.1.peg.3,fig|3702.1.peg.4,fig|3702.1.peg.5,fig|3702.1.peg.6
fig|3702.1.peg.2 fig|3702.1.peg.1,fig|3702.1.peg.3,fig|3702.1.peg.4,fig|3702.1.peg.5,fig|3702.1.peg.6,fig|3702.1.peg.7
fig|3702.1.peg.3 fig|3702.1.peg.1,fig|3702.1.peg.2,fig|3702.1.peg.4,fig|3702.1.peg.5,fig|3702.1.peg.6,fig|3702.1.peg.7,fig|3702.1.peg.8
fig|3702.1.peg.4 fig|3702.1.peg.1,fig|3702.1.peg.2,fig|3702.1.peg.3,fig|3702.1.peg.5,fig|3702.1.peg.6,fig|3702.1.peg.7,fig|3702.1.peg.8,fig|3702.1.peg.9
fig|3702.1.peg.5 fig|3702.1.peg.1,fig|3702.1.peg.2,fig|3702.1.peg.3,fig|3702.1.peg.4,fig|3702.1.peg.6,fig|3702.1.peg.7,fig|3702.1.peg.8,fig|3702.1.peg.9,fig|3702.1.peg.10
.
.
.
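The neighbor list is a single comma-separated field. The hypothetical helper below, neighbor_pairs, flattens it into one "PEG<TAB>neighbor" line per neighbor, taking the PEG from the next-to-last column (the last column of the original input, per the description of svr_neighbors_of). It assumes the appended column is separated from the rest by a tab. Keeping only the second column then gives a single-column list of gene IDs that can be passed to svr_function_of to see what the neighbors do.

```shell
# neighbor_pairs (hypothetical helper)
# Reads svr_neighbors_of output (input columns, then a final column holding
# a comma-separated neighbor list) and prints one "peg<TAB>neighbor" line per
# neighbor, where peg is the next-to-last column of the line.
neighbor_pairs() {
    awk -F '\t' -v OFS='\t' 'NF >= 2 {
        n = split($NF, nb, ",")
        for (i = 1; i <= n; i++)
            print $(NF - 1), nb[i]
    }'
}
```

Usage, listing the functions of every gene within 5 positions of a PEG in genome 3702.1:

svr_all_features 3702.1 peg | svr_neighbors_of 5 | neighbor_pairs | cut -f2 | sort -u | svr_function_of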
