Downloading a Genome

This tutorial shows how to use the command-line server scripts to download all the genes of a genome.

To get a list of all the features in a genome, use svr_genome_functions and specify the genome ID. The following command lists all the features for genome 360108.3.

    svr_genome_functions 360108.3 >genes.tbl

The output is a tab-delimited file with the FIG feature ID in the first column, a location string in the second column, and the text of the functional assignment in the third. The first few lines look like this:

fig|360108.3.peg.1      360108.3:NZ_AANK01000008_3178-720       Putative periplasmic ATP /GTP-binding protein
fig|360108.3.peg.10     360108.3:NZ_AANK01000008_9127+210       Putative H-T-H containing protein
fig|360108.3.peg.100    360108.3:NZ_AANK01000005_2781+978       Putative periplasmic protein
fig|360108.3.peg.1000   360108.3:NZ_AANK01000002_231249+804     Zinc ABC transporter, inner membrane permease protein ZnuB
fig|360108.3.peg.1001   360108.3:NZ_AANK01000002_232319-279     hypothetical protein
fig|360108.3.peg.1002   360108.3:NZ_AANK01000002_233051-711     DNA modification methyltransferase (EC 2.1.1.-)
fig|360108.3.peg.1003   360108.3:NZ_AANK01000002_233515-468     DNA-cytosine methyltransferase (EC 2.1.1.37)
fig|360108.3.peg.1004   360108.3:NZ_AANK01000002_234102-564     Cytosine-specific DNA methyltransferase
fig|360108.3.peg.1005   360108.3:NZ_AANK01000002_234878-780     MnlI restriction endonuclease
fig|360108.3.peg.1006   360108.3:NZ_AANK01000002_235273-387     hypothetical protein
fig|360108.3.peg.1007   360108.3:NZ_AANK01000002_235696-423     COG0779: clustered with transcription termination protein NusA
fig|360108.3.peg.1008   360108.3:NZ_AANK01000002_236048-363     Ribosome-binding factor A
fig|360108.3.peg.1009   360108.3:NZ_AANK01000002_238627-2583    Translation initiation factor 2
fig|360108.3.peg.101    360108.3:NZ_AANK01000005_3759+987       Alanine racemase (EC 5.1.1.1)
fig|360108.3.peg.1010   360108.3:NZ_AANK01000002_238811-198     Hypothetical protein Cj0135
fig|360108.3.peg.1011   360108.3:NZ_AANK01000002_239774-879     Homoserine kinase (EC 2.7.1.39)
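For downstream work it can help to load this table into a script. The sketch below, in Python, splits each row into its three columns and breaks the location string into its parts. The layout it assumes, `<genome>:<contig>_<start><strand><length>`, is inferred from the sample rows above rather than taken from a format specification, and the names `Feature`, `parse_feature`, and `read_features` are made up for this illustration. It handles only locations that consist of a single segment.

```python
import re
from typing import List, NamedTuple


class Feature(NamedTuple):
    """One row of the feature table (hypothetical record type)."""
    fid: str
    genome: str
    contig: str
    start: int
    strand: str
    length: int
    function: str


# Assumed location layout: <genome>:<contig>_<start><strand><length>
# The contig name may itself contain underscores, so the greedy match
# leaves only the last underscore as the separator before the start.
LOCATION_RE = re.compile(
    r"^(?P<genome>\d+\.\d+):(?P<contig>.+)_(?P<start>\d+)"
    r"(?P<strand>[+-])(?P<length>\d+)$"
)


def parse_feature(line: str) -> Feature:
    """Parse one tab-delimited row: feature ID, location, function."""
    fid, location, function = line.rstrip("\n").split("\t", 2)
    match = LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"unrecognized location string: {location!r}")
    return Feature(
        fid=fid,
        genome=match.group("genome"),
        contig=match.group("contig"),
        start=int(match.group("start")),
        strand=match.group("strand"),
        length=int(match.group("length")),
        function=function,
    )


def read_features(path: str) -> List[Feature]:
    """Read every non-blank row of a feature table such as genes.tbl."""
    with open(path) as handle:
        return [parse_feature(line) for line in handle if line.strip()]
```

For example, `read_features("genes.tbl")` would return one record per line of the file written by the command above.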

The svr_fasta command retrieves the DNA for the genes in FASTA form. By default, svr_fasta reads its input from the last column of each line. Because svr_genome_functions writes the FIG ID in the first column, we pass the --c=1 option on the command line.

    svr_genome_functions 360108.3 | svr_fasta --c=1 --fasta

The location string and function are included as comments on the header line of each FASTA record, as shown below.

>fig|360108.3.peg.1 360108.3:NZ_AANK01000008_3178-720 Putative periplasmic ATP /GTP-binding protein
ttgttagaatttgtgtttataattcttattttaggtatagtttttaatttaggtagtctt
tacttaaaaaaagacaatttactagaaggcgcaatacaaattcttaatgatatccaatat
acccaaagtttagccatgatgcaagaaggtataagagttgatgagttggctatcgcaaaa
agagagtggtttaaaagtaggtggcagatttattttataaaatcagctgccacaggttat
gatcaaacatatactattttcttggataaaaatggcgatggaaatgctaatttaggtaaa
actgaaatcaatatagatagagaaattgctgttgatgtaatcaatcataacaaattaatg
aattcaggtcaaagtggagttattagtaaagatgatgaaaaaactacacaaagatttaat
cttacaaaaagattcggaatagaaaaggttgaattcaaaggatcttgttcaggatttact
agattagtatttgatgaaatgggcagagtatattctccgttaaaaaatgccaattatgcc
tatgaaaaaactttagcaaagaataattcagattgcattatacgtttgttatcaaaaaag
catgctctttgtatcgttatagatacgcttagtggttatgcttatattccagattttaaa
acacttaaaagtcaatttgttaatataaaaaataaaaattacgagtgctctaaaatataa
>fig|360108.3.peg.10 360108.3:NZ_AANK01000008_9127+210 Putative H-T-H containing protein
atgacaaaaaagagcaagcgtgatatggcatatgaactagacattgatgttagcacttta
tataattggagaaaatataaacccaatttatatcgtattgttatgcttggctttaaattt
gatgaacttttagaaaatagcaaaaaaactcatgaagaattgcttcatatagaacaaact
atacaagatgagattgctaaatttaaataa
>fig|360108.3.peg.100 360108.3:NZ_AANK01000005_2781+978 Putative periplasmic protein
ttgttaaaacgacttgctttattaattacactttcttcattgatattgcatgcctcagat
cttgttaaaatttatcttaatcaaggattagatgctgttggtgtagcgattgaaaaagaa
ttaactcaaaagggtttttggttaagtgaaataggagataaaaatatttcacttggatac
tatgatgataatgttgctatcgtgcttacaaataaaacagataaaattcttcgtgtttat
tcttatgaggacggaaaaataagaaaagattttgaacaaaaagaaataataactggatta
atgggcgataaaaaaatagaaggagatttgaaaacgccagtaggtttttatgagttaggc
cgtaagtttaacccaggtgatccttattatggtccctttgcttttgctacaacttatcca
aatttacttgataaagtacaaggaaaaacaggtggtggtatttggatccatggctatcct
ttagatggttctagacttgatgaatttaaaacaagagggtgtattgctttatttaataat
aatttggagaaatttgcacaagttgtacaagataaaaaagtttttgttatgacagaagaa
aaagaaaaaattagagctaaaaaggatcaaatagctagtttattggctgatctttttaca
tggaaactagcttggacaaatagtgacactaatacctatttaagtttttatgatgagcaa
gaatttaaacgttttgataaaatgaaatttgaacagtttgcttccatgaaaaaatctatt
ttttctcgtaaagaagataaaaagattaaattttcagatattaatatcagcccttatccg
aatttagaaaatgaaactatgtatagaatttcattttatgaggattattacactaaaaac
tatcagtttagaggcgataaaattttatacgttaagatagatagtaaaggtaaaatgaaa
attttagcagagcaataa
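Once the FASTA output has been saved to a file, the records can be read back with a short script. The following is a minimal sketch in Python, not part of the server scripts: `read_fasta` and `sequences_by_id` are hypothetical helper names, and the code assumes only what the sample above shows, namely that each record starts with a `>` header line whose first whitespace-delimited word is the FIG ID, followed by one or more sequence lines.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from lines of FASTA text.

    The header is returned without the leading '>' and the sequence
    lines of each record are joined into a single string.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequences_by_id(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first word of each header (the FIG ID) to its sequence."""
    return {
        header.partition(" ")[0]: sequence
        for header, sequence in read_fasta(lines)
    }
```

Passing an open file handle to `sequences_by_id` gives a dictionary keyed by FIG ID. In the sample above, the record for fig|360108.3.peg.10 holds 210 bases, which matches the final number in its location string.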

About this Entry

This page contains a single entry by The SEED Team published on August 11, 2010 1:27 PM.
