Downloading the FIGfams

In this tutorial we will show how to use the command-line server scripts to access the complete set of FIGfams.

To get a list of all the FIGfams and their functions, use

    svr_figfam_functions --all >figfam.functions.tbl

The output will be a tab-delimited file containing FIGfam IDs in the first column and the associated FIGfam function in the second column, as shown in the example below.

FIG000001       Cysteine desulfurase (EC 2.8.1.7)
FIG000004       3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9)
FIG000011       Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-)
FIG000013       Hydroxyacylglutathione hydrolase (EC 3.1.2.6)
FIG000015       Signal peptidase I (EC 3.4.21.89)
FIG000017       Peptide deformylase (EC 3.5.1.88)
FIG000019       Octaprenyl-diphosphate synthase (EC 2.5.1.-) / Dimethylallyltransferase (EC 2.5.1.1) / Geranyltranstransferase (farnesyldiphosphate synthase) (EC 2.5.1.10) / Geranylgeranyl pyrophosphate synthetase (EC 2.5.1.29)
FIG000022       Shikimate 5-dehydrogenase I alpha (EC 1.1.1.25)
FIG000023       Transaldolase (EC 2.2.1.2)
FIG000025       Cell division protein FtsW
FIG000028       ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92)
FIG000032       Dihydroorotase (EC 3.5.2.3)
FIG000036       Methionine aminopeptidase (EC 3.4.11.18)
FIG000038       Glucosamine--fructose-6-phosphate aminotransferase [isomerizing] (EC 2.6.1.16)
FIG000039       Translation elongation factor Tu
FIG000040       Serine hydroxymethyltransferase (EC 2.1.2.1)
FIG000043       Pyruvate kinase (EC 2.7.1.40)
FIG000047       Ribulose-phosphate 3-epimerase (EC 5.1.3.1)
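
Because the file is plain tab-delimited text, standard tools such as awk can query it once it has been downloaded. The following is a minimal sketch, not part of the server scripts: the helper name lookup_function and the file sample.functions.tbl are hypothetical, and the sample rows are copied from the listing above so the example runs without a server connection.

```shell
# Build a tiny stand-in for figfam.functions.tbl using rows from the listing
# above. (Hypothetical file name; the real table comes from
# svr_figfam_functions --all.)
printf 'FIG000001\tCysteine desulfurase (EC 2.8.1.7)\n'  > sample.functions.tbl
printf 'FIG000015\tSignal peptidase I (EC 3.4.21.89)\n' >> sample.functions.tbl
printf 'FIG000039\tTranslation elongation factor Tu\n'  >> sample.functions.tbl

# Hypothetical helper: print the function (column 2) of one FIGfam ID (column 1).
# Usage: lookup_function FIGFAM_ID TABLE_FILE
lookup_function() {
    awk -F '\t' -v id="$1" '$1 == id { print $2 }' "$2"
}

lookup_function FIG000015 sample.functions.tbl
```

Pointing the same helper at figfam.functions.tbl should work on the full table, assuming the two-column layout described above.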

To get the genes in each FIGfam rather than its function, use the svr_all_figfams command.

    svr_all_figfams >figfam.genes.tbl

The output will be a two-column tab-delimited file with the FIGfam ID in the first column and a gene ID in the second column.

FIG000001       fig|100226.1.peg.1021
FIG000001       fig|100226.1.peg.2112
FIG000001       fig|100226.1.peg.2123
FIG000001       fig|100226.1.peg.5438
FIG000001       fig|10090.3.peg.776
FIG000001       fig|101031.3.peg.10
FIG000001       fig|101031.3.peg.1382
FIG000001       fig|101031.3.peg.1936
FIG000001       fig|101031.3.peg.2058
FIG000001       fig|10116.3.peg.30469
FIG000001       fig|101510.15.peg.4139
FIG000001       fig|101510.15.peg.4409
FIG000001       fig|101510.15.peg.5738
FIG000001       fig|101510.15.peg.6515
FIG000001       fig|101510.15.peg.7968
FIG000001       fig|103690.1.peg.2812
FIG000001       fig|103690.1.peg.3395
FIG000001       fig|103690.1.peg.3517
FIG000001       fig|103690.1.peg.4174
FIG000001       fig|106370.11.peg.1664
FIG000001       fig|106370.11.peg.3080
FIG000001       fig|106370.11.peg.3607
FIG000001       fig|106370.11.peg.4426
FIG000001       fig|1085.1.peg.2604
FIG000001       fig|1085.1.peg.3303
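
A table in this shape lends itself to simple summaries, such as the number of genes in each FIGfam or the number of members each genome contributes. The sketch below is illustrative only: the helper names and the file sample.genes.tbl are hypothetical, the rows are copied from the listing above, and the genome ID is assumed to be the text between "fig|" and ".peg." in each gene ID.

```shell
# Build a small stand-in for figfam.genes.tbl using rows from the listing above.
# (Hypothetical file name; the real table comes from svr_all_figfams.)
printf 'FIG000001\tfig|100226.1.peg.1021\n'  > sample.genes.tbl
printf 'FIG000001\tfig|100226.1.peg.2112\n' >> sample.genes.tbl
printf 'FIG000001\tfig|10090.3.peg.776\n'   >> sample.genes.tbl
printf 'FIG000001\tfig|101031.3.peg.10\n'   >> sample.genes.tbl

# Hypothetical helper: print "FIGfam ID <tab> number of genes".
count_genes_per_figfam() {
    cut -f1 "$1" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Hypothetical helper: print "FIGfam ID <tab> genome ID <tab> number of genes".
# Assumes gene IDs look like fig|<genome>.peg.<number>.
count_genes_per_genome() {
    awk -F '\t' '{
        genome = $2
        sub(/^fig\|/, "", genome)       # drop the "fig|" prefix
        sub(/\.peg\..*$/, "", genome)   # drop ".peg.<number>"
        n[$1 "\t" genome]++
    }
    END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

count_genes_per_figfam sample.genes.tbl
count_genes_per_genome sample.genes.tbl
```

On the full figfam.genes.tbl the same pipelines should apply unchanged, although the sort step may take a while on a file of that size.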

To get the sequences of these genes, use the svr_figfam_fasta command. This command requires a list of FIGfam IDs as input, which can be supplied by piping in the output of svr_figfam_functions. Normally in a pipe situation the input is taken from the second column; the --c=1 command-line option indicates that the input will be taken from column 1 instead.

    svr_figfam_functions --all | svr_figfam_fasta --c=1 >figfams.fasta

Each FASTA entry will include the FIGfam ID as a comment, as shown in the output fragment below.

>fig|100226.1.peg.3361 FIG000171
MSAPFAQGPSDPTVQPVPASVIEQVDAADTTLSNPKRAVVALGANLGNRLETLQGAIDAL
EDTPGVRVKGVSPVYETEPWGVAPDSQPSYFNAVVILKTTLPPSSLLERAHAVEEAFHRV
RDERWGPRTLDVDIVAYAEVVSDDPHLTLPHPRAHERAFVLAPWLDVDPEAALPGRGRVA
DLLAAVTRDGVAPRADLELQLPE
>fig|101031.3.peg.1260 FIG000171
MKDVYLSIGTNMGERYENLQQAVALLREKENIEVVRVSSVYETAAVGYTDQADFLNIAVH
LKTDASSTEMLKICQSIEQELGRVREFRWGPRIIDLDILLYNQENIETENLLVPHPRMYE
RAFVLVPLVEITPAPFGDQLQQAHHLLQQMDCEREGINLWQPTNEPLIVS
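
Since the FIGfam ID is the second word of each header line, the sequences of a single family can be pulled back out of the combined file with awk. This is a minimal sketch under that assumption: the helper name extract_figfam and the file sample.fasta are hypothetical, and the two entries are copied from the fragment above.

```shell
# Build a small stand-in for figfams.fasta from the fragment shown above.
# (Hypothetical file name; the real file comes from svr_figfam_fasta.)
cat > sample.fasta <<'EOF'
>fig|100226.1.peg.3361 FIG000171
MSAPFAQGPSDPTVQPVPASVIEQVDAADTTLSNPKRAVVALGANLGNRLETLQGAIDAL
EDTPGVRVKGVSPVYETEPWGVAPDSQPSYFNAVVILKTTLPPSSLLERAHAVEEAFHRV
RDERWGPRTLDVDIVAYAEVVSDDPHLTLPHPRAHERAFVLAPWLDVDPEAALPGRGRVA
DLLAAVTRDGVAPRADLELQLPE
>fig|101031.3.peg.1260 FIG000171
MKDVYLSIGTNMGERYENLQQAVALLREKENIEVVRVSSVYETAAVGYTDQADFLNIAVH
LKTDASSTEMLKICQSIEQELGRVREFRWGPRIIDLDILLYNQENIETENLLVPHPRMYE
RAFVLVPLVEITPAPFGDQLQQAHHLLQQMDCEREGINLWQPTNEPLIVS
EOF

# Hypothetical helper: print every FASTA entry whose header comment equals
# the given FIGfam ID. On a header line, $1 is ">gene ID" and $2 is the comment.
# Usage: extract_figfam FIGFAM_ID FASTA_FILE
extract_figfam() {
    awk -v id="$1" '/^>/ { keep = ($2 == id) } keep' "$2"
}

# Count how many sequences belong to FIG000171 in the sample.
extract_figfam FIG000171 sample.fasta | grep -c '^>'
```

Redirecting the output of extract_figfam to a file would give a per-family FASTA file, which is often a more convenient unit for alignment tools than the combined download.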