Downloading the FIGfams

This tutorial shows how to use the command-line server scripts to download the complete set of FIGfams.

To get a list of all the FIGfams and their functions, use

    svr_figfam_functions --all >figfam.functions.tbl

The output is a tab-delimited file with the FIGfam ID in the first column and the associated function in the second column, as shown in the example below.

FIG000001       Cysteine desulfurase (EC
FIG000004       3-ketoacyl-CoA thiolase (EC @ Acetyl-CoA acetyltransferase (EC
FIG000011       Multimodular transpeptidase-transglycosylase (EC (EC 3.4.-.-)
FIG000013       Hydroxyacylglutathione hydrolase (EC
FIG000015       Signal peptidase I (EC
FIG000017       Peptide deformylase (EC
FIG000019       Octaprenyl-diphosphate synthase (EC 2.5.1.-) / Dimethylallyltransferase (EC / Geranyltranstransferase (farnesyldiphosphate synthase) (EC / Geranylgeranyl pyrophosphate synthetase (EC
FIG000022       Shikimate 5-dehydrogenase I alpha (EC
FIG000023       Transaldolase (EC
FIG000025       Cell division protein FtsW
FIG000028       ATP-dependent Clp protease proteolytic subunit (EC
FIG000032       Dihydroorotase (EC
FIG000036       Methionine aminopeptidase (EC
FIG000038       Glucosamine--fructose-6-phosphate aminotransferase [isomerizing] (EC
FIG000039       Translation elongation factor Tu
FIG000040       Serine hydroxymethyltransferase (EC
FIG000043       Pyruvate kinase (EC
FIG000047       Ribulose-phosphate 3-epimerase (EC
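
Once the table is on disk, it can be queried with standard Unix tools. The sketch below is one possible way to do that, not part of the server scripts: the function names figfam_function and figfams_matching are hypothetical, and both assume the two-column, tab-delimited layout described above.

```shell
# Hypothetical helper: print the function recorded for one FIGfam ID.
#   $1 = FIGfam ID (for example FIG000025)
#   $2 = tab-delimited table with the FIGfam ID in column 1
figfam_function() {
    awk -F'\t' -v id="$1" '$1 == id { print $2 }' "$2"
}

# Hypothetical helper: print every row whose function (column 2)
# contains the given fixed string.
#   $1 = text to look for, $2 = table file
figfams_matching() {
    awk -F'\t' -v pat="$1" 'index($2, pat) > 0' "$2"
}
```

For example, after loading these definitions into the shell:

    figfam_function FIG000025 figfam.functions.tbl
    figfams_matching "protease" figfam.functions.tbl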

To get the genes in each FIGfam rather than its function, use the svr_all_figfams command.

    svr_all_figfams >figfam.genes.tbl

The output will be a two-column tab-delimited file with the FIGfam ID in the first column and a gene ID in the second column.

FIG000001       fig|100226.1.peg.1021
FIG000001       fig|100226.1.peg.2112
FIG000001       fig|100226.1.peg.2123
FIG000001       fig|100226.1.peg.5438
FIG000001       fig|10090.3.peg.776
FIG000001       fig|101031.3.peg.10
FIG000001       fig|101031.3.peg.1382
FIG000001       fig|101031.3.peg.1936
FIG000001       fig|101031.3.peg.2058
FIG000001       fig|10116.3.peg.30469
FIG000001       fig|101510.15.peg.4139
FIG000001       fig|101510.15.peg.4409
FIG000001       fig|101510.15.peg.5738
FIG000001       fig|101510.15.peg.6515
FIG000001       fig|101510.15.peg.7968
FIG000001       fig|103690.1.peg.2812
FIG000001       fig|103690.1.peg.3395
FIG000001       fig|103690.1.peg.3517
FIG000001       fig|103690.1.peg.4174
FIG000001       fig|106370.11.peg.1664
FIG000001       fig|106370.11.peg.3080
FIG000001       fig|106370.11.peg.3607
FIG000001       fig|106370.11.peg.4426
FIG000001       fig|1085.1.peg.2604
FIG000001       fig|1085.1.peg.3303
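
Because each row pairs one FIGfam with one gene, the file lends itself to simple summaries. The following is a minimal sketch using standard Unix tools. The function names figfam_sizes and figfam_genomes are hypothetical, and the second one assumes that gene IDs follow the fig|GENOME.peg.N pattern seen in the example above.

```shell
# Hypothetical helper: count the genes in each FIGfam.
# Prints "FIGfam ID<TAB>gene count", sorted by FIGfam ID.
#   $1 = two-column, tab-delimited file (FIGfam ID, gene ID)
figfam_sizes() {
    cut -f1 "$1" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Hypothetical helper: list the distinct genome IDs represented in one FIGfam.
# Assumes gene IDs of the form fig|GENOME.peg.N; the "fig|" prefix and the
# ".peg.N" suffix are stripped to leave the genome ID.
#   $1 = FIGfam ID, $2 = table file
figfam_genomes() {
    awk -F'\t' -v id="$1" '
        $1 == id {
            g = $2
            sub(/^fig[|]/, "", g)
            sub(/\.peg\..*$/, "", g)
            print g
        }' "$2" | sort -u
}
```

For example, the ten largest FIGfams could be listed with:

    figfam_sizes figfam.genes.tbl | sort -k2,2nr | head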

To get the sequences of these genes, use the svr_figfam_fasta command. It requires a list of FIGfam IDs as input, which can be supplied by piping in the output of svr_figfam_functions. By default, piped input is taken from the second column; the --c=1 command-line option tells the command to take its input from column 1 instead.

    svr_figfam_functions --all | svr_figfam_fasta --c=1 >figfams.fasta

Each FASTA entry will include the FIGfam ID as a comment, as shown in the output fragment below.

>fig|100226.1.peg.3361 FIG000171
>fig|101031.3.peg.1260 FIG000171
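
Since the FIGfam ID is the second word of each header line, the FASTA file can be summarized or split by FIGfam with standard tools. The sketch below assumes every header has the form ">gene-ID FIGfam-ID" as in the fragment above. The function names fasta_counts_by_figfam and fasta_for_figfam are hypothetical and not part of the server scripts.

```shell
# Hypothetical helper: count the sequences belonging to each FIGfam.
# Prints "FIGfam ID<TAB>sequence count", sorted by FIGfam ID.
#   $1 = FASTA file whose headers look like ">gene-ID FIGfam-ID"
fasta_counts_by_figfam() {
    grep '^>' "$1" | awk '{ print $2 }' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Hypothetical helper: print only the entries (header plus sequence lines)
# for one FIGfam. The "keep" flag is reset at every header line.
#   $1 = FIGfam ID, $2 = FASTA file
fasta_for_figfam() {
    awk -v id="$1" '/^>/ { keep = ($2 == id) } keep' "$2"
}
```

For example, to save the sequences of a single family to their own file:

    fasta_for_figfam FIG000171 figfams.fasta >FIG000171.fasta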