In this tutorial we will show how to use the command-line server scripts to access the complete set of FIGfams.
To get a list of all the FIGfams and their functions, use
svr_figfam_functions --all >figfam.functions.tbl
The output will be a tab-delimited file containing FIGfam IDs in the first column and the associated FIGfam function in the second column, as shown in the example below.
FIG000001	Cysteine desulfurase (EC 2.8.1.7)
FIG000004	3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9)
FIG000011	Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-)
FIG000013	Hydroxyacylglutathione hydrolase (EC 3.1.2.6)
FIG000015	Signal peptidase I (EC 3.4.21.89)
FIG000017	Peptide deformylase (EC 3.5.1.88)
FIG000019	Octaprenyl-diphosphate synthase (EC 2.5.1.-) / Dimethylallyltransferase (EC 2.5.1.1) / Geranyltranstransferase (farnesyldiphosphate synthase) (EC 2.5.1.10) / Geranylgeranyl pyrophosphate synthetase (EC 2.5.1.29)
FIG000022	Shikimate 5-dehydrogenase I alpha (EC 1.1.1.25)
FIG000023	Transaldolase (EC 2.2.1.2)
FIG000025	Cell division protein FtsW
FIG000028	ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92)
FIG000032	Dihydroorotase (EC 3.5.2.3)
FIG000036	Methionine aminopeptidase (EC 3.4.11.18)
FIG000038	Glucosamine--fructose-6-phosphate aminotransferase [isomerizing] (EC 2.6.1.16)
FIG000039	Translation elongation factor Tu
FIG000040	Serine hydroxymethyltransferase (EC 2.1.2.1)
FIG000043	Pyruvate kinase (EC 2.7.1.40)
FIG000047	Ribulose-phosphate 3-epimerase (EC 5.1.3.1)
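Once the table has been saved, it can be loaded into a lookup structure for later steps. The following Python sketch is not part of the server scripts; it assumes only the two-column, tab-delimited layout described above, and the function name parse_figfam_functions is made up for illustration.

```python
def parse_figfam_functions(lines):
    """Map each FIGfam ID (column 1) to its function (column 2).

    Assumes every non-empty line holds exactly one tab between the ID
    and the function; a line without a tab raises ValueError.
    """
    functions = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        figfam_id, function = line.split("\t", 1)
        functions[figfam_id] = function
    return functions


# Two rows copied from the example output above.
sample = [
    "FIG000001\tCysteine desulfurase (EC 2.8.1.7)\n",
    "FIG000025\tCell division protein FtsW\n",
]
functions = parse_figfam_functions(sample)
print(functions["FIG000025"])  # prints: Cell division protein FtsW
```

To read the real file, pass an open file handle in place of the sample list, for example parse_figfam_functions(open("figfam.functions.tbl")).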
To get the genes in each FIGfam rather than its function, use the svr_all_figfams command.
svr_all_figfams >figfam.genes.tbl
The output will be a two-column tab-delimited file with the FIGfam ID in the first column and a gene ID in the second column.
FIG000001	fig|100226.1.peg.1021
FIG000001	fig|100226.1.peg.2112
FIG000001	fig|100226.1.peg.2123
FIG000001	fig|100226.1.peg.5438
FIG000001	fig|10090.3.peg.776
FIG000001	fig|101031.3.peg.10
FIG000001	fig|101031.3.peg.1382
FIG000001	fig|101031.3.peg.1936
FIG000001	fig|101031.3.peg.2058
FIG000001	fig|10116.3.peg.30469
FIG000001	fig|101510.15.peg.4139
FIG000001	fig|101510.15.peg.4409
FIG000001	fig|101510.15.peg.5738
FIG000001	fig|101510.15.peg.6515
FIG000001	fig|101510.15.peg.7968
FIG000001	fig|103690.1.peg.2812
FIG000001	fig|103690.1.peg.3395
FIG000001	fig|103690.1.peg.3517
FIG000001	fig|103690.1.peg.4174
FIG000001	fig|106370.11.peg.1664
FIG000001	fig|106370.11.peg.3080
FIG000001	fig|106370.11.peg.3607
FIG000001	fig|106370.11.peg.4426
FIG000001	fig|1085.1.peg.2604
FIG000001	fig|1085.1.peg.3303
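Because each FIGfam appears on many lines, a common next step is to collect the gene IDs per family. The Python sketch below is one way to do that and is not part of the server scripts. The names group_genes_by_figfam and genome_of are hypothetical, and genome_of assumes that gene IDs follow the fig|GENOME.peg.NUMBER pattern seen in the example output.

```python
from collections import defaultdict


def group_genes_by_figfam(lines):
    """Collect gene IDs (column 2) under their FIGfam ID (column 1)."""
    members = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        figfam_id, gene_id = line.split("\t")[:2]
        members[figfam_id].append(gene_id)
    return dict(members)


def genome_of(gene_id):
    """Return the genome part of an ID such as fig|100226.1.peg.1021.

    Assumption: the ID has the form fig|GENOME.peg.NUMBER.
    """
    return gene_id.split("|", 1)[1].split(".peg.", 1)[0]


# Three rows copied from the example output above.
sample = [
    "FIG000001\tfig|100226.1.peg.1021\n",
    "FIG000001\tfig|100226.1.peg.2112\n",
    "FIG000001\tfig|10090.3.peg.776\n",
]
members = group_genes_by_figfam(sample)
genomes = sorted({genome_of(g) for g in members["FIG000001"]})
print(len(members["FIG000001"]), genomes)  # prints: 3 ['100226.1', '10090.3']
```

The complete output is large, so for real use it may be preferable to stream the file line by line and keep only the families of interest.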
To get the sequences of these genes, use the svr_figfam_fasta command. This command requires an input file containing the list of FIGfam IDs, which can be supplied by piping in the output of svr_figfam_functions. In a pipe, the input is normally taken from the second column; the --c=1 command-line option indicates that the input should be taken from column 1 instead.
svr_figfam_functions --all | svr_figfam_fasta --c=1 >figfams.fasta
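To make the column choice concrete, the Python sketch below imitates the idea of picking one field from each tab-delimited input line. It illustrates the concept only and is not the implementation of svr_figfam_fasta; the function pick_column and its 1-based column parameter are hypothetical, with a default of 2 chosen to mirror the behavior described above.

```python
def pick_column(lines, column=2):
    """Yield the field at the given 1-based column of each tab-delimited line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield fields[column - 1]


# Two rows in the format produced by svr_figfam_functions.
sample = [
    "FIG000001\tCysteine desulfurase (EC 2.8.1.7)\n",
    "FIG000025\tCell division protein FtsW\n",
]
ids = list(pick_column(sample, column=1))
print(ids)  # prints: ['FIG000001', 'FIG000025']
```

With the default column the same call would yield the function names, which are not valid FIGfam IDs; that is why the pipeline above selects column 1.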
Each FASTA entry will include the FIGfam ID as a comment, as shown in the output fragment below.
>fig|100226.1.peg.3361 FIG000171
MSAPFAQGPSDPTVQPVPASVIEQVDAADTTLSNPKRAVVALGANLGNRLETLQGAIDAL
EDTPGVRVKGVSPVYETEPWGVAPDSQPSYFNAVVILKTTLPPSSLLERAHAVEEAFHRV
RDERWGPRTLDVDIVAYAEVVSDDPHLTLPHPRAHERAFVLAPWLDVDPEAALPGRGRVA
DLLAAVTRDGVAPRADLELQLPE
>fig|101031.3.peg.1260 FIG000171
MKDVYLSIGTNMGERYENLQQAVALLREKENIEVVRVSSVYETAAVGYTDQADFLNIAVH
LKTDASSTEMLKICQSIEQELGRVREFRWGPRIIDLDILLYNQENIETENLLVPHPRMYE
RAFVLVPLVEITPAPFGDQLQQAHHLLQQMDCEREGINLWQPTNEPLIVS
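Since the FIGfam ID travels in the header comment, the FASTA file can be split back into families without consulting the other tables. The Python sketch below shows one way to read such a file. It is not part of the server scripts, the name read_figfam_fasta is hypothetical, and it assumes each header has the form ">GENE_ID FIGFAM_ID" as in the fragment above. The records in the sample are toy data, not real sequences.

```python
def read_figfam_fasta(lines):
    """Yield (gene_id, figfam_id, sequence) for each FASTA entry.

    Assumes well-formed headers of the form '>GENE_ID FIGFAM_ID';
    figfam_id is None when a header carries no comment.
    """
    gene_id = figfam_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if gene_id is not None:
                yield gene_id, figfam_id, "".join(chunks)
            header = line[1:].split(None, 1)
            gene_id = header[0]
            figfam_id = header[1] if len(header) > 1 else None
            chunks = []
        else:
            chunks.append(line)  # sequence may span several lines
    if gene_id is not None:
        yield gene_id, figfam_id, "".join(chunks)


# Toy records (placeholder IDs and sequences) in the same layout.
sample = [
    ">gene_a FIG000171\n",
    "MSAP\n",
    "FAQG\n",
    ">gene_b FIG000171\n",
    "MKDV\n",
]
records = list(read_figfam_fasta(sample))
print(records[0])  # prints: ('gene_a', 'FIG000171', 'MSAPFAQG')
```

For the real output, pass an open handle on figfams.fasta; because the function is a generator, entries are processed one at a time rather than held in memory together.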