Listing All Genomes
The all_genomes method returns every genome in the database as a reference to a hash that maps each genome ID to its scientific name. The following code gets the hash and prints it.
    use SeedEnv;
    my $sapObject = SAPserver->new();
    my $genomeHash = $sapObject->all_genomes();
    for my $genomeID (sort keys %$genomeHash) {
        print "$genomeID: $genomeHash->{$genomeID}\n";
    }
The initial output from the above program looks like this:
    100226.1: Streptomyces coelicolor A3(2)
    100226.8: Streptomyces coelicolor A3(2) plasmid SCP1
    100226.9: Streptomyces coelicolor A3(2) plasmid SCP2
    100379.3: Onion yellows phytoplasma NIM plasmid extrachromosomal DNA
    100379.4: Onion yellows phytoplasma plasmid EcOYW1
    100379.5: Onion yellows phytoplasma plasmid pOYM
    100379.6: Onion yellows phytoplasma plasmid pOYNIM
The -complete option can be used in the all_genomes call to return only complete genomes, as follows.
    use SeedEnv;
    my $sapObject = SAPserver->new();
    my $genomeHash = $sapObject->all_genomes(-complete => 1);
    for my $genomeID (sort keys %$genomeHash) {
        print "$genomeID: $genomeHash->{$genomeID}\n";
    }
This also eliminates the plasmids, as you can see from the output fragment below.
    100226.1: Streptomyces coelicolor A3(2)
    10090.3: Mus musculus (House mouse)
    101031.3: Bacillus B-14905
    10116.3: Rattus norvegicus (Norway rat)
    101510.15: Rhodococcus jostii RHA1
    103690.1: Nostoc sp. PCC 7120
    106370.11: Frankia sp. Ccl3
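Note that the listings above are in string order (the default for sort), which is why 100226.1 appears before 10090.3. If numeric order is preferred, one option is to compare the two dot-separated parts of each ID as numbers. The following is a minimal sketch and not part of the Sapling server interface: by_genome_id is a hypothetical helper, and the hard-coded hash stands in for the result of all_genomes so that the fragment runs without a server connection.

```perl
use strict;
use warnings;

# Sample data copied from the output above; in practice this would be
# the hash reference returned by all_genomes.
my $genomeHash = {
    '100226.1'  => 'Streptomyces coelicolor A3(2)',
    '10090.3'   => 'Mus musculus (House mouse)',
    '101031.3'  => 'Bacillus B-14905',
    '10116.3'   => 'Rattus norvegicus (Norway rat)',
    '101510.15' => 'Rhodococcus jostii RHA1',
};

# Hypothetical helper: compare two genome IDs by the number before the
# dot, then by the number after it.
sub by_genome_id {
    my ($x, $y) = @_;
    my ($xMajor, $xMinor) = split /\./, $x;
    my ($yMajor, $yMinor) = split /\./, $y;
    return $xMajor <=> $yMajor || $xMinor <=> $yMinor;
}

my @sortedIDs = sort { by_genome_id($a, $b) } keys %$genomeHash;
for my $genomeID (@sortedIDs) {
    print "$genomeID: $genomeHash->{$genomeID}\n";
}
```

With this comparison, 10090.3 and 10116.3 are listed ahead of 100226.1.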
Once you have genome IDs, there are numerous things you can do to get more information. The following program prints a full taxonomy for each complete genome in the system.
    use SeedEnv;
    my $sapObject = SAPserver->new();
    my $genomeHash = $sapObject->all_genomes(-complete => 1);
    my $taxHash = $sapObject->taxonomy_of(-ids => [keys %$genomeHash]);
    for my $genomeID (sort keys %$genomeHash) {
        print "$genomeID: " . join(", ", @{$taxHash->{$genomeID}}) . "\n";
    }
In the above fragment, the keys of the initial genome hash specify the genomes whose taxonomies are desired. The taxonomy_of method computes the taxonomies and returns them in $taxHash, which maps each genome ID to a reference to a list of taxonomic names. The for loop joins each list with commas and prints it. The output looks like this.
    100226.1: Bacteria, Actinobacteria, Actinobacteria (class), Actinobacteridae, Actinomycetales, Streptomycineae, Streptomycetaceae, Streptomyces, Streptomyces coelicolor, Streptomyces coelicolor A3(2)
    10090.3: Eukaryota, Fungi/Metazoa group, Metazoa, Eumetazoa, Bilateria, Coelomata, Deuterostomia, Chordata, Craniata, Vertebrata, Gnathostomata, Teleostomi, Euteleostomi, Sarcopterygii, Tetrapoda, Amniota, Mammalia, Theria, Eutheria, Euarchontoglires, Glires, Rodentia, Sciurognathi, Muroidea, Muridae, Murinae, Mus, Mus musculus
    101031.3: Bacteria, Firmicutes, Bacilli, Bacillales, Bacillaceae, Bacillus, Bacillus sp. B-14905
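Because each taxonomy comes back as a list, individual levels can be picked out by position. The sketch below assumes, based only on the output shown above, that the first element is the domain and the last is the most specific name. The hard-coded $taxHash holds two entries copied from that output in place of a real taxonomy_of call, and %summary is an illustrative structure, not something the server returns.

```perl
use strict;
use warnings;

# Sample data copied from the output above; in practice this would be
# the hash reference returned by taxonomy_of.
my $taxHash = {
    '100226.1' => ['Bacteria', 'Actinobacteria', 'Actinobacteria (class)',
                   'Actinobacteridae', 'Actinomycetales', 'Streptomycineae',
                   'Streptomycetaceae', 'Streptomyces',
                   'Streptomyces coelicolor', 'Streptomyces coelicolor A3(2)'],
    '101031.3' => ['Bacteria', 'Firmicutes', 'Bacilli', 'Bacillales',
                   'Bacillaceae', 'Bacillus', 'Bacillus sp. B-14905'],
};

# Record the first level, the last level, and the number of levels
# for each genome.
my %summary;
for my $genomeID (sort keys %$taxHash) {
    my @levels = @{$taxHash->{$genomeID}};
    $summary{$genomeID} = {
        domain => $levels[0],
        name   => $levels[-1],
        depth  => scalar @levels,
    };
    print "$genomeID: $levels[0] ... $levels[-1] (" . scalar(@levels) . " levels)\n";
}
```

The number of levels varies from genome to genome, so positions counted from the front of the list do not correspond to a fixed taxonomic rank.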
If full taxonomy information is excessive, you can ask for just the domain using genome_domain.
    use SeedEnv;
    my $sapObject = SAPserver->new();
    my $genomeHash = $sapObject->all_genomes(-complete => 1);
    my $domHash = $sapObject->genome_domain(-ids => [keys %$genomeHash]);
    for my $genomeID (sort keys %$genomeHash) {
        print "$genomeID: $domHash->{$genomeID}\n";
    }
The output looks like this.
    100226.1: Bacteria
    10090.3: Eukaryota
    101031.3: Bacteria
    10116.3: Eukaryota
    101510.15: Bacteria
    103690.1: Bacteria
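A hash that maps genome IDs to domains is also easy to summarize. As a minimal sketch, the fragment below tallies how many genomes fall in each domain. The hard-coded $domHash is copied from the output above and stands in for the result of genome_domain, and %domainCount is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Sample data copied from the output above; in practice this would be
# the hash reference returned by genome_domain.
my $domHash = {
    '100226.1'  => 'Bacteria',
    '10090.3'   => 'Eukaryota',
    '101031.3'  => 'Bacteria',
    '10116.3'   => 'Eukaryota',
    '101510.15' => 'Bacteria',
    '103690.1'  => 'Bacteria',
};

# Count the genomes in each domain.
my %domainCount;
$domainCount{$_}++ for values %$domHash;

for my $domain (sort keys %domainCount) {
    print "$domain: $domainCount{$domain}\n";
}
```

For the six sample genomes, this reports four in Bacteria and two in Eukaryota.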