Example 4 accesses the SEED server that offers access to the data we use to compute co-occurrence scores. The program illustrates the potential for constructing custom tools by going through all of the protein-encoding genes in all of the complete prokaryotic genomes maintained within the SEED looking for "hypothetical proteins" that tend to co-occur with genes encoding functions that can be connected to subsystems. The program constructs a table showing
- the gene,
- the function of the gene,
- the genome id containing such a gene,
- the description of the genome,
- the non-hypothetical gene in a subsystem that appears to have the strongest co-occurrence score,
- the co-occurrence score, and
- the function assigned to the co-occurring gene contained in a subsystem.
Example 4 Discussion
With the server packages installed, the code in example 4 can be run as follows> perl server_paper_example4.plFirst we retrieve all complete geneomes from the Sapling database
my $genomeHash = $sapObject->all_genomes(-complete => 1);
Then we get all "Hypothetical" genes
my $geneHash = $sapObject->feature_assignments(-genome => $genome,
-type => 'peg');
my @hypotheticalGenes = grep { &SeedUtils::hypo($geneHash->{$_}) } sort keys %$geneHash;
Then we use the Functional Coupling Server to get a hash of all genes in the neighborhood of the hypothetical gene
my $couplingHash = $sapObject->conserved_in_neighborhood(-ids => \@hypotheticalGenes,
-hash => 1);
And finally, we process the coupling data for each hypothetical gene and format the output.
for my $gene (@hypotheticalGenes) {
my $couplingList = $couplingHash->{$gene};
if (defined $couplingList) {
my $subHash = $sapObject->ids_to_subsystems(-ids => [ map { $_->[1]} @$couplingList ]);
my ($bestCoupler, $bestScore, $bestFunction) = (undef, 0, '');
for my $coupling (@$couplingList) {
my ($score, $coupler, $function) = @$coupling;
if ($subHash->{$coupler} && $score > $bestScore) {
$bestCoupler = $coupler;
$bestScore = $score;
$bestFunction = $function;
}
}
if (defined $bestCoupler) {
print join("\t", $gene, $geneHash->{$gene}, $genome, $genomeName,
$bestCoupler, $bestScore, $bestFunction) . "\n";
}
}
}
Example 4 output (truncated)
fig|273035.4.peg.1016 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1018 31 DNA helicase
fig|273035.4.peg.1234 predicted metal-dependent hydrolase 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1233 80 Diacylglycerol kinase (EC 2.7.1.107)
fig|273035.4.peg.1496 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1491 9 Phage terminase large subunit
fig|273035.4.peg.1570 S1 RNA binding domain protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1571 10 Glucose-6-phosphate isomerase (EC 5.3.1.9)
fig|273035.4.peg.328 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.329 60 Putative tRNA-m1A22 methylase
fig|273035.4.peg.403 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.402 54 LSU ribosomal protein L27p
fig|273035.4.peg.60 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.61 9 hypothetical protein
fig|273035.4.peg.600 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.598 18 Recombination protein U homolog
fig|273035.4.peg.61 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.60 9 hypothetical protein
fig|273035.4.peg.710 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.708 12 LSU ribosomal protein L31p
.
.
.