Access to Functional Coupling (Conserved Contiguity) Data

A great deal has been learned from studying genes that tend to occur close to one another in diverse genomes [PMID: 11471247 - change date to 1998, PMID: 9787636, PMID: 10077608, PMID: 11230160, PMID: 18712303].

Example 4 accesses the SEED server that offers access to the data we use to compute co-occurrence scores. The program illustrates the potential for constructing custom tools by going through all of the protein-encoding genes in all of the complete prokaryotic genomes maintained within the SEED looking for "hypothetical proteins" that tend to co-occur with genes encoding functions that can be connected to subsystems. The program constructs a table showing

  1. the gene,
  2. the function of the gene,
  3. the genome id containing such a gene,
  4. the description of the genome,
  5. the non-hypothetical gene in a subsystem that appears to have the strongest co-occurrence score,
  6. the co-occurrence score, and
  7. the function assigned to the co-occurring gene contained in a subsystem.
We believe that there are many variations to this basic data mining capability that could be implemented on top of this basic co-occurrence data.



Example 4 Discussion

With the server packages installed, the code in example 4 can be run as follows
> perl server_paper_example4.pl 
First we retrieve all complete geneomes from the Sapling database
            my $genomeHash = $sapObject->all_genomes(-complete => 1);

Then  we get all "Hypothetical" genes

	    my $geneHash = $sapObject->feature_assignments(-genome => $genome,
-type => 'peg');
my @hypotheticalGenes = grep { &SeedUtils::hypo($geneHash->{$_}) } sort keys %$geneHash;

Then we use the Functional Coupling Server to get a hash of all genes in the neighborhood of the hypothetical gene

	    my $couplingHash = $sapObject->conserved_in_neighborhood(-ids => \@hypotheticalGenes,
-hash => 1);

And finally, we process the coupling data for each hypothetical gene and format the output.

	    for my $gene (@hypotheticalGenes) {
my $couplingList = $couplingHash->{$gene};
if (defined $couplingList) {
my $subHash = $sapObject->ids_to_subsystems(-ids => [ map { $_->[1]} @$couplingList ]);
my ($bestCoupler, $bestScore, $bestFunction) = (undef, 0, '');
for my $coupling (@$couplingList) {
my ($score, $coupler, $function) = @$coupling;
if ($subHash->{$coupler} && $score > $bestScore) {
$bestCoupler = $coupler;
$bestScore = $score;
$bestFunction = $function;
}
}
if (defined $bestCoupler) {
print join("\t", $gene, $geneHash->{$gene}, $genome, $genomeName,
$bestCoupler, $bestScore, $bestFunction) . "\n";
}
}
}

Example 4 output (truncated)

fig|273035.4.peg.1016	hypothetical protein	273035.4	Spiroplasma kunkelii CR2-3x	fig|273035.4.peg.1018	31	DNA helicase
fig|273035.4.peg.1234 predicted metal-dependent hydrolase 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1233 80 Diacylglycerol kinase (EC 2.7.1.107)
fig|273035.4.peg.1496 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1491 9 Phage terminase large subunit
fig|273035.4.peg.1570 S1 RNA binding domain protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.1571 10 Glucose-6-phosphate isomerase (EC 5.3.1.9)
fig|273035.4.peg.328 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.329 60 Putative tRNA-m1A22 methylase
fig|273035.4.peg.403 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.402 54 LSU ribosomal protein L27p
fig|273035.4.peg.60 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.61 9 hypothetical protein
fig|273035.4.peg.600 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.598 18 Recombination protein U homolog
fig|273035.4.peg.61 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.60 9 hypothetical protein
fig|273035.4.peg.710 hypothetical protein 273035.4 Spiroplasma kunkelii CR2-3x fig|273035.4.peg.708 12 LSU ribosomal protein L31p
.
.
.