Getting All IDs, Aliases and Assertions of Function for One or More Protein Sequences

When trying to assess what function a protein sequence implements, it
is often useful to just gather the set of assertions that both FIG and
other individuals and groups have made.  This observation led us to
create the Annotation Clearinghouse.  Given a protein, it lets you see
all of the function assignments that any group has made to proteins
with the same sequence.




From the Command Line

Use something like

    svr_ach_lookup < ids

where the file ids contains one or more identifiers, one per line.
For example, if ids contains just

<<<<< start of ids >>>>>    [ That is, a file containing just a single line ]
gi|1786185
<<<<< end of ids >>>>>

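the output is a table with one line per assertion.  Its columns are

   1. The incoming ID.
   2. The ID of the gene whose assertion was found in the database; its
      protein sequence is identical to that of the incoming ID.
   3. The text of the assertion.
   4. The source of the assertion, usually a database, institution, or user name.
   5. A flag that is 1 if the assertion was made by a human expert, else 0.
   6. The organism the protein comes from, when it is known (this column
      may be empty).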

In our simple 1-line input above, we would get back

<<<<<< start of output >>>>>>>>
gi|1786185 cmr|NT01EC0003 threonine synthase CMR 0
gi|1786185 cmr|NT11EC0003 threonine synthase CMR 0
gi|1786185 gi|169887502 threonine synthase gb 1
gi|1786185 gi|169887502 threonine synthase NCBI 0
gi|1786185 kegg|ecd:ECDH10B_0004 threonine synthase KEGG 0
gi|1786185 kegg|ecd:ECDH10B_0004 threonine synthase KEGG 1
gi|1786185 kegg|ecj:JW0003 threonine synthase KEGG 0
gi|1786185 kegg|ecj:JW0003 threonine synthase KEGG 1
gi|1786185 kegg|eco:b0004 threonine synthase (EC:4.2.3.1) KEGG 0
gi|1786185 kegg|eco:b0004 threonine synthase (EC:4.2.3.1) KEGG 1
gi|1786185 sp|P00934 RecName: Full=Threonine synthase NCBI 0
gi|1786185 sp|P00934 Threonine synthase UniProt 0
gi|1786185 fig|316385.5.peg.3 Threonine synthase (EC 4.2.3.1) SEED 0 Escherichia coli str. K-12 substr. DH10B
gi|1786185 gi|170079667 threonine synthase NCBI 0 Escherichia coli str. K-12 substr. DH10B
gi|1786185 gi|170079667 threonine synthase ref 1 Escherichia coli str. K-12 substr. DH10B
gi|1786185 fig|316407.3.peg.4 Threonine synthase (EC 4.2.3.1) SEED 0 Escherichia coli W3110
gi|1786185 fig|511145.6.peg.3 Threonine synthase (EC 4.2.3.1) SEED 0 Escherichia coli str. K-12 substr. MG1655
gi|1786185 AP_000668.1 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 fig|83333.1.peg.4 Threonine synthase (EC 4.2.3.1) SEED 0 Escherichia coli K12
gi|1786185 gi|135813 RecName: Full=Threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|135813 RecName: Full=Threonine synthase sp 1 Escherichia coli K12
gi|1786185 gi|147981 threonine synthase gb 1 Escherichia coli K12
gi|1786185 gi|147981 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|16127998 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|16127998 threonine synthase ref 1 Escherichia coli K12
gi|1786185 gi|1786185 threonine synthase gb 1 Escherichia coli K12
gi|1786185 gi|1786185 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|21321894 threonine synthase dbj 1 Escherichia coli K12
gi|1786185 gi|21321894 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|537247 threonine synthase gb 1 Escherichia coli K12
gi|1786185 gi|537247 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|71041761 Chain A, Crystal Structure Of Threonine Synthase From Escherichia Coli NCBI 0 Escherichia coli K12
gi|1786185 gi|71041761 Chain A, Crystal Structure Of Threonine Synthase From Escherichia Coli pdb 1 Escherichia coli K12
gi|1786185 gi|89106888 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 gi|89106888 threonine synthase ref 1 Escherichia coli K12
gi|1786185 NP_414545.1 threonine synthase NCBI 0 Escherichia coli K12
gi|1786185 uni|P00934 Threonine synthase SwissProt 1 Escherichia coli K12
<<<<<< end of output >>>>>>>>>

Note that this returns assertions from numerous sources, and there is
certainly no fixed format for the assertions -- we have simply
gathered them as the sources make them available.  In a few cases,
those making the assertions considered them reliable enough to call
them "expert assertions"; these carry a 1 in the fifth column.  You
should take those particularly seriously.
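If you only want the expert assertions, you can filter the output by
its fifth column.  The sketch below is our own illustration, not one of
the SEED tools: expert_assertions is a hypothetical helper, and it
assumes the columns are tab-separated (as they are in the program
version in the next section).  The sample rows are copied from the
output above.

```perl
use strict;
use warnings;

# Column names for one line of output, in the order listed above.
# Assumption: columns are tab-separated; the organism column is optional.
my @FIELDS = qw(query_id other_id assertion source expert organism);

# Hypothetical helper: parse lines and keep only expert assertions (flag 1).
sub expert_assertions {
    my @kept;
    foreach my $line (@_) {
        chomp(my $text = $line);
        my %row;
        @row{@FIELDS} = split /\t/, $text;   # missing columns stay undef
        next unless defined $row{expert} && $row{expert} eq '1';
        push @kept, \%row;
    }
    return @kept;
}

# Three rows copied from the example output above, re-joined with tabs.
my @sample = (
    join("\t", 'gi|1786185', 'cmr|NT01EC0003', 'threonine synthase', 'CMR', 0),
    join("\t", 'gi|1786185', 'gi|169887502', 'threonine synthase', 'gb', 1),
    join("\t", 'gi|1786185', 'uni|P00934', 'Threonine synthase', 'SwissProt', 1,
         'Escherichia coli K12'),
);

foreach my $row (expert_assertions(@sample)) {
    print join("\t", $row->{other_id}, $row->{assertion}, $row->{source}), "\n";
}
```

To filter real output, pass <STDIN> instead of @sample and pipe
svr_ach_lookup into the script.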


From a Program

Suppose that you wish to retrieve the assertions we have gathered from
a program -- perhaps so that you can format them nicely for the user.
Here is a short program that does the equivalent of svr_ach_lookup:

<<<<<< beginning of perl script >>>>>>>>>>

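use strict;
use SeedEnv;

# Connect to the SEED server (SAPserver is made available via SeedEnv).
my $sap = SAPserver->new;

# Use the first whitespace-delimited token of each input line as an ID,
# skipping blank lines (a failed match would otherwise reuse a stale $1).
my @ids = map { /^(\S+)/ ? $1 : () } <STDIN>;

# Ask the Annotation Clearinghouse for the assertions made about every
# protein with the same sequence as each ID.
my $id_hash = $sap->equiv_sequence_assertions(-ids => \@ids);

foreach my $id (keys %$id_hash)
{
    my $assertions = $id_hash->{$id};
    foreach my $tuple (@$assertions)
    {
        my ($other_id, $other_func, $source, $expert, $precise_organism) = @$tuple;
        # The organism is not known for every assertion.
        $precise_organism = "" unless defined $precise_organism;
        print join("\t", $id, $other_id, $other_func, $source, $expert, $precise_organism), "\n";
    }
}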

<<<<<< end of perl script >>>>>>>>>>

The behavior of this simple version is not precisely that of
svr_ach_lookup.  The lines will not be sorted in the same order, for
example.  However, it does illustrate exactly how to programmatically
extract data from the Annotation Clearinghouse.
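Since the same function is often asserted by many sources, one way to
present the results more compactly is to group the assertions for an ID
by their text.  The sketch below is our own illustration, not part of
the SEED tools: summarize_functions is a hypothetical helper, and it
assumes each tuple has the layout unpacked in the program above (other
ID, function, source, expert flag, organism).  The example tuples are
copied from the gi|1786185 output shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: group one ID's assertions by exact function text,
# recording which sources made each one and whether any was expert.
sub summarize_functions {
    my ($assertions) = @_;
    my %by_func;
    foreach my $tuple (@$assertions) {
        my ($other_id, $func, $source, $expert) = @$tuple;
        my $entry = $by_func{$func} ||= { sources => {}, expert => 0 };
        $entry->{sources}{$source} = 1;
        $entry->{expert} = 1 if $expert;
    }
    return \%by_func;
}

# Tuples copied from the example output for gi|1786185.
my $example = [
    ['cmr|NT01EC0003', 'threonine synthase', 'CMR', 0],
    ['gi|169887502', 'threonine synthase', 'gb', 1],
    ['kegg|eco:b0004', 'threonine synthase (EC:4.2.3.1)', 'KEGG', 1],
    ['fig|83333.1.peg.4', 'Threonine synthase (EC 4.2.3.1)', 'SEED', 0,
     'Escherichia coli K12'],
];

my $summary = summarize_functions($example);
foreach my $func (sort keys %$summary) {
    my $e = $summary->{$func};
    printf "%s\t%s\t%s\n", $func, join(",", sort keys %{ $e->{sources} }),
        $e->{expert} ? "expert" : "";
}
```

In the program above, you could call summarize_functions($id_hash->{$id})
in place of the inner loop.  Because the grouping uses the exact text,
variants that differ only in case or EC-number formatting stay separate;
normalize the text first if you want them merged.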