Subsystems
The use of subsystems as a key technology for genome annotation
was introduced in The Subsystems Approach to Genome Annotation and
its Use in the Project to Annotate 1000 Genomes. We recommend
reading this paper for a detailed discussion.
A subsystem is a set of functional roles that together implement a
specific biological process or structural complex. A
subsystem may be thought of as a generalization of the term
pathway. Thus, just as glycolysis is composed of a set of functional
roles (glucokinase, glucose-6-phosphate isomerase,
phosphofructokinase, etc.), a complex like the ribosome or a transport
system can be viewed as a collection of functional roles. In practice,
we put no restriction on how curators select the set of functional
roles they wish to group into a subsystem, and we find subsystems
being created to represent the set of functional roles that make up
pathogenicity islands, prophages, transport cassettes and complexes
(although many of the existing subsystems do correspond to metabolic
pathways). The concept of a populated subsystem extends the
basic notion of a subsystem: it is a subsystem together with a
spreadsheet depicting the exact genes that implement the functional
roles of the subsystem in specific genomes. The populated subsystem
specifies which organisms include operational variants of the
subsystem and which genes in those organisms implement the functional
roles that make up the subsystem. Each column in the spreadsheet
corresponds to a functional role from the subsystem, each row
represents a genome, and each cell lists the genes in that
genome whose protein products implement that functional role.
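The spreadsheet layout described above can be sketched as a small data structure. This is only an illustration, not the SEED implementation: the class name PopulatedSubsystem, its methods, the genome name, and the gene identifiers are all hypothetical, and only the three glycolysis roles come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PopulatedSubsystem:
    """A subsystem plus its spreadsheet: rows are genomes, columns are roles.

    Hypothetical sketch; names and layout are assumptions, not the SEED API.
    """

    name: str
    roles: List[str]  # column order of the spreadsheet
    # genome -> functional role -> gene identifiers (one spreadsheet cell)
    rows: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def assign(self, genome: str, role: str, gene: str) -> None:
        """Record that `gene` implements `role` in `genome`."""
        if role not in self.roles:
            raise ValueError(f"{role!r} is not a role of subsystem {self.name!r}")
        self.rows.setdefault(genome, {}).setdefault(role, []).append(gene)

    def cell(self, genome: str, role: str) -> List[str]:
        """Return the genes in one cell; an empty list means none assigned."""
        return list(self.rows.get(genome, {}).get(role, []))

    def missing_roles(self, genome: str) -> List[str]:
        """Roles of the subsystem that have no assigned gene in `genome`."""
        return [role for role in self.roles if not self.cell(genome, role)]


# Made-up example data: the genome name and gene identifiers are placeholders.
glycolysis = PopulatedSubsystem(
    name="Glycolysis (partial)",
    roles=["glucokinase", "glucose-6-phosphate isomerase", "phosphofructokinase"],
)
glycolysis.assign("genome_A", "glucokinase", "genome_A.gene_12")
glycolysis.assign("genome_A", "phosphofructokinase", "genome_A.gene_40")
glycolysis.assign("genome_A", "phosphofructokinase", "genome_A.gene_41")

print(glycolysis.cell("genome_A", "phosphofructokinase"))
print(glycolysis.missing_roles("genome_A"))
```

A cell holds a list rather than a single gene because a genome may carry more than one gene implementing the same functional role; an empty cell marks a role with no assigned gene in that genome.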
As of August 2010, over 1,200 subsystems have been
constructed, containing over 11,000 distinct functional roles and
1,400,000 PEGs (protein-encoding genes). Many of these subsystems are
"experimental" in the sense that they were constructed to support
specific hypotheses and then not maintained. As many as a third of
the collection falls into this category.
See The Project to Annotate 1000 Genomes for our manifesto written in 2004 describing a basic strategy for creating a framework to support high-throughput annotation. See also this PDF of a presentation on the subject.