FIGfams
Each FIGfam is a set of proteins that are believed to be isofunctional homologs. That is, they are all believed to implement the same function and, because they appear to be similar, to derive from a common ancestor. Any two members of a FIGfam should be globally alignable, meaning that they align over their full lengths rather than over a single shared region.
FIGfams are generated in two ways:
- They are derived from subsystems (the set of PEGs in a column that are globally similar becomes a FIGfam).
- We have tools that align closely-related genomes, and genes that appear to "clearly correspond to one another" are placed in the same FIGfam.
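As a rough illustration of the first route only (subsystem columns), the following minimal sketch groups the PEGs of one column into candidate families. It is not the actual derivation code: the function names, the thresholds, the minimum family size, and the use of `difflib.SequenceMatcher` as a stand-in for a real global alignment are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def globally_similar(seq_a, seq_b, min_ratio=0.8, min_length_ratio=0.8):
    """Stand-in test for "globally similar" (an assumption, not the real criterion).

    Requires comparable lengths and a high overall match ratio, so that a
    short shared region inside two otherwise different proteins does not count.
    """
    shorter, longer = sorted((len(seq_a), len(seq_b)))
    if longer == 0 or shorter / longer < min_length_ratio:
        return False
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return matcher.ratio() >= min_ratio


def families_from_column(column, similar=globally_similar, min_size=2):
    """Group the PEGs of one subsystem column by single-linkage clustering.

    `column` maps a PEG id to its protein sequence. Two PEGs end up in the
    same group if a chain of pairwise-similar PEGs connects them. The
    `min_size` cutoff is hypothetical.
    """
    parent = {peg: peg for peg in column}

    def find(peg):
        # Follow parent links to the group representative, shortening the path.
        while parent[peg] != peg:
            parent[peg] = parent[parent[peg]]
            peg = parent[peg]
        return peg

    for peg_a, peg_b in combinations(column, 2):
        if similar(column[peg_a], column[peg_b]):
            parent[find(peg_a)] = find(peg_b)

    groups = {}
    for peg in column:
        groups.setdefault(find(peg), set()).add(peg)
    return [members for members in groups.values() if len(members) >= min_size]
```

Calling `families_from_column` on a dictionary of made-up PEG ids and sequences returns a list of sets of ids, one set per candidate family. Single-linkage grouping is only one possible choice here; a stricter rule (every pair in the set must be similar) would match the "any two members" property more directly, at the cost of splitting borderline groups.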
Note that there is no manual curation of FIGfams. They are automatically derived. The manual annotation occurs within the subsystems. If errors are detected within a FIGfam, the correction is made by fixing a subsystem or creating a new subsystem, which causes the derivation process to produce improved FIGfams.
At this point, there are multiple FIGfam collections. The largest contains over 130,000 sets of proteins, about 50% of which contain only two sequences.
For a brief presentation on this subject, see this PDF.