A Simple Way to Check for Genes that Bridge Contigs


Sequencing projects commonly reach a point where only a few contigs
remain, and "spanning the gaps" between them with directed sequencing
becomes a chore.  People frequently ask

      "Can you help us find genes such
       that one portion of the gene is at the end of one contig, and the
       other end of the gene occurs on the end of another contig?"

We suggest trying something like

   svr_just_ends < contigs | svr_assign_to_dna_using_figfams | svr_possible_joins

The first tool, svr_just_ends, takes as input a set of contigs.  It
writes out sequences that are the two ends of each contig (by default,
the tool takes 1000 characters from each end).
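
To make the idea of "just the ends" concrete, here is a minimal Python
sketch of the same kind of extraction.  It is not the source of
svr_just_ends: the function names, the "_left"/"_right"/"_whole" suffixes,
and the handling of contigs no longer than the end length are all
assumptions made for illustration.

```python
def parse_fasta(text):
    """Yield (id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            # Use the first word of the header as the contig id.
            seq_id, chunks = (fields[0] if fields else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def just_ends(text, end_len=1000):
    """Return (fragment_id, sequence) pairs for the two ends of each contig.

    Assumption: a contig no longer than end_len is returned whole,
    as a single fragment, rather than as two identical ends.
    """
    ends = []
    for seq_id, seq in parse_fasta(text):
        if len(seq) <= end_len:
            ends.append((seq_id + "_whole", seq))
        else:
            ends.append((seq_id + "_left", seq[:end_len]))
            ends.append((seq_id + "_right", seq[-end_len:]))
    return ends
```

Calling just_ends(open("contigs").read()) would give one or two
fragments per contig, each tagged with the contig it came from, which is
the information the later steps need to tell which contigs a gene might
bridge.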

These fragments of the original contigs are passed to

      svr_assign_to_dna_using_figfams

which is a tool that attempts to detect genes that implement specific
functions in the DNA fragments passed to it.  It outputs possible
protein-encoding regions, along with the function that is believed to
be encoded.

The third tool, svr_possible_joins, constructs a short report of the hits
that appear to be against genes with the same function.  The hits are
grouped by function, so that hits against genes implementing the same
function appear together.
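
The grouping step can be sketched in a few lines of Python.  This is an
illustration only, not the code of svr_possible_joins: the input format
(a list of (contig, end, function) tuples) is hypothetical, and the rule
of keeping only functions that hit the ends of at least two different
contigs is an assumption about what makes a join "possible".

```python
from collections import defaultdict


def possible_joins(hits):
    """Group hits by function and keep functions seen on 2+ contigs.

    hits: iterable of (contig, end, function) tuples (hypothetical format).
    Returns a dict mapping function -> sorted list of (contig, end) pairs.
    """
    by_function = defaultdict(set)
    for contig, end, function in hits:
        by_function[function].add((contig, end))

    joins = {}
    for function, ends in by_function.items():
        contigs = {contig for contig, _ in ends}
        # A single contig cannot be joined to itself by this evidence.
        if len(contigs) >= 2:
            joins[function] = sorted(ends)
    return joins


hits = [
    ("contig1", "right", "DNA polymerase I"),
    ("contig2", "left", "DNA polymerase I"),
    ("contig3", "left", "Recombination protein RecA"),
]
for function, ends in possible_joins(hits).items():
    print(function, ends)
```

In this made-up example only "DNA polymerase I" is reported, because it is
the only function with hits on the ends of two different contigs.  Such a
pair is a candidate join to check, not proof that the contigs are adjacent.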

This pipeline is simple and fast, but it is not a complete solution.

We do not offer a more complete solution, since in most cases closing
the genome will require a good deal more than just spanning contigs
with recognizable genes -- it often boils down to spanning repeat
regions, which is a far more complex topic.