Brian Foley | 19 Apr 01:17 2013

Annotation-assisted (and/or BLAST assisted) multiple sequence alignment tool?

At the HIV Sequence and Immunology Databases (http://www.hiv.lanl.gov),
where I work, we have used a bit of creativity to solve some difficult
problems in multiple sequence alignment, because we often want to produce
an alignment of gene sequences from more than 20,000 different isolates of
HIV-1 in just a few minutes.

We are very good at "deep" multiple alignment: thousands of copies of the
same small genome.

My problem comes when I want to align the genomes of other viruses, or
similarly sized gene regions (the complete mitochondrial genomes of
vertebrates, for example, which are roughly 17 kb in size), that don't
always have the same gene order.

The mitochondrial genomes of birds and mammals are a good example: they are
mostly co-linear, but the NADH6 gene has moved to a different location. See
the attached JPG of the Aardvark and Japanese Eagle-Hawk mitochondrial
genomes.

In other cases (I think it is the primate mitochondrial genomes), the
authors used a different site as "base #1" of the circular genome. So
although the primate mitochondrial genomes are 100% co-linear with those of
other vertebrates, we have to chop several thousand bases off the right end
and paste them onto the left end (the 5' end, the beginning) to make them
align with the mt-genomes of other mammals.
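The chop-and-paste step is mechanical once a shared landmark is chosen. A
minimal sketch in Python, assuming the landmark is a short sequence (for
example the first bases of a conserved gene) that matches exactly, with no
mismatches; `rotate_to_anchor` is a hypothetical helper, not part of any
existing tool. Searching the genome concatenated with itself means an anchor
that straddles the submitters' arbitrary base #1 is still found.

```python
def rotate_to_anchor(genome: str, anchor: str) -> str:
    """Rotate a circular sequence so it starts at the first exact match of `anchor`.

    Hypothetical helper: assumes an exact, mismatch-free anchor sequence.
    """
    genome, anchor = genome.upper(), anchor.upper()
    if not anchor or len(anchor) > len(genome):
        raise ValueError("anchor must be non-empty and no longer than the genome")
    # Search the doubled sequence so a match spanning the old base #1 is found.
    offset = (genome + genome).find(anchor)
    if offset == -1:
        raise ValueError("anchor not found in circular genome")
    # Moving the first `offset` bases to the end is equivalent to chopping the
    # tail off the right end and pasting it onto the 5' end.
    return genome[offset:] + genome[:offset]


print(rotate_to_anchor("GTAAACC", "ccgt"))  # CCGTAAA
```

Real data would need a tolerant search (for example a BLAST hit against a
reference gene) instead of an exact substring match.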

So it seems to me that there ought to be a multiple sequence alignment tool
that can read GenBank files with their annotation and use that annotation
to help with the alignment process.
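One simple way annotation could drive the process is to decide which pieces
of each genome are homologous by gene name rather than by position, and then
align each gene separately. A minimal sketch, under loud assumptions: the
GenBank annotation has already been parsed into hypothetical
`(gene_name, start, end, strand)` tuples with 1-based inclusive coordinates,
and gene names are consistent across records (in real GenBank files they
often are not, e.g. "ND6" vs "NADH6"). `extract` and `group_by_gene` are
illustrative names only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical pre-parsed annotation: (gene_name, start, end, strand),
# 1-based inclusive coordinates, strand +1 or -1.
Feature = Tuple[str, int, int, int]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def extract(genome: str, start: int, end: int, strand: int) -> str:
    """Return a feature's sequence, reverse-complemented for minus-strand genes."""
    seq = genome[start - 1:end].upper()
    return seq if strand == 1 else seq.translate(_COMPLEMENT)[::-1]


def group_by_gene(
    records: Dict[str, Tuple[str, List[Feature]]]
) -> Dict[str, Dict[str, str]]:
    """Map gene name -> {genome id -> gene sequence}.

    Genes are matched by annotation name, not by position, so a gene that
    sits elsewhere in one genome still lands with its homologs.
    """
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for genome_id, (sequence, features) in records.items():
        for name, start, end, strand in features:
            groups[name][genome_id] = extract(sequence, start, end, strand)
    return dict(groups)


records = {
    "genomeA": ("AAACCCGGGTTT", [("geneX", 1, 3, 1), ("geneY", 7, 9, 1)]),
    "genomeB": ("GGGTTTAAACCC", [("geneY", 1, 3, 1), ("geneX", 7, 9, 1)]),
}
print(group_by_gene(records)["geneX"])  # {'genomeA': 'AAA', 'genomeB': 'AAA'}
```

Each per-gene group could then be handed to whatever multiple alignment
program is already in use, and the per-gene alignments concatenated in one
agreed gene order, so a relocated gene no longer breaks the overall
alignment.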

One tool I am aware of that can help a lot is the Artemis Comparison Tool
(ACT) and its associated DOUBLE-ACT server:
http://www.hpa-bioinfotools.org.uk/pise/double_act.html

The DOUBLE-ACT server uses BLAST to find regions of a pair of genomes that
are homologous/similar and creates a table of these matched regions. The
Artemis Comparison Tool then loads both genomes into the Artemis genome
browser and uses the BLAST hit table to keep both genomes "in sync" with
each other as you browse them. Although the DOUBLE-ACT BLAST step does not
depend on the annotations at all, the annotations are visible when browsing
the genomes in ACT.
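A BLAST hit table like this is also enough to flag rearranged blocks, such
as a moved NADH6 gene, automatically. A minimal sketch, assuming the table is
BLAST's standard 12-column tabular output (`-outfmt 6` in BLAST+, `-m 8` in
legacy blastall); the exact format DOUBLE-ACT writes is not confirmed here.
Hits are chained by a simple O(n^2) dynamic program that keeps the
highest-scoring set of forward-strand hits increasing in both query and
subject position; every other hit, plus any reverse-strand hit, is reported
as a candidate rearrangement. `Hit`, `parse_tabular` and `rearranged_hits`
are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    qstart: int
    qend: int
    sstart: int
    send: int
    bitscore: float

    @property
    def forward(self) -> bool:
        # BLAST reports minus-strand subject matches with sstart > send.
        return self.sstart <= self.send


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column BLAST tabular lines:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        hits.append(Hit(int(cols[6]), int(cols[7]), int(cols[8]), int(cols[9]), float(cols[11])))
    return hits


def rearranged_hits(hits: List[Hit]) -> List[Hit]:
    """Return hits lying outside the highest-scoring co-linear chain."""
    fwd = sorted((h for h in hits if h.forward), key=lambda h: h.qstart)
    best = [h.bitscore for h in fwd]  # best chain score ending at hit i
    prev = [-1] * len(fwd)            # back-pointer for reconstructing the chain
    for i in range(len(fwd)):
        for j in range(i):
            colinear = fwd[j].qstart < fwd[i].qstart and fwd[j].sstart < fwd[i].sstart
            if colinear and best[j] + fwd[i].bitscore > best[i]:
                best[i] = best[j] + fwd[i].bitscore
                prev[i] = j
    chain = set()
    if fwd:
        i = max(range(len(fwd)), key=best.__getitem__)
        while i != -1:
            chain.add(i)
            i = prev[i]
    reverse = [h for h in hits if not h.forward]
    return reverse + [h for k, h in enumerate(fwd) if k not in chain]
```

The chain is scored by total bitscore rather than by number of hits, so one
long co-linear match outweighs several short ones. A genome whose "base #1"
sits elsewhere will also show up as a broken chain, so rotating both genomes
to a common start first keeps the report focused on genuine gene moves.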

I am quite sure that I am not the only one in the world who needs this type
of tool. I increasingly see large multiple sequence alignments done for the
classification of organisms where the authors could have used such a tool.

Please let me know if you have any ideas about where to look for such a
tool, or which groups of bioinformaticians might be able to develop one.

Brian T. Foley, PhD
HIV Databases
Los Alamos National Laboratory
btf <at> lanl.gov
505 665-1970
_______________________________________________
Bioperl-l mailing list
Bioperl-l <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
