Chris Larsen | 5 Mar 21:55 2013

Issue in Terminal Partial Codon Translation (1.6.0 -> 1.6.1)?

Hello BioPerl-l,

Issue with partial codon translation. 

At our bioinformatics resource center we are using BioPerl to translate partial sequences to amino acids.
This is necessary in the case of certain virus GenBank files, which are typically untranslated from
polyprotein into their final mature peptide format, and no amino acid sequence is given in the source
file. We need to make one from our mat_peptide generator. However, our developers are finding that in their
migration from BioPerl 1.6.0 to 1.6.1, the final amino acid generated from a partial nucleotide sequence
is now being dropped, and this is resulting in several hundred files being altered relative to the legacy output.
Here is an example:

In the hepC virus genome AB014488:

The nucleotide sequence (only) is given:

caagctgtca tggacatggt ggcgggggcc cactggggag tcctagcggg ccttgcctac
tattccatgg tggggaactg ggctaaggtt ttgattgtga tgctactctt cgccggcgtt
gacgggcata cccgcgtgac ggggggggtg caaggccacg tcacctctac actcacgtcc
ctctttagac ctggggcgtc ccagaaaatt cagcttgtaa acaccaatgg cagttggcac
atcaacagga ctgccctgaa ctgcaatgac tccctccaaa ctgggttcct tgccgcgctg
ttctacacac acaagttcaa cgcgtccgga tgcccggagc gcatggccag ctgccgctcc
attgacaagt tcgaccaggg atggggtccc atcacttatg cccaacctga caactcggac
cagaggccgt attgctggca ctatgcacct cgacagtgtg gtatcgtacc cgcgtcgcag
gtgtgcggtc cagtgtattg cttcacccca agccctgttg tggtggggac gaccgatcgt
tccggtgccc ctacgtataa ctgggg

where this ends in the partial codon 'gg'. (Don't bother counting: this is 188 and 2/3 aa.) A biologist might
know that this terminus is always going to be a glycine, G, since for 'gg' the third position is irrelevant,
and so we would like to extend the partial codon into another amino acid in the last 'E2' protein encoded by
this genome fragment. It's not sequenced, but we can infer it. The viral proteins are so short, it really
matters! We want that G (and likewise S, P, T, A, V, R or L, whenever the first two bases already fix the
residue). However, the newer BioPerl version is not giving us the last amino acid. Was the functionality
turned off, or was a default argument changed? Is it an issue with -complete?
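To make the [SPTAVRLG] claim concrete, here is a minimal Python sketch (not BioPerl, and not how BioPerl does it) that checks which two-base prefixes fix the amino acid regardless of the third base. It assumes the standard genetic code (NCBI translation table 1); the names CODON_TABLE and unambiguous_prefixes are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), assumed here.
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def unambiguous_prefixes():
    """Map each two-base prefix to its amino acid when the third base is irrelevant."""
    result = {}
    for b1, b2 in product(BASES, repeat=2):
        # Collect every residue reachable by completing the prefix with T, C, A or G.
        residues = {CODON_TABLE[b1 + b2 + b3] for b3 in BASES}
        if len(residues) == 1:
            result[b1 + b2] = residues.pop()
    return result


if __name__ == "__main__":
    for prefix, aa in sorted(unambiguous_prefixes().items()):
        print(prefix, aa)
```

Under that assumption, exactly eight prefixes qualify (TC, CT, CC, CG, AC, GT, GC, GG), and they give S, L, P, R, T, V, A and G. A terminal 'gg' is therefore always G.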

Sorry, I am not quite good enough in (Bio)Perl to find the solution myself. I only know it is not working now,
and am trying to prevent the DBA from stabbing me in the neck with a spork because 4% of the records now differ
in the new pipeline. (Partial seqs, exactly 2 of 3 bp, non-stop terminus in the available CDS.) They are now
telling me that errors don't occur within a string, only at the terminus, in viral polyproteins, and only
when it is not a stop codon. Ergo, color me confused.

I believe this is being handled by:

Bio::Tools::CodonTable and $obj->translate().

where the docs state the method:

"Returns a string of one letter amino acid codes from
nucleotide sequence input. The imput (sic) can be of any length.
if the codon is two nucleotides long and if by adding
an a third character 'N', it codes for a single amino
acid (with exceptions above), return that, otherwise
return empty string."
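For anyone wanting to compare outputs between versions, here is how I read that documented rule, as a small standalone Python sketch. It is not the BioPerl code: it assumes the standard genetic code (NCBI table 1), ignores the "exceptions above" the docs mention, and the function name translate_with_partial and its complete_partial flag are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), assumed for this sketch.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_partial(seq, complete_partial=True):
    """Translate in frame 0; optionally resolve a trailing two-base codon."""
    seq = "".join(seq.split()).upper().replace("U", "T")
    full_length = len(seq) - len(seq) % 3
    # Whole codons: unknown or ambiguous codons become 'X'.
    protein = [
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, full_length, 3)
    ]
    tail = seq[full_length:]
    if complete_partial and len(tail) == 2:
        # Try every possible third base; keep the residue only if all agree.
        residues = {CODON_TABLE.get(tail + b3, "X") for b3 in BASES}
        if len(residues) == 1 and residues != {"X"}:
            protein.append(residues.pop())
    return "".join(protein)


if __name__ == "__main__":
    # Last line of the AB014488 fragment quoted in this post (it is in frame).
    tail_line = "tccggtgccc ctacgtataa ctgggg"
    print(translate_with_partial(tail_line))
    print(translate_with_partial(tail_line, complete_partial=False))
```

On the last line of the fragment above, this gives SGAPTYNWG with the partial codon resolved and SGAPTYNW without it, which is exactly the one-residue difference we are seeing between the two pipelines.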

But I defer to the much larger bioperl-l wisdom. Sorry for the complexity. The options for them seem to be to
continue using the 1.6.0 version to generate the longer better string, and 1.6.1 for everything else; but
I'd rather the DBA team just uses one version of BioPerl. They could also update their version to 1.6.9.
But migrating a whole industrial pipeline to a new version for a production system isn't trivial either, so
rather than hoping blindly for the fix in v1.6.9, I am asking here whether the functionality works for others,
and for which versions, and whether it's truly a bug or has been fixed or changed, and as of what version. It
would be great if one tiny piece could be replaced and the whole problem vanished... But we may find it is our
problem too.

Hoping we just have to turn on this changed functionality, but we also want to see the issue documented.
Cannot find the solution in BIO's docs. Sorry Brian!

I just want all the extra [SPTAVRLG] we can get. Any guidance I will graciously convey back to the team and try
to work it out.



PS: Chris F and Amir discuss a highly related issue:

"why should CodonTable::translate() automatically 'complete' the translation for incomplete codons
by default?  I would consider this a bug."



Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Medical
Phone: (240) 965-4525
Fax: (240) 547-6133

clarsen at vecna dot com

Better Technology, Better World (TM)
