From: Chris Larsen <clarsen <at> vecna.com>
Subject: Issue in Terminal Partial Codon Translation (1.6.0 ->1.6.1) ?
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Tuesday 5th March 2013 20:55:05 UTC
Hello BioPerl-l,

Issue with partial codon translation. 

At our bioinformatics resource center we are using BioPerl to translate
partial sequences to amino acids. This is necessary in the case of certain
virus GenBank files, which are typically untranslated from polyprotein into
their final mature peptide format, and no amino acid sequence is given in
the source file. We need to make one from our mat_peptide generator.
However our developers are finding that in their migration from BioPerl
1.6.0 to 1.6.1, the final amino acid generated from a partial nucleotide
sequence is now being dropped, and this is resulting in several hundred
files being altered relative to legacy. Here is an example:

In the hepatitis C virus genome: AB014488

The nucleotide sequence (only) is given:

caagctgtca tggacatggt ggcgggggcc cactggggag tcctagcggg ccttgcctac
tattccatgg tggggaactg ggctaaggtt ttgattgtga tgctactctt cgccggcgtt
gacgggcata cccgcgtgac ggggggggtg caaggccacg tcacctctac actcacgtcc
ctctttagac ctggggcgtc ccagaaaatt cagcttgtaa acaccaatgg cagttggcac
atcaacagga ctgccctgaa ctgcaatgac tccctccaaa ctgggttcct tgccgcgctg
ttctacacac acaagttcaa cgcgtccgga tgcccggagc gcatggccag ctgccgctcc
attgacaagt tcgaccaggg atggggtccc atcacttatg cccaacctga caactcggac
cagaggccgt attgctggca ctatgcacct cgacagtgtg gtatcgtacc cgcgtcgcag
gtgtgcggtc cagtgtattg cttcacccca agccctgttg tggtggggac gaccgatcgt
tccggtgccc ctacgtataa ctgggg

where this ends in the partial codon 'gg'. (Don't bother counting: this is
188 and 2/3 aa.) A biologist would know that this terminus is always going
to be a glycine, G, since the third position is irrelevant, and so we would
like to extend the partial codon into another amino acid in the last 'E2'
protein encoded by this genome fragment. The third base is not sequenced,
but we can infer the residue. The viral proteins are so short that it
really matters! We want that G (and SPTAVRL in the equivalent cases).
However, the newer BioPerl version is not giving us the last amino acid.
Has the functionality been turned off, or was a default argument changed?
Is this an issue with -complete?
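
To make the inference concrete, here is a minimal sketch in Python (not
BioPerl, and not a claim about how Bio::Tools::CodonTable works inside). It
assumes the standard genetic code only, and the helper name
unambiguous_prefixes is made up for illustration. It lists the two-base
prefixes whose amino acid does not depend on the third base:

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def unambiguous_prefixes():
    """Map each two-base prefix to its amino acid when the third base is irrelevant.

    Hypothetical helper for illustration; prefixes whose four possible
    codons encode more than one residue (or a stop) are left out.
    """
    result = {}
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        residues = {CODON_TABLE[prefix + third] for third in BASES}
        if len(residues) == 1:
            result[prefix] = residues.pop()
    return result


if __name__ == "__main__":
    for prefix, aa in sorted(unambiguous_prefixes().items()):
        print(prefix, aa)
```

Under that assumption, eight prefixes qualify, and their residues are
exactly the S, P, T, A, V, R, L and G mentioned above.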

Sorry, I am not quite good enough in (bio)perl to find the solution myself.
I only know it is not working now, and I am trying to prevent the DBA from
stabbing me in the neck with a spork because 4% of the records now differ
in the new pipeline. (Partial seqs, exactly 2 of 3 bp, non-stop terminus in
the available CDS.) They are telling me that the errors don't occur within
a string, only at the terminus, in viral polyproteins, and only when it is
not a stop codon. Ergo, color me confused.

I believe this is being handled by:

Bio::Tools::CodonTable and $obj->translate().

where the docs state the method:

"Returns a string of one letter amino acid codes from 
           nucleotide sequence input. The imput (sic) can be of any length.

if the codon is two nucleotides long and if by adding
               an a third character 'N', it codes for a single amino
               acid (with exceptions above), return that, otherwise
               return empty string."
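
As I read that passage, the rule amounts to something like the following
sketch. It is written in Python as a standalone illustration, assuming the
standard genetic code; the function translate and its infer_partial
argument are hypothetical names and are not BioPerl's API or its actual
implementation. Unknown codons are rendered as 'X' purely as a choice made
for this sketch.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, infer_partial=True):
    """Translate in frame 0; optionally resolve a trailing two-base codon.

    infer_partial is a hypothetical switch for this sketch: when True, a
    two-base tail is translated only if all four possible third bases
    give the same amino acid; otherwise the tail is dropped.
    """
    # Strip whitespace (GenBank-style blocks), normalise case and RNA 'U'.
    seq = "".join(seq.split()).upper().replace("U", "T")
    full_length = len(seq) - len(seq) % 3
    protein = [CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, full_length, 3)]

    tail = seq[full_length:]
    if infer_partial and len(tail) == 2:
        residues = {CODON_TABLE.get(tail + third) for third in BASES}
        if len(residues) == 1:
            residue = residues.pop()
            if residue is not None:  # None means the tail had a non-ACGT base
                protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # Last line of the AB014488 fragment above; it starts in frame.
    tail_of_fragment = "tccggtgccc ctacgtataa ctgggg"
    print(translate(tail_of_fragment))                       # with inference
    print(translate(tail_of_fragment, infer_partial=False))  # tail dropped
```

With the last line of the fragment above, the sketch gives SGAPTYNWG when
the tail is inferred and SGAPTYNW when it is not, which is the one-residue
difference we are seeing between versions.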

But I defer to the much larger bioperl-l wisdom. Sorry for the complexity.
The options for them seem to be to continue using the 1.6.0 version to
generate the longer, better string, and 1.6.1 for everything else; but I'd
rather the DBA team just use one version of BioPerl. They could also
update their version to 1.6.9. But migrating a whole industrial pipeline
to a new version for a production system also isn't trivial, so rather than
hoping blindly for the fix in v1.6.9, I am asking here if the functionality
works for others, and for which versions, and if it's truly a bug, or has
been fixed or changed, and as of what version. It would be great if one
tiny piece could be replaced and the whole problem vanished... But we may
find it is our problem too.

Hoping we just have to turn on this changed functionality, but we also want
to see the issue documented. I cannot find the solution in BIO's docs, sorry.

I just want all the extra [SPTAVRLG] we can get. Any guidance I will
graciously convey back to the team and try to work it out.



PS: Chris F and Amir discuss : highly related issue: http://bioperl.org/pipermail/bioperl-l/2011-January/034401.html

"why should CodonTable::translate() automatically 'complete' the
translation for incomplete codons by default?  I would consider this a

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Medical
Phone: (240) 965-4525
Fax: (240) 547-6133

clarsen at vecna dot com

Better Technology, Better World (TM)

The contents of this message may be privileged and confidential. Therefore,
if this message has been received in error, please delete it. Your receipt
of this message is not intended to waive any applicable privilege. Please
do not disseminate this message without the permission of the author.