Jason Stajich | 2 Aug 09:57 2013

Re: blast and length adjustment

On Aug 1, 2013, at 9:01 PM, dimitark <at> bii.a-star.edu.sg wrote:

> Hi guys,
> I have a question about BLAST.
> I was working on a project where I BLAST against human RNA using BioPerl. I found 2 sequences
> that hit totally different RNAs, but when I ran cd-hit-est they clustered together. I even aligned
> them and they were almost identical; from the NCBI aligner:
> 2658 bits(1439) 	0.0 	1441/1442(99%) 	0/1442(0%) 	Plus/Plus
> Then I decided to BLAST them at NCBI, and they again hit different sequences.
> Then I checked the parameters of each search and found that both queries were length adjusted, i.e. some
> length was removed, namely around 30 nucleotides.
> I was curious what BioPerl does about that, and I found the following in BlastUtils.pm:
>   # Adjust length based on BLAST flavor.
>   my $prog = $sbjct->algorithm;
>   if($prog eq 'TBLASTN') {
>       $sbjct->{'_length_aln_sbjct'} /= 3;
>   } elsif($prog eq 'BLASTX' ) {
>       $sbjct->{'_length_aln_query'} /= 3;
>   } elsif($prog eq 'TBLASTX') {
>       $sbjct->{'_length_aln_query'} /= 3;
>       $sbjct->{'_length_aln_sbjct'} /= 3;
>   }

You are confusing the length adjustment that happens at NCBI with this length adjustment. The code above
deals with translated searches: notice that every case is a division by 3. The coordinates reported in the
BLAST results for a translated search are the original DNA/RNA coordinates, but when one wants to know the
length in alignment space, it is really at the protein scale.
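
To make the division by 3 concrete, here is a minimal Python sketch of the same idea. It is not BioPerl code: the function name and the lookup table are made up for illustration, and it uses integer division where the quoted Perl uses a plain `/= 3`.

```python
# Hypothetical illustration, not part of BioPerl.
# For each BLAST flavor: is the (query, subject) side reported in
# nucleotide coordinates even though it is aligned as protein?
TRANSLATED_SIDES = {
    "BLASTN":  (False, False),
    "BLASTP":  (False, False),
    "BLASTX":  (True,  False),   # translated query vs. protein subject
    "TBLASTN": (False, True),    # protein query vs. translated subject
    "TBLASTX": (True,  True),    # both sides translated
}


def alignment_lengths(prog, query_span, sbjct_span):
    """Convert reported coordinate spans to lengths in alignment units.

    query_span and sbjct_span are the aligned lengths in the coordinates
    that BLAST reports (nucleotides for a translated side).
    """
    query_translated, sbjct_translated = TRANSLATED_SIDES[prog.upper()]
    query_len = query_span // 3 if query_translated else query_span
    sbjct_len = sbjct_span // 3 if sbjct_translated else sbjct_span
    return query_len, sbjct_len


# A BLASTX hit spanning 300 nt of the query and 100 aa of the subject
# is 100 residues long on both sides in alignment space.
print(alignment_lengths("BLASTX", 300, 100))   # (100, 100)
# For blastn nothing is translated, so nothing is divided.
print(alignment_lengths("BLASTN", 1442, 1442)) # (1442, 1442)
```

Note that the blastn case passes through unchanged, which is why this code has nothing to say about the roughly 30 nucleotides mentioned in the question.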

So this is not the adjustment you seem to be looking for.

> But there seems to be no length adjustment for blastn like the one that seems to exist at NCBI.
> It's kind of frustrating, as I am trying to do some differential expression analysis with my own scripts.
> If these 2 seqs are so nearly identical they should have the same annotation, but they do not because of
> those strange BLAST results.
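
For context, the length adjustment that NCBI BLAST lists among the search parameters is, as far as I understand it, a statistical edge-effect correction: it is subtracted from the query length (and from each database sequence) to get the effective lengths used in the E-value calculation. A rough Python sketch follows, assuming the usual Karlin-Altschul form E = K * m' * n' * exp(-lambda * S). The function names, the clamp to 1, and all the numbers are illustrative assumptions, not values taken from the searches discussed here.

```python
import math


def effective_search_space(query_len, db_len, num_db_seqs, length_adjustment):
    """Effective search space after the edge-effect length adjustment.

    The adjustment is removed once from the query and once from every
    database sequence. Clamping to 1 is an assumption made here so the
    sketch never returns a non-positive length.
    """
    eff_query = max(query_len - length_adjustment, 1)
    eff_db = max(db_len - num_db_seqs * length_adjustment, 1)
    return eff_query * eff_db


def expect_value(raw_score, search_space, lam, k):
    """Karlin-Altschul E-value for a raw score S: K * space * exp(-lambda * S)."""
    return k * search_space * math.exp(-lam * raw_score)


# Hypothetical numbers: a 1442 nt query, a 30 nt adjustment, and a
# made-up database of 100 sequences totalling 1,000,000 nt.
space = effective_search_space(1442, 1_000_000, 100, 30)
print(space)  # 1407764000
```

The adjustment changes the search space and therefore the E-value. It does not trim the query or shift the alignment coordinates.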

I have no idea what you mean by the rest of this as it relates to your candidate RNA sequences, or what you
are seeking to find from the BLAST searches to help you on that front.

> I am really sorry if my post is a bit messy. If you have any questions about what I meant, please ask.
> Any comments would be greatly appreciated!
> Cheers
> D.

Jason Stajich
jason.stajich <at> gmail.com
jason <at> bioperl.org