From: Jason Stajich <jason.stajich <at> gmail.com>
Subject: Re: blast and length adjustment
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Friday 2nd August 2013 07:57:06 UTC
On Aug 1, 2013, at 9:01 PM, [email protected] wrote:

> Hi guys,
> I have a question about BLAST.
> 
> I was working on a project where I run BLAST through BioPerl against
> human RNA. I found 2 sequences which hit totally different RNAs, but
> when I used cd-hit-est they clustered together. I even aligned them and
> they were almost identical, from the NCBI aligner:
> 
> Score: 2658 bits(1439), Expect: 0.0, Identities: 1441/1442(99%),
> Gaps: 0/1442(0%), Strand: Plus/Plus
> 
> Then I decided to BLAST them on NCBI and they again hit different
> sequences.
> Then I checked the parameters of each search and found that both queries
> were length adjusted, i.e. some length was removed, namely around 30
> nucleotides.
> 
> I wanted to see what BioPerl does about that, so I found the following
> in BlastUtils.pm:
> 
>    # Adjust length based on BLAST flavor.
>    my $prog = $sbjct->algorithm;
>    if($prog eq 'TBLASTN') {
>        $sbjct->{'_length_aln_sbjct'} /= 3;
>    } elsif($prog eq 'BLASTX' ) {
>        $sbjct->{'_length_aln_query'} /= 3;
>    } elsif($prog eq 'TBLASTX') {
>        $sbjct->{'_length_aln_query'} /= 3;
>        $sbjct->{'_length_aln_sbjct'} /= 3;
>    }

You are confusing the length adjustment that happens at NCBI with this
one. The code above deals with translated searches. Notice that every
case is a division by 3: the coordinates reported in the BLAST results
for a translated search are the original DNA/RNA coordinates, but when
one wants to know the length in alignment space, it is really on the
protein scale.
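
To make the division by 3 concrete, here is a minimal sketch in plain
Perl. It does not use BioPerl; aln_lengths is a hypothetical helper made
up for illustration, and the spans passed in are invented numbers. It
only mirrors the logic of the snippet quoted above: whichever side was
translated has its nucleotide span divided by 3.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT a BioPerl function): convert the aligned spans
# reported for a hit into alignment-space units, following the same rules
# as the quoted BlastUtils.pm snippet.
sub aln_lengths {
    my ($prog, $query_span, $sbjct_span) = @_;
    $prog = uc $prog;

    # Which side is nucleotide sequence that BLAST translated.
    my $query_translated = ($prog eq 'BLASTX'  || $prog eq 'TBLASTX');
    my $sbjct_translated = ($prog eq 'TBLASTN' || $prog eq 'TBLASTX');

    # Three nucleotides per codon, so a translated span shrinks by 3.
    $query_span /= 3 if $query_translated;
    $sbjct_span /= 3 if $sbjct_translated;

    return ($query_span, $sbjct_span);
}

# Illustrative values only: a BLASTX hit whose query covers 300 nt
# and whose protein subject covers 100 residues.
my ($q, $s) = aln_lengths('BLASTX', 300, 100);
print "BLASTX: query=$q sbjct=$s\n";

# BLASTN involves no translation, so nothing is rescaled.
($q, $s) = aln_lengths('BLASTN', 300, 300);
print "BLASTN: query=$q sbjct=$s\n";
```

Note that the BLASTN case passes through unchanged, which is why this
code path never touches your nucleotide searches.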

So this is not the adjustment you seem to be looking for.
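
For contrast, the adjustment NCBI reports is a statistical edge-effect
correction used when computing E-values. A rough sketch, assuming the
simplified textbook form l = ln(K*m*n)/H, with effective query length
m - l and effective database length n - N*l (N = number of database
sequences). As far as I know, current BLAST versions solve a refined
version of this iteratively, so real values will differ. The function
names and every input below are placeholders I made up, not values from
your searches.

```perl
use strict;
use warnings;

# Simplified length adjustment: l ~ ln(K * m * n) / H, truncated to an
# integer. K and H are Karlin-Altschul parameters of the scoring system.
sub approx_length_adjustment {
    my ($K, $H, $m, $n) = @_;
    return int( log($K * $m * $n) / $H );
}

# Effective lengths after removing the adjustment from the query and
# from each of the $num_seqs database sequences.
sub effective_lengths {
    my ($K, $H, $m, $n, $num_seqs) = @_;
    my $l     = approx_length_adjustment($K, $H, $m, $n);
    my $eff_m = $m - $l;
    my $eff_n = $n - $num_seqs * $l;

    # Guard against non-positive lengths for very short inputs.
    $eff_m = 1 if $eff_m < 1;
    $eff_n = 1 if $eff_n < 1;
    return ($l, $eff_m, $eff_n);
}

# Placeholder inputs (assumptions for illustration only).
my ($K, $H) = (0.5, 1.0);
my ($l, $eff_m, $eff_n) = effective_lengths($K, $H, 1442, 1_000_000, 100);
printf "adjustment=%d effective_query=%d effective_db=%d\n",
    $l, $eff_m, $eff_n;
```

The point is that this correction changes the search space used for
E-values; it does not trim your query, and it has nothing to do with
the division by 3 above.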
> 
> But there seems to be no length adjustment for blastn like the one that
> seems to exist on NCBI.
> 
> It's kind of frustrating, as I am trying to do some differential
> expression analysis with my own scripts. If these 2 seqs are so
> identical they should have the same annotation, but they do not because
> of those strange BLAST results.

I don't follow the rest of this: it is not clear what you mean about your
candidate RNA sequences, or what you are hoping to learn from the BLAST
searches that would help you on that front.
> 
> I am really sorry if my post is a bit messy. If you have any questions
> about what I meant, please ask.
> 
> Any comments would be greatly appreciated!
> 
> Cheers
> D.
> 
> _______________________________________________
> Bioperl-l mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
[email protected]