dimitark | 2 Aug 06:01 2013

blast and length adjustment

Hi guys,
I have a question about BLAST.

I was working on a project where I run BLAST through BioPerl against
human RNA. I found 2 sequences that hit totally different RNAs,
but when I ran cd-hit-est on them they clustered together. I even aligned
them and they were almost identical. From the NCBI aligner:

Score            Expect  Identities      Gaps        Strand
2658 bits(1439)  0.0     1441/1442(99%)  0/1442(0%)  Plus/Plus

Then I decided to BLAST them on NCBI, and they again hit different
RNAs. Then I checked the parameters of each search and found that both
queries were length-adjusted, i.e. some length was removed, namely
around 30 nucleotides.
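
To make the numbers concrete: as far as I understand it, the length
adjustment reported in the search summary is subtracted from the raw query
length to give the effective length used for the statistics. A minimal
sketch of that arithmetic follows. The function name is made up, 30 is only
my approximate value from above, 1442 is the alignment length from the
aligner output used as a stand-in for the real query length, and the floor
at 1 is just a guard I assumed for the sketch:

```python
def effective_length(raw_length, length_adjustment):
    """Raw length minus the length adjustment, never below 1 (assumed guard)."""
    return max(raw_length - length_adjustment, 1)


# Stand-in numbers: 1442 nt query, adjustment of about 30 nt.
print(effective_length(1442, 30))
```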

I was curious what BioPerl does about that, and I found
the following in BlastUtils.pm:

    # Adjust length based on BLAST flavor.
    my $prog = $sbjct->algorithm;
    if($prog eq 'TBLASTN') {
        $sbjct->{'_length_aln_sbjct'} /= 3;
    } elsif($prog eq 'BLASTX' ) {
        $sbjct->{'_length_aln_query'} /= 3;
    } elsif($prog eq 'TBLASTX') {
        $sbjct->{'_length_aln_query'} /= 3;
        $sbjct->{'_length_aln_sbjct'} /= 3;
    }

But there seems to be no length adjustment for BLASTN here, although
one seems to exist on NCBI.
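
To spell out how I read that excerpt, here is a plain-Python paraphrase. It
is not BioPerl code, and the function and argument names are hypothetical.
The aligned lengths are divided by 3 only for the translated programs, and
anything else, including BLASTN, falls through unchanged:

```python
def adjust_aligned_lengths(program, query_len, sbjct_len):
    """Paraphrase of the branch logic in the BlastUtils.pm excerpt."""
    if program == "TBLASTN":
        sbjct_len /= 3          # subject is translated nucleotide
    elif program == "BLASTX":
        query_len /= 3          # query is translated nucleotide
    elif program == "TBLASTX":
        query_len /= 3          # both sides are translated
        sbjct_len /= 3
    # Any other program (e.g. BLASTN) is returned untouched.
    return query_len, sbjct_len


print(adjust_aligned_lengths("BLASTN", 1442, 1442))
print(adjust_aligned_lengths("TBLASTX", 1442, 1442))
```

If I read it right, this division is a nucleotide-to-amino-acid unit
conversion, so it may not be the same thing as the length adjustment that
NCBI reports.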

It's kind of frustrating, as I am trying to do a differential
expression analysis with my own scripts. If these 2 sequences are
so nearly identical, they should get the same annotation, but they do not
because of these strange BLAST results.

I am really sorry if my post is a bit messy. If you have any questions
about what I meant, please ask.

Any comments would be greatly appreciated!