Adam Witney | 20 Feb 16:22 2013
Picon

Re: Problem Parsing BLAST output to annotate FASTA file


Hi Ann,

On 20/02/2013 05:20, Ann Gregory wrote:
> Hi BioPerl,
> 
> I am having issues with a BioPerl script. I have a blastxml file from a
> blastx blast and the original multifasta file containing the original
> nucleotides sequences.
> 
> I want to take the blast result (ie. the blast description) and annotate my
> multifasta file.
> 
> I have written 2 while loops that extract the blast descriptions as well as
> the nucleotide sequence from the multifasta file.
> 
> My problem is that I cannot incorporate one of the while loops into the
> other without loosing the loop property of one of the loops. I would like
> to take the 1st blast description, then the 1st nucleotide sequence, then
> the 2nd blast description, then the 2nd nucleotide sequence and so
> on...just can figure out how to alternate the results.
> 
> See script below:
> 
> 
> use warnings;
> use strict;
> use Bio::SearchIO;
> use Bio::SeqIO;
> 
> 
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> my $qd = $hit->description;
> print $qd, "\n";
> }
> }
> }
> 
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> while (my $seqobj = $seqio->next_seq) {
> my $nuc = $seqobj->seq();
> print $nuc, "\n";
> }--

I think what you are proposing assumes that the loop over the BLAST
results will come back in the same order as the loop over the Fasta
file, this may be the case, but I'm not sure its something I would rely on.

Anyway, I would loop over the BLAST results, storing the relevant data
to an array or hash and then loop over the fasta file to put the two
together. eg:

my $blast_data;

while ( ... blast data ... ) {
	...
	$blast_data->{$qd} = <whatever you want to store>
	...
}

while ( my $seqobj = $seqio->next_seq ) {
	my $id = $seqobj->id;
	print $blast_data->{$id}."\n";
}

something along those lines... or have i misunderstood you? if so can
you provide some more details, like what do you want your output to look
like?

HTH

Adam

Gmane