21 Nov 2012 00:50
Re: get_Stream_by_query() appears to return empty stream
Brian Osborne <bosborne11 <at> verizon.net>
2012-11-20 23:50:00 GMT
2012-11-20 23:50:00 GMT
Felix, I took a look at the Bio::DB::Query::GenBank documentation, it says this: If you provide an array reference of IDs in -ids, the query will be ignored and the list of IDs will be used when the query is passed to a Bio::DB::GenBank object's get_Stream_by_query() method. Bio::DB::Genbank queries "nucleotide", by default. You have GeneIDs. I see that you're setting "-id" to "gene" but note that you're passing that query to a plain Bio::DB::GenBank object. Not sure what the expected behavior is here. I would try using the NCBI Eutilities for that second query, rather than Bio::DB::Query::GenBank (http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook). Brian O. On Nov 1, 2012, at 8:01 PM, Felix Horns <rfhorns <at> gmail.com> wrote: > Hello everyone. > > I am having trouble using the get_Stream_by_query() function > in Bio::DB::GenBank. It seems to return an empty stream, such that > $stream->next_seq never returns anything. > > However, $query->count is returning the expected value (139). Also, > get_Stream_by_query() seems to be querying the database, as when I pass it > an array of GeneIDs that have not been properly formatted, i.e. > GeneID:7816864, instead of simply 7816864, it returns warnings and errors: > "MSG: Warning(s) from GenBank: <PhraseNotFound>GeneID 7817709...; MSG: > Error from Genbank: No items found.". > > I have included my full code below. I have also included the output from > the code below that. The code is intended to find genes located within a > genomic region. I will later find the protein domains and pathways that > those genes are involved in. > > Any help would be greatly appreciated. I realize that this is probably a > very simple question, but I am relatively new to BioPerl and I've spent the > better part of the day trying to figure out such issues, so I would be very > thankful for help. > > Felix > > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::DB::EntrezGene; > use Bio::DB::GenBank; > > # Load reference sequence > # Load from local .gb file > # Note that .gb file does not include sequences > # my $gbfile = "NC_012660.1.gb"; > # my $seqio = Bio::SeqIO->new(-file => $gbfile); > # my $ref_seq = $seqio->next_seq; > > # To access reference sequence programatically, uncomment this code > my $gb = new Bio::DB::GenBank; > my $ref_seq = $gb->get_Seq_by_acc("NC_012660.1"); > > # Specify coordinates of gap > my $gap_start = 2050506; > my $gap_end = 2190530; > > my $gene_count = 0; > my <at> features; > my <at> starts; > my <at> ends; > my <at> db_xrefs; > > my <at> products; > my <at> protein_ids; > > # Get gene features in gap > for my $feat ($ref_seq->get_SeqFeatures) { > my $start=$feat->location->start; > my $end=$feat->location->end; > > if (($feat->primary_tag eq 'gene') & > ($gap_start < $start) & ($start < $gap_end) & > ($gap_start < $end) & ($end < $gap_end)) { > > $gene_count += 1; > > # Get GeneID reference > my $db_xref = ($feat->get_tag_values('db_xref'))[0]; > $db_xref =~ s/GeneID://; # Trim "GeneID:" from start of $db_xref > > push <at> features, $feat; > push <at> starts, $start; > push <at> ends, $end; > push <at> db_xrefs, $db_xref; > } > } > > # Get data about gene features from GeneID reference > my $query = Bio::DB::Query::GenBank->new(-db => 'gene', > -ids => [ <at> db_xrefs]); > my $stream = $gb->get_Stream_by_query($query); > > while (my $seq = $stream->next_seq) { > for my $feat ($seq->all_SeqFeatures) { > print "primary tag: ", $feat->primary_tag, "\n"; > for my $tag ($feat->get_all_tags) { > print " tag: ", $tag, "\n"; > for my $value ($feat->get_tag_values($tag)) { > print " value: ", $value, "\n"; > } > } > } > } > > print $query->count,"\n"; > print $gene_count, "\n"; > > > OUTPUT >> perl analyze_gap.pl > 139 > 139 > > Note that no "primary tag; tag; value" items are printed. Furthermore, > when I put a print line immediately after the (while (my $seq = > $stream->next_seq)) statement, it was never called, seemingly indicating > that the stream is empty. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l <at> lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l
RSS Feed