Florent Angly | 4 Dec 22:52 2012

Re: Problem with BIO::DB::FASTA and Colon in Fasta Header

Hi Jason,

See the documentation for seq() at 

When you call seq() with a single argument, e.g. 
$db->seq('C7047455:0-100'), Bio::DB::Fasta interprets it as a compound 
ID and looks for positions 0 to 100 of a sequence called C7047455. This 
is a feature that has been in Bio::DB::Fasta since the dawn of time. In 
this form, seq() treats the colon as the separator between the ID and 
the range, which is problematic because your sequence ID itself 
contains a colon.
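
To illustrate the ambiguity, here is a minimal sketch in plain Perl. The 
parse_compound_id() helper is hypothetical and is not the actual 
Bio::DB::Fasta code; it only mimics the kind of ID/range splitting 
described above, assuming a simple "ID:start-end" pattern.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Bio::DB::Fasta's real implementation: it only
# illustrates how a single "ID:start-end" argument might be split apart.
sub parse_compound_id {
    my ($arg) = @_;
    # If the argument ends in ":<digits>-<digits>", treat that suffix as a range.
    if ($arg =~ /^(.+):(\d+)-(\d+)$/) {
        return ($1, $2, $3);
    }
    # No range suffix: the whole argument is taken as a plain ID.
    return ($arg, undef, undef);
}

# A sequence literally named 'C7047455:0-100' is indistinguishable from
# "positions 0 to 100 of C7047455" once it is parsed this way.
my ($id, $start, $end) = parse_compound_id('C7047455:0-100');
print "id=$id start=$start end=$end\n";
```

With a parser like this, the lookup would be done for 'C7047455', which 
does not exist in an index whose only matching key is 'C7047455:0-100'.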

I think that when you call $db->seq($id,$start,$end), Bio::DB::Fasta 
does not attempt to parse your ID. This is why your code works with this 
form. Note that if you want to get the entirety of a sequence called 
'C7047455:0-100', the easiest approach when your sequence names contain 
a colon is to use $db->get_Seq_by_id('C7047455:0-100'), since 
get_Seq_by_id() only takes a regular ID (not a compound one).


On 05/12/12 06:23, Jason Gallant wrote:
> Hello,
> I'm trying to retrieve fasta sequences that contain a colon in their
> header.  However, I cannot get my BioPerl script to do this!!
> It works as expected when the header does not contain the colon, however
> doesn't return anything when it does.  Weirdly, when I ask it to return the
> parsed IDs (see below), it returns the appropriate IDs, which include the
> colon!  Very confusing, would appreciate any help!!
> Many Thanks,
> Jason Gallant
> use strict;
> use Bio::SearchIO;
> use Bio::DB::Fasta;
> my ($file,$id,$start,$end) =
> ("secondround_merged_expanded.fasta","C7047455:0-100",1,10);
> my $db = Bio::DB::Fasta->new($file, -reindex=>1);
> my $seq = $db->seq($id,$start,$end);
> print $db->ids;
> print $seq,"\n";