Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Fields, Christopher J <cjfields <at> illinois.edu>
Subject: Re: SimpleAlign Query
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Thursday 20th June 2013 17:10:17 UTC (over 3 years ago)
Agreed, start/end with genome coordinates has start always less than end;
the strand dictates direction for the feature of interest.  You must use
this in combination with start/end to get everything you need (this is
*NOT* unique to BioPerl, most other bioinformatics software follows the
same general dictum).

That was an informative and somewhat highly confusing exchange (doesn't
help to have two people with very close names involved :)

chris (fields with an 's')

On Jun 20, 2013, at 11:24 AM, Jun Yin <[email protected]> wrote:

> Hi, Chris Field,
> 
> I have tried your script. It worked well in my computer. There should not
> be any problem with it.
> 
> For your question, in BioPerl, the sequence object always has the start
> coordinate smaller than the end coordinate. It is not a bug.
> 
> The strand information is kept in the sequence object. You can retrieve
> the strand information by $seq->strand .
> 
> As suggested by Chris J Fields, It is more common to use SearchIO to deal
> with blast output. You can try that module too.
> 
> Cheers,
> -- 
> Jun Yin, Ph.D.
> Postdoc Associate
> 
> James Noonan Group
> Department of Genetics
> Yale University School of Medicine
> Sterling Hall of Medicine, I-120A
> 333 Cedar Street 
> New Haven, CT 06510
> http://www.yale.edu/noonanlab/Jun.html
> 
> 
> 
> 
> 
> On 6/19/13 9:15 AM, "Fields, Christopher J" 
wrote:
> 
>> The alignments from BLAST reports are generally parsed via
Bio::SearchIO.
>> There is a HOWTO on SearchIO:
>> 
>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>> 
>> Tiling in BioPerl uses these objects, but you can get Bio::SimpleAlign
>> from the HSPs.
>> 
>> chris
>> 
>> On Jun 18, 2013, at 10:49 AM, "Yin, Jun" <[email protected]>
>> wrote:
>> 
>>> Hi, Chris,
>>> 
>>> No problem. I can help to look at what is the problem. Yes, usually in
>>> Bio::SimpleAlign even if the sequence is reverse complementary, the
>>> start
>>> coordinate should be smaller than the end coordinate. But it seems
>>> Bio::AlignIO::bl2seq is a wrapper for Bio::SearchIO::blast. It is a bit
>>> different from the other Bio::AlignIO modules, so there may be some
>>> inconsistency.
>>> 
>>> Can you send me the input file for your script? So I can try to figure
>>> out
>>> where the problem is.
>>> 
>>> I would like to recommend you to ask in the bioperl mail list. There
>>> will
>>> be more experts in BioPerl there.
>>> 
>>> 
>>> Cheers,
>>> -- 
>>> Jun Yin, Ph.D.
>>> Postdoc Associate
>>> 
>>> James Noonan Group
>>> Department of Genetics
>>> Yale University School of Medicine
>>> Sterling Hall of Medicine, I-120A
>>> 333 Cedar Street
>>> New Haven, CT 06510
>>> http://www.yale.edu/noonanlab/Jun.html
>>> 
>>> 
>>> 
>>> 
>>> On 6/18/13 11:23 AM, "Christopher Field"  wrote:
>>> 
>>>> Dear Dr. Yin,
>>>> 
>>>> I've got a problem with SimpleAlign.pm, and since you are responsible
>>>> for
>>>> the most recent updates on this module, I hope it's ok to email and
ask
>>>> for your help.
>>>> To summarise my short program, it uses bl2seq to throw some sanger
>>>> reads
>>>> against a genome, and then I'm tiling the reported hits into a
visually
>>>> accessible format.
>>>> The problem has arisen, that sometimes the alignment is with the
>>>> reverse
>>>> complement, but the start and end of both sequences in the Align
object
>>>> are such that the start is always lower than the end. The strand flag
>>>> also appears uninitialised. Is this a straightforward bug, or a result
>>>> of
>>>> something I'm doing in my code?
>>>> Any help would be greatly appreciated.
>>>> 
>>>> Chris Field
>>>> Group van Nimwegen
>>>> Biozentrum der Universit├Ąt Basel
>>>> 4056-CH Basel
>>>> 
>>>> Code:
>>>> 
>>>> #!/usr/bin/env perl;
>>>> 
>>>> use v5.8.1;
>>>> use warnings;
>>>> use strict;
>>>> 
>>>> use Bio::Seq;
>>>> use Bio::SeqIO;
>>>> use Bio::AlignIO;
>>>> use Bio::SimpleAlign;
>>>> use Bio::Tools::Run::StandAloneBlast;
>>>> 
>>>> my $usage = "align.pl reference query\n";
>>>> 
>>>> open (my $fo,'>',"align.out",) or die $!;
>>>> 
>>>> my $rfile = shift or die $usage;
>>>> my $qfile = shift or die $usage;
>>>> 
>>>> my $rstream = Bio::SeqIO->new(-file => $rfile);
>>>> my $qstream = Bio::SeqIO->new(-file => $qfile);
>>>> 
>>>> my $rseq = $rstream->next_seq;
>>>> 
>>>> my $align = Bio::Tools::Run::StandAloneBlast->new(-program =>
'blastn',
>>>> -outfile => 'bl2seq.out');
>>>> 
>>>> while (my $qseq = $qstream->next_seq){
>>>>      $align->bl2seq($qseq,$rseq);
>>>>      my $astream = Bio::AlignIO->new(-file => 'bl2seq.out', -format =>
>>>> 'bl2seq');
>>>>      my @seq = ("n") x $qseq->length;
>>>>      my @match = (" ") x $qseq->length;
>>>>      my @pos = (" ") x $qseq->length;
>>>>      while (my $align = $astream->next_aln){
>>>>              my $qhit = $align->get_seq_by_pos(1);
>>>>              my $rhit = $align->get_seq_by_pos(2);
>>>>              if(!grep(/[acgt]/,@seq[$qhit->start-1 .. $qhit->end-1])){
>>>>                      print $rhit->start,":",$rhit->end,"\n";
>>>>                      @seq[$qhit->start-1 .. $qhit->end-1] =
>>>> split("",$rhit->seq);
>>>>                      @match[$qhit->start-1 .. $qhit->end-1] =
>>>> split("",$align->match_line);
>>>>                      @pos[$qhit->start-1 .. $qhit->end-1] =
>>>> ("<","-","-",split("",$rhit->start),("-") x
>>>> 
>>>>
($qhit->length-(length($rhit->start)+length($rhit->end)+6)),split("",$rh
>>>> it
>>>> ->end),"-","-",">");
>>>>              }
>>>>      }
>>>>      print $fo $qseq->display_id,"\n";
>>>>      print $fo $qseq->seq,"\n";
>>>>      print $fo @match,"\n";
>>>>      print $fo @seq,"\n";
>>>>      print $fo @pos,"\n\n";
>>>> }
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> [email protected]
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
>
 
CD: 3ms