From: Jim Hu <jimhu <at> tamu.edu>
Subject: Re: Converting blast+ output to gff (with gaps)
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Friday 4th January 2013 21:57:38 UTC
Malcolm,

Thanks, I should have reread the GFF3 spec before posting!

In the section on the Gap attribute and below on alignments, the spec discusses
two ways to represent an alignment. I was originally thinking of something
like the later example shown for cDNA vs genome. But the gap attribute
representation would be fine too. So, I can see how the final output could
be done in different ways, but I'm still stuck on how to get there.  
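
To make the Gap attribute representation concrete, here is a minimal sketch (in Python rather than Perl, purely for illustration) that turns two gapped alignment rows into a Gap string. It assumes the GFF3 convention as I read it: M for an aligned column, D where the target row has a gap, I where the reference row has a gap. The function name gap_attribute is made up for this example, and the frameshift codes (F and R) are not handled.

```python
from itertools import groupby


def gap_attribute(reference: str, target: str) -> str:
    """Build a GFF3 Gap attribute value from two gapped alignment rows.

    Both rows must have the same length and use '-' for gap columns.
    """
    if len(reference) != len(target):
        raise ValueError("aligned rows must have the same length")

    def op(ref_char: str, tgt_char: str) -> str:
        if ref_char == "-" and tgt_char == "-":
            raise ValueError("column has a gap in both rows")
        if tgt_char == "-":
            return "D"  # reference bases with no counterpart in the target
        if ref_char == "-":
            return "I"  # target bases with no counterpart in the reference
        return "M"      # aligned column (identity or mismatch)

    ops = (op(r, t) for r, t in zip(reference, target))
    # Collapse runs of the same operation into <code><run length> tokens.
    return " ".join(f"{code}{len(list(run))}" for code, run in groupby(ops))


# Rows adapted from the nucleotide example in the GFF3 spec.
print(gap_attribute("CAAGACCTAAACTGGAT-TCCAAT",
                    "CAAGACCT---CTGGATATCCAAT"))  # M8 D3 M6 I1 M6
```

The resulting string would go into column 9 next to a Target attribute, for example Target=EST23 1 21;Gap=M8 D3 M6 I1 M6.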

I don't have a specific application in mind; I'm mostly just trying to
understand how to get from standalone blast+ output to things
that look like the examples in the gff spec and the gbrowse documentation -
really basic display of alignments that are gapped. For my teaching, we do
EST vs genomic blast and want gapped cDNA alignments to show where the
introns go. My other work is with bacteria where introns are rare, but
there are times when I'd like to show an alignment that is interrupted by a
transposable element, for example.
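
For the intron or transposon case, the multi-line representation could be sketched roughly as follows: each aligned block (for BLAST, each HSP) becomes one feature line, and the lines are tied together by sharing an ID. This is only a sketch under assumptions. The function name hsps_to_gff3, the tuple layout, the default match ID, and the coordinates in the usage example are all hypothetical, and it does not try to decide which HSPs belong to the same alignment.

```python
def hsps_to_gff3(seqid, target_id, hsps, match_id="match00001",
                 source="blastn", feature_type="cDNA_match"):
    """Return one GFF3 line per aligned block, all sharing the same ID.

    hsps is an iterable of (sstart, send, qstart, qend, evalue) tuples with
    1-based inclusive coordinates; sstart > send marks a minus-strand hit.
    """
    lines = []
    # Order the blocks by their leftmost position on the genomic sequence.
    for sstart, send, qstart, qend, evalue in sorted(
            hsps, key=lambda h: min(h[0], h[1])):
        strand = "+" if sstart <= send else "-"
        start, end = sorted((sstart, send))  # GFF3 requires start <= end
        attributes = f"ID={match_id};Target={target_id} {qstart} {qend}"
        lines.append("\t".join([seqid, source, feature_type, str(start),
                                str(end), str(evalue), strand, ".",
                                attributes]))
    return lines


# Made-up coordinates: two exons of one cDNA separated by an intron.
for line in hsps_to_gff3("ctg123", "cdna0123",
                         [(5000, 5500, 463, 963, 8.1e-43),
                          (1050, 1500, 12, 462, 5.8e-42)]):
    print(line)
```

The region between the two blocks (the intron, or an inserted element) is simply not covered by any feature line, which is what lets a browser draw it as a connector.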

Excerpting from blastp -help

 *** Formatting options
 -outfmt 
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1) 

Several of these are "lossy" in terms of where the actual gaps occur (e.g.
6). Others seem to me to be more human readable than suited for parsing. So
I was hoping to get pointed to an existing script that would generate
either the single feature with gap attribute OR the multi-line match
features OR a combination from one of these output formats. 
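
One possible route, sketched here as an assumption rather than a tested recipe: the tabular formats accept extra column specifiers, so running with -outfmt "6 std qseq sseq" should append the aligned query and subject rows, gaps included, to the twelve standard columns. The Python sketch below parses one such line and writes a single GFF3 match line with a Target attribute. The function names, the feature type, and the sample line are hypothetical, and the numbers in the sample are made up.

```python
# The twelve "std" columns of tabular output, followed by the two extras
# requested with: -outfmt "6 std qseq sseq"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore",
           "qseq", "sseq"]
INT_COLUMNS = {"length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send"}


def parse_hit(line):
    """Split one tab-delimited line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    hit = dict(zip(COLUMNS, values))
    for name in INT_COLUMNS:
        hit[name] = int(hit[name])
    return hit


def hit_to_gff3(hit, source="blastn", feature_type="match"):
    """Write the hit as a GFF3 line placed on the subject sequence."""
    strand = "+" if hit["sstart"] <= hit["send"] else "-"
    start, end = sorted((hit["sstart"], hit["send"]))  # start <= end in GFF3
    attributes = f"Target={hit['qseqid']} {hit['qstart']} {hit['qend']}"
    return "\t".join([hit["sseqid"], source, feature_type, str(start),
                      str(end), hit["evalue"], strand, ".", attributes])


# Hypothetical line; the last two columns are the gapped alignment rows.
sample = "\t".join(["EST23", "Chr3", "83.33", "24", "0", "2", "1", "21",
                    "1", "23", "1e-05", "30.1",
                    "CAAGACCT---CTGGATATCCAAT",
                    "CAAGACCTAAACTGGAT-TCCAAT"])
hit = parse_hit(sample)
print(hit_to_gff3(hit))
print("gap columns in query row:", hit["qseq"].count("-"))
```

With the gapped rows in hand, the gap positions are no longer lost, so a Gap attribute or a set of split features could be derived from qseq and sseq.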

I'm probably missing something very, very obvious.

Best,

Jim


On Jan 4, 2013, at 2:20 PM, Cook, Malcolm wrote:

> Jim,
> 
> Getting to your original question:
> 
>> I'm looking for a script that will take one of the blast+ outformats
>> that includes the positions of gaps and mismatches, and create gff with
>> appropriate subfeatures.
> 
> Exactly what/how do you want/expect to encode the blast output as
> GFF{1,2,2.5,3}??
> 
> If GFF3 per http://www.sequenceontology.org/gff3.shtml
> then are you hoping to get GFF3 marked up as described in section 'THE GAP
> ATTRIBUTE' or as in 'ALIGNMENTS'?
> 
> I would guess not because neither of them have 'subfeatures'.
> 
> If you could explain more fully with examples (hand cobbled or borrowed
> from someone else) of what you expect then I might have a better idea of
> what options might suit your needs.
> 
> 
> ~Malcolm
> 
> 
> .-----Original Message-----
> .From: [email protected] [mailto:[email protected]] On Behalf Of Jim Hu
> .Sent: Friday, January 04, 2013 1:50 PM
> .To: Brian Osborne
> .Cc: Fields, Christopher J; Scott Cain; [email protected]
> .Subject: Re: [Bioperl-l] Converting blast+ output to gff (with gaps)
> .
> .Thanks for the replies, but...
> .
> .I can't tell what input formats for the blast results file are
> .supported.  Format 11 and format 6 give no output and no feedback. Putting
> .some diagnostic print statements in the code suggests that I'm not
> .getting any result objects from Bio::SearchIO.
> .
> .The script uses Bio::SearchIO, but does not seem to call the submodules
> .for blast.  Documentation links on the wiki seem to be
> .broken, at least on this page:
> .
> .	http://www.bioperl.org/wiki/Module:Bio::SearchIO
> .
> .Jim
> .
> .
> .On Jan 2, 2013, at 4:53 PM, Brian Osborne wrote:
> .
> .> Scott and Chris,
> .>
> .> I'll test it and see...
> .>
> .> Brian O.
> .>
> .>
> .> On Jan 2, 2013, at 5:26 PM, "Fields, Christopher J" wrote:
> .>
> .>> It should (I recall using it at one point).  If it doesn't we should
> .>> fix it so it does.
> .>>
> .>> How does MAKER deal with this?  IIRC it uses (a modified)
> .>> SearchIO-based method...
> .>>
> .>> chris
> .>>
> .>> On Jan 2, 2013, at 3:32 PM, Scott Cain wrote:
> .>>
> .>>> Hi Brian,
> .>>>
> .>>> I was going to suggest the same thing--though that script is fairly
> .>>> old, it's not as old as the blast2gff script in the GBrowse
> .>>> distribution (which probably should be retired).  I believe it
> .>>> supports GFF3, though I don't have any sample data with which to test
> .>>> it to be sure.  I also don't know if it supports BLAST+ input--I
> .>>> haven't kept up with SearchIO (on which search2gff.pl depends); will
> .>>> it accept it?
> .>>>
> .>>> Scott
> .>>>
> .>>>
> .>>> On Wed, Jan 2, 2013 at 3:26 PM, Brian Osborne wrote:
> .>>>> Here's one:
> .>>>>
> .>>>> https://github.com/GMOD/GBrowse/blob/master/contrib/blast2gff.pl
> .>>>>
> .>>>> Another one:
> .>>>>
> .>>>> ~/git/bioperl-live>head scripts/utilities/bp_search2gff.pl
> .>>>> #!perl
> .>>>>
> .>>>> # Author:      Jason Stajich 
> .>>>> # Description: Turn SearchIO parseable report(s) into a GFF report
> .>>>> #
> .>>>> =head1 NAME
> .>>>>
> .>>>> bp_search2gff - Turn SearchIO parseable reports(s) into a GFF report
> .>>>>
> .>>>>
> .>>>>
> .>>>> Brian O.
> .>>>>
> .>>>> On Jan 2, 2013, at 2:44 PM, Jim Hu <[email protected]> wrote:
> .>>>>
> .>>>>> I assume this has already been done many times, but I can't seem
> .>>>>> to find it on bioperl.org or via google.
> .>>>>>
> .>>>>> I'm looking for a script that will take one of the blast+
> .>>>>> outformats that includes the positions of gaps and mismatches, and
> .>>>>> create gff with appropriate subfeatures.
> .>>>>>
> .>>>>> Thanks,
> .>>>>>
> .>>>>> Jim
> .>>>>> =====================================
> .>>>>> Jim Hu
> .>>>>> Professor
> .>>>>> Dept. of Biochemistry and Biophysics
> .>>>>> 2128 TAMU
> .>>>>> Texas A&M Univ.
> .>>>>> College Station, TX 77843-2128
> .>>>>> 979-862-4054
> .>>>>>
> .>>>>>
> .>>>>>
> .>>>>> _______________________________________________
> .>>>>> Bioperl-l mailing list
> .>>>>> [email protected]
> .>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> .>>>>
> .>>>>
> .>>>
> .>>>
> .>>>
> .>>> --
> .>>> ------------------------------------------------------------------------
> .>>> Scott Cain, Ph. D.                                   scott at scottcain dot net
> .>>> GMOD Coordinator (http://gmod.org/)                    216-392-3087
> .>>> Ontario Institute for Cancer Research
> .>>
> .>
> .
> .=====================================
> .Jim Hu
> .Professor
> .Dept. of Biochemistry and Biophysics
> .2128 TAMU
> .Texas A&M Univ.
> .College Station, TX 77843-2128
> .979-862-4054
> .
> .
> .

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
 