Features Download
From: Fields, Christopher J <cjfields <at> illinois.edu>
Subject: Re: standalone blast outformat
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Wednesday 30th October 2013 15:13:59 UTC (over 3 years ago)
On Oct 30, 2013, at 4:32 AM, 

> Hi Florent and Jason,
> Thank you for your replies.
> well i spent quite some time to figure that out. But still no results.
> Yes i use BLAST+. So I tried BLASTN as it is, without bioperl. Just to
see if i can do: blastn -outfmt "7 qcovs other_things". Well it works like
this. But in bioperl can not pass do that. I dont know why tho.

You are correct in your assumption, that you have to specify the following:
    -outfmt "7 qcovs …”

I’ve run into this myself.  Based on the error you are getting when
trying this option, I’m pretty sure the ‘custom’ tab-delimited option
isn't available in the BLAST+ wrapper (e.g. it’s a bug).  

> After i tried to understand how tile_hsps() works. But no results.
Somehow the documentation on that matter is not clear and not a simple
example is present. So i could not figure out how to use it.

The tiling code in bioperl is in a much better state thanks to Mark Jensen.
 The hit-level methods do some of the work for you; if you use any of the
frac_* (frac_identical, frac_conserved, frac_aligned_query) methods in
GenericHit you are using it w/o knowing.  The last one seems like it
*might* be what you want:

 Usage     : $hit_object->frac_aligned_query();
 Purpose   : Get the fraction of the query sequence which has been aligned
           : across all HSPs (not including intervals between
           : HSPs).
 Example   : $frac_alnq = $hit_object->frac_aligned_query();
 Returns   : Float (2-decimal precision, e.g., 0.75).
 Argument  : n/a
 Throws    : n/a
 Comments  : If you need data for each HSP, use hsps() and then interate
           : through the HSP objects.
           : This method requires that all HSPs be tiled. If they have not
           : already been tiled, they will be tiled first automatically.

> Now i am trying to figure out how to use FASTA or SSEARCH to solve my
problem. I suppose i have to get the sequences of the HSPs representing my
hit and then run ssearch like this:
>   ssearch my_query_seq.fa my_hit_hsps.fa
> Then parse the output. Just have to figure out what exactly do i need
from the output :)

True.  Let us know if you run into problems with that.


> Thank you again
> D.
> Quoting Jason Stajich :
>> Assuming 1 HSP you can do:
>> my $qcov = $hsp->query->length / $query->length;
>> Otherwise you have to add up the HSPs and their coverage (and remove
redundancy of overlapping hits) to determine what parts of the query were
aligned across all the HSPs that were found. You can try the tile_hsps
function to also get this. Or if you are using WUBLAST you can get the
-links option to grab the logical groups of non-overlapping HSPs to tile
>> Or use FASTA or SSEARCH which will provide a single longest path through
the alignment.
>> On Oct 29, 2013, at 10:12 PM, Florent Angly 
>>> Hi Dimitar,
>>> I suppose you are using BLAST+, not legacy BLAST. There is a HOWTO
>>> here that can get you started:
>>> http://www.bioperl.org/wiki/HOWTO:BlastPlus
>>> Regarding your issue, I had a look at the blastn manual, and I doesn't
>>> look like you can write '-outfmt "7 qcovs ..."' to me. If anything, it
>>> would be '-outfmt "qcovs ..."'.
>>> So, maybe try to get your command to work as intended in a terminal
>>> before putting it in BioPerl.
>>> Best,
>>> Florent
>>> On Wed, Oct 30, 2013 at 1:45 PM,   wrote:
>>>> Hi guys,
>>>> i was wondering how can i get the query coverage when blasting with
>>>> So i figure out one way is to specify the outformat to tabular(7) and
>>>> specify that i want the qcovs(query coverage per subject). From BLAST
>>>> i see one can do: -outfmt "7 qcovs ..." but in Bioperl i can only say:
>>>> -outformat => 7. If i do: outformat => "7 qcovs" i get an error.
>>>> Well another option i suppose is to get the lengths of all hsps from a
>>>> and estimate the query coverage that way but im not exactly sure if
this is
>>>> correct way.
>>>> Any idea how can i get the query coverage?
>>>> Thank you very much for your time and any hints
>>>> Regards
>>>> Dimitar
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> Jason Stajich
>> jason.stajich@gmail.com
>> jason@bioperl.org
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
W.M. Keck Center for Comparative and Functional Genomics
Institute for Genomic Biology
1206 W. Gregory Dr., MC-195
Urbana, IL 61801
Phone: (217) 244-1890
CD: 2ms