Fields, Christopher J | 30 Oct 16:13 2013

Re: standalone blast outformat

On Oct 30, 2013, at 4:32 AM, <dimitark <at> bii.a-star.edu.sg> <dimitark <at> bii.a-star.edu.sg> wrote:

> Hi Florent and Jason,
> Thank you for your replies.
> 
> well i spent quite some time to figure that out. But still no results.
> Yes i use BLAST+. So I tried BLASTN as it is, without bioperl. Just to see if i can do: blastn -outfmt "7 qcovs
other_things". Well it works like this. But in bioperl can not pass do that. I dont know why tho.

You are correct in your assumption, that you have to specify the following:

    -outfmt "7 qcovs …”

I’ve run into this myself.  Based on the error you are getting when trying this option, I’m pretty sure
the ‘custom’ tab-delimited option isn't available in the BLAST+ wrapper (e.g. it’s a bug).  

> After i tried to understand how tile_hsps() works. But no results. Somehow the documentation on that
matter is not clear and not a simple example is present. So i could not figure out how to use it.

The tiling code in bioperl is in a much better state thanks to Mark Jensen.  The hit-level methods do some of
the work for you; if you use any of the frac_* (frac_identical, frac_conserved, frac_aligned_query)
methods in GenericHit you are using it w/o knowing.  The last one seems like it *might* be what you want:

 Usage     : $hit_object->frac_aligned_query();
 Purpose   : Get the fraction of the query sequence which has been aligned
           : across all HSPs (not including intervals between non-overlapping
           : HSPs).
 Example   : $frac_alnq = $hit_object->frac_aligned_query();
 Returns   : Float (2-decimal precision, e.g., 0.75).
 Argument  : n/a
 Throws    : n/a
 Comments  : If you need data for each HSP, use hsps() and then interate
           : through the HSP objects.
           : This method requires that all HSPs be tiled. If they have not
           : already been tiled, they will be tiled first automatically.

> Now i am trying to figure out how to use FASTA or SSEARCH to solve my problem. I suppose i have to get the
sequences of the HSPs representing my hit and then run ssearch like this:
> 
>   ssearch my_query_seq.fa my_hit_hsps.fa
> 
> Then parse the output. Just have to figure out what exactly do i need from the output :)

True.  Let us know if you run into problems with that.

chris

> Thank you again
> D.
> 
> Quoting Jason Stajich <jason.stajich <at> gmail.com>:
> 
>> Assuming 1 HSP you can do:
>> 
>> my $qcov = $hsp->query->length / $query->length;
>> 
>> Otherwise you have to add up the HSPs and their coverage (and remove redundancy of overlapping hits) to
determine what parts of the query were aligned across all the HSPs that were found. You can try the
tile_hsps function to also get this. Or if you are using WUBLAST you can get the -links option to grab the
logical groups of non-overlapping HSPs to tile across.
>> 
>> Or use FASTA or SSEARCH which will provide a single longest path through the alignment.
>> 
>> 
>> On Oct 29, 2013, at 10:12 PM, Florent Angly <florent.angly <at> gmail.com> wrote:
>> 
>>> Hi Dimitar,
>>> 
>>> I suppose you are using BLAST+, not legacy BLAST. There is a HOWTO
>>> here that can get you started:
>>> http://www.bioperl.org/wiki/HOWTO:BlastPlus
>>> Regarding your issue, I had a look at the blastn manual, and I doesn't
>>> look like you can write '-outfmt "7 qcovs ..."' to me. If anything, it
>>> would be '-outfmt "qcovs ..."'.
>>> So, maybe try to get your command to work as intended in a terminal
>>> before putting it in BioPerl.
>>> Best,
>>> 
>>> Florent
>>> 
>>> 
>>> On Wed, Oct 30, 2013 at 1:45 PM,  <dimitark <at> bii.a-star.edu.sg> wrote:
>>>> Hi guys,
>>>> i was wondering how can i get the query coverage when blasting with Bioperl.
>>>> So i figure out one way is to specify the outformat to tabular(7) and the
>>>> specify that i want the qcovs(query coverage per subject). From BLAST manual
>>>> i see one can do: -outfmt "7 qcovs ..." but in Bioperl i can only say:
>>>> -outformat => 7. If i do: outformat => "7 qcovs" i get an error.
>>>> 
>>>> Well another option i suppose is to get the lengths of all hsps from a hit
>>>> and estimate the query coverage that way but im not exactly sure if this is
>>>> correct way.
>>>> 
>>>> Any idea how can i get the query coverage?
>>>> 
>>>> Thank you very much for your time and any hints
>>>> 
>>>> Regards
>>>> Dimitar
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l <at> lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l <at> lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> Jason Stajich
>> jason.stajich <at> gmail.com
>> jason <at> bioperl.org
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l <at> lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
W.M. Keck Center for Comparative and Functional Genomics
Institute for Genomic Biology
1206 W. Gregory Dr., MC-195
Urbana, IL 61801
Phone: (217) 244-1890
http://www.hpcbio.illinois.edu

Gmane