From: Paul Cantalupo <pcantalupo <at> gmail.com>
Subject: Re: Fix for Bug #3376 broke somewhere else
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Saturday 2nd March 2013 17:28:15 UTC
Hi Francisco,

Nice catch. Please submit a new bug report for this and reference bug
3376. Please provide a minimal hmmer output file, a script and the
expected output. Then, I'll look into it and fix the bug.

Thank you,

Paul

Paul Cantalupo
University of Pittsburgh


On Thu, Feb 28, 2013 at 10:36 AM, Francisco J. Ossandón wrote:
> Hi,
> I was re-checking Bug #3302 using the Bio::SearchIO modules from the
> repository and found that they can no longer parse a Hmmer2 file that
> was previously fine. After tracking down the problem, I discovered that
> a change in a regular expression, made to fix another bug, broke the
> parsing.
>
> The fix for Bug #3376 consisted of adding an extra condition to skip
> lines where the end-of-domain indicator is split across lines
> (https://redmine.open-bio.org/issues/3376):
> TEST: domain 1 of 1, from 8 to 97: score 184.7, E = 2.5e-56
>                    *->svfqqqqssksttgstvtAiAiAigYRYRYRAvtWnsGsLssGvnDn
>                       sv+qqqq+  +    +vtAiAiAigYRYRYRAv Wn GsLs G nDn
>         Test     8    SVYQQQQGGSA----MVTAIAIAIGYRYRYRAVVWNKGSLSTGTNDN 50
>
>                    DnDqqsdgLYtiYYsvtvpssslpsqtviHHHaHkasstkiiikiePr<-
>                    DnDq +d LYtiYYsvtv +ss+p q+v+HHHaH+asstkiiiki P
>         Test    51 DNDQAAD-LYTIYYSVTVSASSWPGQSVTHHHAHPASSTKIIIKIAPS   97
>
>                    *
>
>         Test     -   -
> This case is characterized by the 2 dashes in the line...
>
> So this expression was added in hmmer2.pm - ‘next_result’
> (https://github.com/bioperl/bioperl-live/commit/142e5d79e3a6593db32bf0af99048f47d01bd3f2):
>                         elsif (CORE::length($_) == 0
>                             || ( $count != 1 && /^\s+$/o )
>                             || /^\s+\-?\*\s*$/
>                             || /^.+\-\s+\-\s*$/ ) ### <--- This regex was designed for bug 3376
>                         {
>                             next;
>                         }
>
> But the expression is too broad because it uses "^.+" just before the
> 2 dashes, so it also matches alignment lines that are full of dashes,
> which broke the parsing of lines like these:
>                    KyACrqCdtiVQAPaPakpIErGiptaGLLArvlVSKyaEHlPLYRQsEI
>
>   lcl|gi|340     - -------------------------------------------------- -
>
>                    yaRqGVeiaRstLadWVgrtgarLaPLvdALaeyVLkeGklHADeTPVqV
>                          +i  s L   V++ + r
>   lcl|gi|340 60938 ------AIMISGLIHGVSARCLRF-------------------------- 60955
>
> I think a reasonable fix that still fixes the original bug and restores
> the function for this case is to add an extra \s+ in the regex just
> before the first dash, so the expression makes sure that the first dash
> is the one that comes AFTER the description (and is replacing the usual
> coordinate number) and is not the last of an alignment or a series of
> dashes like the one above:
>                         elsif (CORE::length($_) == 0
>                             || ( $count != 1 && /^\s+$/o )
>                             || /^\s+\-?\*\s*$/
>                             || /^.+\s+\-\s+\-\s*$/ ) ### <--- Tweaked regex
>                         {
>                             next;
>                         }
> I tested it and it works fine; I hope you find the fix acceptable.
>
> Cheers,
>
> --
> Francisco J. Ossandon
> Bioinformatician.
> Ph.D. Candidate, University Andres Bello.
> Center for Bioinformatics and Genome Biology,
> Fundacion Ciencia para la Vida.
> Santiago, Chile.
> www.cienciavida.cl/CBGB.htm
>
>
> _______________________________________________
> Bioperl-l mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
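
The two patterns from the quoted message can be compared side by side with a short standalone script. This is only a rough sketch, not the code in hmmer2.pm: the sample lines are rebuilt from the quoted HMMER2 output (exact spacing is an assumption), and the names %lines and skipped_by are hypothetical helpers introduced for illustration.

```perl
use strict;
use warnings;

# Pattern added for bug 3376, and the proposed tweak with one extra \s+
# before the first dash (both copied from the quoted message).
my $original = qr/^.+\-\s+\-\s*$/;
my $tweaked  = qr/^.+\s+\-\s+\-\s*$/;

# Sample lines modelled on the quoted output; spacing is approximate.
my %lines = (
    # Split end-of-domain marker line from bug 3376: should be skipped.
    split_end_marker => '        Test     -   -',
    # Alignment row made only of gaps: must NOT be skipped.
    all_gap_row      => '  lcl|gi|340     - ' . ('-' x 50) . ' -',
    # Ordinary alignment row with coordinates: must NOT be skipped.
    normal_row       => '  lcl|gi|340 60938 ------AIMISGLIHGVSARCLRF'
                        . ('-' x 26) . ' 60955',
);

# Return a hash ref mapping each sample name to 1 if the regex matches
# the line (i.e. the parser would skip it) and 0 otherwise.
sub skipped_by {
    my ($regex) = @_;
    my %result;
    for my $name (keys %lines) {
        $result{$name} = $lines{$name} =~ $regex ? 1 : 0;
    }
    return \%result;
}

my $before = skipped_by($original);
my $after  = skipped_by($tweaked);

for my $name (sort keys %lines) {
    printf "%-17s original=%d tweaked=%d\n",
        $name, $before->{$name}, $after->{$name};
}
```

With these sample lines, the original pattern matches both the split marker line and the all-gap row, because "^.+" can swallow everything up to the last dash of the gap run. The tweaked pattern still matches the marker line but not the all-gap row, since the dash before the final " -" is preceded by another dash rather than by whitespace.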
