From: Siddhartha Basu <sidd.basu <at> gmail.com>
Subject: Re: FASTQ, was Re:BioPerl long-term, was Re: dependencies on perl version
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Thursday 7th February 2013 16:38:47 UTC (over 3 years ago)
Another approach might be to use MapReduce (Hadoop), if possible. I have
seen one implementation in Biopython's GFF3 parser.
http://bcbio.wordpress.com/2009/03/22/mapreduce-implementation-of-gff-parsing-for-biopython/

-siddhartha
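
To make the map-reduce shape concrete, here is a minimal in-process sketch applied to FASTQ rather than GFF3. It does not use Hadoop or the Biopython code linked above; the function names (`split_records`, `map_chunk`, `reduce_counts`) are hypothetical, and it assumes strictly four-line FASTQ records. Each chunk is summarized independently (the map step), and the partial summaries are then merged (the reduce step).

```python
from collections import Counter
from functools import reduce


def split_records(lines, records_per_chunk):
    """Group FASTQ lines into chunks that hold whole four-line records."""
    step = 4 * records_per_chunk
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def map_chunk(chunk):
    """Map step: summarize one chunk without looking at any other chunk."""
    seqs = chunk[1::4]  # the sequence is the second line of each record
    return Counter(records=len(seqs), bases=sum(len(s) for s in seqs))


def reduce_counts(left, right):
    """Reduce step: merge two partial summaries."""
    return left + right


lines = ["@r1", "ACGT", "+", "IIII",
         "@r2", "GG", "+", "II",
         "@r3", "TTT", "+", "III"]
chunks = split_records(lines, records_per_chunk=2)
totals = reduce(reduce_counts, map(map_chunk, chunks), Counter())
```

Because `map_chunk` depends only on its own chunk, the map step is the part that could be handed to separate workers; here it runs serially for simplicity.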


On Thu, 07 Feb 2013, Aaron Mackey wrote:

> e.g., a pull-based FASTQ parser that did nothing else at the top level but
> "chunk" the file into as-yet-unparsed four-line blobs could appear to work
> very fast, if the user code did nothing but count the number of entries:
> 
>   while (my $seq = $seqio->next_seq) { $ct++ }
> 
> in other words, you defer *everything* except the minimal amount of
> parsing/logic required to detect object boundaries.
> 
> This is, in fact, the exact opposite of the event-based SearchIO "push"
> parsers, which always perform the most parsing possible, despite the user
> never accessing most of the material.
> 
> Lastly, with respect to performance, if the parsing/object building
> operation is not simply IO bound, then parallel parser/object-building CPU
> threads could be considered, which could then dynamically adapt to
> pre-parse attributes (e.g. quality scores) that the calling code was
> actually using.  What's the state of thread-safe Perl these days?
> 
> -Aaron
> 
> 
> On Thu, Feb 7, 2013 at 10:56 AM, Fields, Christopher J <
> [email protected]> wrote:
> 
> > This will likely be the approach for a more NGS-friendly Bio::Seq class.
> >  Calculation of the PHRED scores could also be deferred until needed.
> >
> > seqtk has some C-based methods that we could possibly take advantage of,
> > but we will have to look into it.
> >
> > chris
> >
> > On Feb 7, 2013, at 9:25 AM, Aaron Mackey  wrote:
> >
> > > You might also want to consider a lazy/pull-based parser to defer
> > > parsing/object-building for pieces of the object that don't get used.
> > > This also usually provides some error tolerance.
> > >
> > > -Aaron
> >
> _______________________________________________
> Bioperl-l mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
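
The lazy, pull-based idea discussed in this thread can be sketched roughly as follows. This is an illustration in Python, not BioPerl or Biopython code: `LazyFastqRecord` and `iter_fastq` are hypothetical names, and the sketch assumes strictly four-line FASTQ entries with Sanger (PHRED+33) quality encoding. The iterator only finds record boundaries; PHRED scores are decoded on first access and then cached.

```python
import io


class LazyFastqRecord:
    """Holds the four raw lines of one entry; fields are parsed on access."""

    def __init__(self, raw_lines):
        self._raw = raw_lines  # unparsed four-line blob
        self._quals = None     # cache for decoded PHRED scores

    @property
    def name(self):
        # Header starts with '@'; keep the first whitespace-delimited token.
        return self._raw[0][1:].split()[0]

    @property
    def sequence(self):
        return self._raw[1]

    @property
    def phred_scores(self):
        # Deferred: decoded only when first requested (assumes PHRED+33).
        if self._quals is None:
            self._quals = [ord(c) - 33 for c in self._raw[3]]
        return self._quals


def iter_fastq(handle):
    """Yield one lazy record per four-line entry; only boundaries are checked."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return  # end of input (a blank line is also treated as the end)
        if not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError("malformed FASTQ entry near %r" % lines[0])
        yield LazyFastqRecord(lines)


data = "@r1 desc\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n"

# Counting entries never touches the quality strings.
count = sum(1 for _ in iter_fastq(io.StringIO(data)))

# Quality scores are decoded only for records that ask for them.
first = next(iter_fastq(io.StringIO(data)))
scores = first.phred_scores
```

A caller that only counts entries pays for line reading and two prefix checks per record, which is the "defer everything except boundary detection" behaviour described above.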
 