Michael Lawrence | 14 Nov 15:40 2012

pooling for parallel hierarchical operations

We often execute nested operations in parallel, for example, first by
sample, then by chromosome. Allocating a fixed share of resources to each
level is often wasteful: if one sample finishes quickly, its CPUs are not
available to help the other samples along. Perhaps the most expedient
solution is to expand.grid() the hierarchy and create one job for every
combination, i.e., flatten the hierarchy. A better solution might be a
pool of resources (cores) that is allocated more fluidly.

Is there any sort of pooling system for R? I know that the parallel
package supports the declaration of resources in cluster objects, but
there is no central manager.

This is a general R question, but it's worth discussing in the context of
how we can make better use of parallelism in the low-level infrastructure,
which would cause these hierarchies to arise. It's also relevant to the
discussion of specifying parallelization modes or strategies. Pools
themselves could be hierarchical and heterogeneous (hosts, cores).
Declaring available resources is fairly straightforward; deciding how to
use them is context dependent and requires user control.
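
To make the flattening idea concrete, here is a minimal sketch using only
the parallel package. The sample and chromosome names, the worker count,
and process_one() are placeholders made up for illustration. A single
cluster stands in for the pool, and dynamic scheduling hands the next
(sample, chromosome) job to whichever worker becomes free. This is not a
central resource manager, only the expedient workaround.

```r
library(parallel)

# Hypothetical inputs, invented for this example.
samples <- c("s1", "s2", "s3")
chroms  <- c("chr1", "chr2", "chrX")

# One row per (sample, chromosome) combination: the flattened hierarchy.
jobs <- expand.grid(sample = samples, chrom = chroms,
                    stringsAsFactors = FALSE)

# Placeholder for the real per-chromosome work on one sample.
process_one <- function(sample, chrom) {
  paste(sample, chrom, sep = ":")
}

# A single cluster acts as the shared pool of workers (size is arbitrary).
n_workers <- 2L
cl <- makeCluster(n_workers)

# Dynamic scheduling: each job is dispatched as soon as a worker is idle,
# so a sample that finishes early does not leave its workers unused.
res <- clusterMap(cl, process_one, jobs$sample, jobs$chrom,
                  .scheduling = "dynamic", USE.NAMES = FALSE)
stopCluster(cl)

# Regroup the flat results by sample to recover the original hierarchy.
by_sample <- split(unlist(res), jobs$sample)
```

The cost is that every job must be enumerated up front, and any
parallelism inside process_one() would still compete with the outer level
for the same cores, which is the problem a real pool would address.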



Bioconductor mailing list
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor