14 Nov 2012 15:40
pooling for parallel hierarchical operations
We often execute nested operations in parallel: for example, first by sample, then by chromosome. Allocating a fixed share of resources to each level often wastes them. If one sample finishes quickly, its CPUs are not available to help the other samples along.

Perhaps the most expedient solution is to expand.grid() the hierarchy and create one job for every combination, i.e., flatten the hierarchy. A better solution might be a pool of resources (cores) that is allocated more fluidly. Is there any sort of pooling system for R? I know that the parallel package supports the declaration of resources in cluster objects, but there is no central manager.

This is a general R question, but it's worth discussing in the context of how we can make better use of parallelism in the low-level infrastructure, which would cause these hierarchies to arise. It's also relevant to the discussion of specifying parallelization modes or strategies.

Pools themselves could be hierarchical and heterogeneous (hosts, cores). Declaring available resources is fairly straightforward. Deciding how to use them is context-dependent and requires user control.

Michael

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
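
For reference, the flattening approach mentioned in the message might look roughly like the sketch below. It is only an illustration, not a pooling system: the sample and chromosome names and the `process_one()` function are hypothetical placeholders, and the core count is an arbitrary assumption. It uses `mclapply()` from the parallel package with `mc.preschedule = FALSE`, so that a new job starts whenever a core becomes free.

```r
library(parallel)

# Hypothetical hierarchy: samples at the top level, chromosomes nested within.
samples <- c("sampleA", "sampleB", "sampleC")
chromosomes <- c("chr1", "chr2", "chrX")

# Flatten the hierarchy: one row per (sample, chromosome) combination.
jobs <- expand.grid(sample = samples, chromosome = chromosomes,
                    stringsAsFactors = FALSE)

# Placeholder for the real per-chromosome work (an assumption for illustration).
process_one <- function(sample, chromosome) {
  paste(sample, chromosome, sep = ":")
}

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE forks one job per element, with at most n_cores
# running at once, instead of dividing the jobs among the cores up front.
flat <- mclapply(seq_len(nrow(jobs)), function(i) {
  process_one(jobs$sample[i], jobs$chromosome[i])
}, mc.cores = n_cores, mc.preschedule = FALSE)

# Restore the hierarchy: group the flat results by sample.
results <- split(unlist(flat), jobs$sample)
```

Because every combination is an independent job, a core freed by a fast sample picks up work from a slower one. The cost is that each level's work must be expressible as a flat list of jobs up front, which is the limitation that motivates the question about a more general pool.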