8 Feb 20:11
How is a TaskClient "fault tolerant"? And can it play nice with PBS queueing?
Jon Olav Vik <jonovik <at> gmail.com>
2010-02-08 19:11:50 GMT
I'm acquainting myself with parallel IPython and have a couple of questions.

1. Could someone please explain what it means that a TaskClient is "fault tolerant"?
http://ipython.scipy.org/doc/stable/html/parallel/parallel_task.html

2. The task interface sounds useful for embarrassingly parallel computations. I'm trying to follow the instructions at
http://ipython.scipy.org/doc/stable/html/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
(PBS is the queueing system used by the computer cluster I'm working with).

I use the command

    ipcluster pbs -n 8 --pbs-script=pbs.template &

to run the following pbs script:

    #PBS -N ipython
    #PBS -j oe
    #PBS -l walltime=00:10:00
    #PBS -l nodes=${n/8}:ppn=8
    #PBS -q express
    cd $$PBS_O_WORKDIR
    mpiexec -n ${n} ipengine --logfile=$$PBS_O_WORKDIR/ipengine &
    sleep 30
    python ipar.py

...where ipar.py starts a MultiEngineClient and execute()'s commands that use MPI on the ipengines. (I haven't tried using it with a TaskClient yet.)

Note that I'm starting mpiexec in the background; otherwise, it would never finish and my Python script would never get called. Also, I'm backgrounding the call to ipcluster because that too never seems to finish. (Using mpiexec with "python ipar.py" does not seem to be required.)

However, the compute cluster's user instructions say I shouldn't start processes in the background, because then they escape the control of the job scheduler. Is there a way I can make TaskClient() work under this restriction? Otherwise, I'm just going to manually "killall ipcluster" etc. once my job is done. (Or maybe that could go as the last lines of my pbs script?)

I'm a complete newbie in this, so any hints are highly appreciated.

Best regards,
Jon Olav Vik
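
The cleanup idea raised above (stop the helper processes once the real work is done, rather than running killall by hand) could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: it uses the standard subprocess module and no IPython API, the function name run_with_cleanup is hypothetical, and the stand-in child process merely sleeps where a real job would launch mpiexec and ipengine. Whether the job scheduler keeps control of a child started this way depends on the cluster and is not something this sketch can guarantee.

```python
import subprocess
import sys
import time


def run_with_cleanup(background_cmd, work, startup_delay=0.5):
    """Start a helper process, run work(), then always stop the helper.

    background_cmd is an argument list for the helper (hypothetical stand-in
    for the engine launcher); work is a zero-argument callable.
    Returns (result of work, exit status of the helper).
    """
    helper = subprocess.Popen(background_cmd)
    try:
        # Crude wait for the helper to come up, like `sleep 30` in the script.
        time.sleep(startup_delay)
        result = work()
    finally:
        # Runs even if work() raises, so the helper is never left behind.
        helper.terminate()
        try:
            helper.wait(timeout=10)
        except subprocess.TimeoutExpired:
            helper.kill()  # escalate if the helper ignores the polite request
            helper.wait()
    return result, helper.returncode


if __name__ == "__main__":
    # Assumption: a child that would otherwise run for a minute stands in
    # for the long-running engines.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    result, status = run_with_cleanup(stand_in, lambda: sum(range(10)))
    print(result, status)
```

The try/finally is the point of the design: the helper is stopped whether the work succeeds or fails, which is the same effect as putting the kill command on the last lines of the job script, except that it also covers the error path.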