******
FAQ
******

^^^^^^^^^^^^^^^^^
General
^^^^^^^^^^^^^^^^^

=========================================================
Q. *Ruffus* won't create dependency graphs
=========================================================

A. You need to have installed ``dot`` from `Graphviz <http://www.graphviz.org/>`_
to produce pretty flowcharts like this:

.. image:: images/four_stage_pipeline.jpg

(Flow Chart Key):

.. image:: images/key.jpg

=========================================================
Q. Some jobs re-run even when they seem up-to-date
=========================================================

A. You might have fallen foul of the coarse timestamp precision of some
operating systems. If you are using ``@files`` or ``@files_re``, *ruffus*
uses file modification times to see if input files were created before
output files. Unfortunately, some file systems in some versions of Windows,
Unix, Linux or NFS do not record file times with sub-second precision.
In the worst case, you might try adding some judicious calls to
``time.sleep(1)`` so that output files end up with measurably later
modification times than their inputs.

=========================================================
Q. *Ruffus* seems to be hanging in the same place
=========================================================

A. If *ruffus* is interrupted, for example by a Ctrl-C, you will often find
the following lines of code highlighted::

    File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 1904, in pipeline_run
    File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 1380, in run_all_jobs_in_task
    File "/xxxx/python2.6/multiprocessing/pool.py", line 507, in next
        self._cond.wait(timeout)
    File "/xxxxx/python2.6/threading.py", line 237, in wait
        waiter.acquire()

This is *not* where *ruffus* is hanging, but rather the boundary between the
main program process and the sub-processes which run *ruffus* jobs in
parallel. This is naturally where broken execution threads wash up.

^^^^^^^^^^^^^^^
Windows
^^^^^^^^^^^^^^^

=========================================================
Q. Windows seems to spawn *ruffus* processes recursively
=========================================================

A. It is necessary to protect the "entry point" of the program under Windows.
Otherwise, a new process will be started each time the main module is
imported by a new Python interpreter as an unintended side effect, causing
a cascade of new processes.

See: http://docs.python.org/library/multiprocessing.html#multiprocessing-programming

This code works::

    if __name__ == '__main__':
        try:
            pipeline_run([parallel_task], multiprocess = 5)
        except Exception, e:
            print e.args

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sun Grid Engine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

=========================================================
Q. *qrsh* eats up all my processor time under *ruffus*
=========================================================

A. Sun Grid Engine provides the ``qrsh`` command to run an interactive rsh
session. ``qrsh`` can be used to run commands/scripts in a compute farm or
grid cluster. However, when run within *ruffus*, ``qrsh`` seems to spin
idly, polling for input and consuming all the CPU resources in that process.

An interim solution is to close ``STDIN`` for the ``qrsh`` invocation::

    import os
    from subprocess import Popen, PIPE

    # "priority" and "queue_name" are defined elsewhere in your script
    qrsh_cmd = ["qrsh", "-now", "n",
                "-cwd",
                "-p", "-%d" % priority,
                "-q", queue_name,
                "little_script.py"]
    p = Popen(qrsh_cmd, stdin = PIPE)
    p.stdin.close()
    sts = os.waitpid(p.pid, 0)
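If you call ``qrsh`` from your own *ruffus* tasks, it may also help to check
the exit status, so that a failed remote command raises an exception and the
job is marked as failed. The following is only a minimal sketch, assuming a
hypothetical helper name, script, priority and queue::

    import os
    from subprocess import Popen, PIPE

    def run_qrsh(cmd_args, priority, queue_name):
        """
        Run a command via qrsh with STDIN closed so that it
        cannot spin while polling for input
        """
        qrsh_cmd = ["qrsh", "-now", "n",
                    "-cwd",
                    "-p", "-%d" % priority,
                    "-q", queue_name] + cmd_args
        p = Popen(qrsh_cmd, stdin = PIPE)
        p.stdin.close()
        pid, status = os.waitpid(p.pid, 0)
        if status != 0:
            raise Exception("qrsh failed (status %d): %s" %
                            (status, " ".join(qrsh_cmd)))

    # e.g. inside a ruffus job:
    run_qrsh(["little_script.py"], priority = 10, queue_name = "all.q")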
=====================================================================
Q. When I submit lots of jobs at the same time, SGE freezes and dies
=====================================================================

A. This seems to be dependent on your setup. One workaround may be to
introduce a random time delay at the beginning of your jobs::

    import time, random

    @parallel(param_func)
    def task_in_parallel(input_file, output_file):
        """
        Work starts after a random delay so that SGE has a
        chance to manage the queue
        """
        time.sleep(random.random() / 2.0)
        # Wake up and do work
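For completeness, here is a self-contained version of the same workaround.
This is only a sketch: the two made-up input/output file pairs and the
"touch the output file" body stand in for real work::

    import time, random
    from ruffus import parallel, pipeline_run

    # hypothetical job parameters: one [input, output] pair per job
    @parallel([["a.input", "a.output"],
               ["b.input", "b.output"]])
    def task_in_parallel(input_file, output_file):
        """
        Work starts after a random delay so that SGE has a
        chance to manage the queue
        """
        time.sleep(random.random() / 2.0)
        open(output_file, "w").close()      # placeholder for real work

    pipeline_run([task_in_parallel], multiprocess = 5)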