..
   Copyright 2015 Novartis Institutes for Biomedical Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

.. _cookbook-label:

********
Cookbook
********

Environment
===========

Checking that 3rd-party executables are present
-----------------------------------------------

When writing a recipe, it is often desirable to check that 3rd-party executables
are present, so that a script fails early and gives the user an informative message
about what is required but missing.

.. code-block:: python

   from railroadtracks.environment import Executable, MissingSoftware

   if not Executable.ispresent('bwa'):
       raise MissingSoftware('The executable bwa is not in the PATH.')

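When a recipe depends on several tools, it can help to report every missing
executable at once. The sketch below uses only the standard library
(``shutil.which``, Python 3.3 or later) rather than :mod:`railroadtracks`; the helper
name ``missing_executables`` and the tool names are illustrative assumptions, not part
of the package.

.. code-block:: python

   import shutil


   def missing_executables(names):
       """Return the names that cannot be found in the PATH (hypothetical helper)."""
       return [name for name in names if shutil.which(name) is None]


   # Tool names are placeholders for whatever the recipe needs.
   missing = missing_executables(['bwa', 'samtools'])
   if missing:
       print('Missing from the PATH: ' + ', '.join(missing))

The list returned could equally be used to build the message passed to
:class:`MissingSoftware`.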

Project
=======

Get all tasks of a given type
-----------------------------

The types, or classes, defined in a model can be used to query a project.

.. code-block:: python

   taskset = project.get_tasksoftype(rnaseq.StarAlign)


Get all tasks matching a list of types
--------------------------------------

.. code-block:: python

   from functools import reduce  # reduce() is not a builtin in Python 3

   listoftypes = (rnaseq.StarAlign, rnaseq.BWA)
   taskset = reduce(lambda x,y: x.union(project.get_tasksoftype(y)),
                    listoftypes, easy.TaskSet())


Get all tasks fulfilling a given activity
-----------------------------------------

.. code-block:: python

   taskset = project.get_taskswithactivity(rnaseq.ACTIVITY.ALIGN)


TaskSet
=======

:class:`TaskSet` objects provide a convenient abstraction to manipulate groups of tasks by thinking about them
as :class:`set` objects. In addition to :class:`set` methods, they also have :mod:`railroadtracks` specific features.

Subset all tasks with a given status
------------------------------------

Retrieving all tasks with a given status from a :class:`TaskSet` can be achieved with the method
:meth:`TaskSet.filter_on_status`.

For example:

.. code-block:: python

   # all tasks in the set that failed
   taskset_failed = taskset.filter_on_status(hortator._TASK_FAILED)

   # all tasks in the set that succeeded
   taskset_success = taskset.filter_on_status(hortator._TASK_DONE)


Subset all tasks ready for execution
------------------------------------

Tasks ready for execution have the status ``_TASK_TODO`` or ``_TASK_FAILED``, and all of
their *parent* tasks have the status ``_TASK_DONE``.

This can be implemented with :meth:`TaskSet.filter_on_status`, :meth:`TaskSet.union`,
and :meth:`TaskSet.filter_on_parent_status`.

.. code-block:: python

   def exec_taskfilter(taskset):
       # union of tasks that are either "TO-DO" or "FAILED"
       tasks_todo = taskset.filter_on_status(hortator._TASK_TODO)
       tasks_todo = tasks_todo.union(taskset.filter_on_status(hortator._TASK_FAILED))
       # only keep the tasks for which the parent task was successfully run
       tasks_todo = tasks_todo.filter_on_parent_status(hortator._TASK_DONE)
       return tasks_todo


That function is part of the code base, so in practice one will only have to write:

.. code-block:: python

   import railroadtracks.easy.execution
   ts_torun = railroadtracks.easy.execution.exec_taskfilter(taskset)

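To make the logic of the filter concrete, here is a minimal sketch on a toy model that
does not use :mod:`railroadtracks` at all. The status labels, the task names, and the
function ``ready_for_execution`` are hypothetical stand-ins chosen for illustration.

.. code-block:: python

   # Hypothetical stand-ins: status labels and a toy project, not the railroadtracks API.
   TODO, DONE, FAILED = 'To do', 'Done', 'Failed'

   status = {'index': DONE, 'align': FAILED, 'quantify': TODO}
   parents = {'index': set(), 'align': {'index'}, 'quantify': {'align'}}


   def ready_for_execution(tasks):
       """Keep the tasks that are not done and whose parents are all done."""
       candidates = {t for t in tasks if status[t] in (TODO, FAILED)}
       return {t for t in candidates
               if all(status[p] == DONE for p in parents[t])}


   print(sorted(ready_for_execution(status)))  # ['align']

Here ``quantify`` is left out because its parent ``align`` has failed, while ``align``
can be retried since its own parent is done.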

Get all child tasks for a :class:`TaskSet`
------------------------------------------

Each :class:`TaskSet` can be manipulated as a :class:`set`, so if we think of the problem
as the union of the child tasks of each task in the set, it can be written simply as:

.. code-block:: python

   ts_allchilds = easy.TaskSet()
   for task in taskset:
       # union() returns a new set, so the result must be assigned
       ts_allchilds = ts_allchilds.union(task.child_tasks())

The one-line version is:

.. code-block:: python

   from functools import reduce  # reduce() is not a builtin in Python 3

   ts_allchilds = reduce(lambda x,y: x.union(y),
                         map(lambda x: x.child_tasks(), taskset),
                         easy.TaskSet())

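The same pattern can be tried out with Python's built-in :class:`set`. The sketch
below assumes that :meth:`TaskSet.union` behaves like :meth:`set.union` (it returns a
new object rather than modifying the set in place); the task names and the ``children``
mapping are made up for the example.

.. code-block:: python

   from functools import reduce

   # Toy data: each task name maps to the names of its child tasks (hypothetical).
   children = {'index': {'align_a', 'align_b'},
               'align_a': {'quantify'},
               'align_b': {'quantify'}}

   taskset = {'index', 'align_a'}

   # union() returns a new set, so the result must be kept.
   all_children = set()
   for task in taskset:
       all_children = all_children.union(children[task])

   # Same result with a fold over the per-task child sets.
   folded = reduce(set.union, (children[t] for t in taskset), set())

   print(sorted(all_children))  # ['align_a', 'align_b', 'quantify']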

TaskSetGraph
============

:class:`TaskSet` objects are rarely considered in isolation, since they are part of a larger
dependency graph. :class:`TaskSetGraph` helps handle all the tasks by letting one manipulate
groups of tasks that are not directly connected as :class:`TaskSet` objects, within the context
of the task-level dependency graph. Such sets can be added directly to a :class:`TaskSetGraph`,
and the dependencies between the task sets are inferred from the task-level relationships.

.. code-block:: python

   from railroadtracks.easy.tasks import TaskSet
   from railroadtracks.easy.tasksetgraph import TaskSetGraph

   tsg = TaskSetGraph()

   # add a taskset
   tsg.add(taskset_a)

   # add another taskset
   tsg.add(taskset_b)

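How dependencies between task sets can be inferred from task-level relationships is
sketched below with plain dictionaries. This is only an illustration of the idea under
simplifying assumptions, not the way :class:`TaskSetGraph` is implemented; all names
(``task_parents``, ``tasksets``, ``taskset_dependencies``) are hypothetical.

.. code-block:: python

   # Toy task-level graph: task name -> names of its parent tasks (hypothetical data).
   task_parents = {'index': set(),
                   'align_1': {'index'},
                   'align_2': {'index'},
                   'count_1': {'align_1'},
                   'count_2': {'align_2'}}

   # Groups of tasks that are not directly connected to one another.
   tasksets = {'indexing': {'index'},
               'alignment': {'align_1', 'align_2'},
               'counting': {'count_1', 'count_2'}}


   def taskset_dependencies(tasksets, task_parents):
       """Map each task set name to the names of the sets it depends on."""
       owner = {task: name for name, tasks in tasksets.items() for task in tasks}
       deps = {name: set() for name in tasksets}
       for name, tasks in tasksets.items():
           for task in tasks:
               for parent in task_parents[task]:
                   if owner[parent] != name:
                       deps[name].add(owner[parent])
       return deps


   deps = taskset_dependencies(tasksets, task_parents)

With this toy data, ``alignment`` depends on ``indexing`` and ``counting`` depends on
``alignment``, without the user having to declare either relationship.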

Execute all tasks
-----------------

To execute tasks, one chooses a default task mapper (task mappers can also be set at the
:class:`TaskSet` level) and a task filter (a function that decides which tasks should be executed).
The order in which :class:`TaskSet` objects are executed is inferred from the connections between
the :class:`Task` objects in each :class:`TaskSet` and does not require the user's intervention.
The task filter provides an additional level of refinement by letting a function control which
tasks in a set are executed.

In the example below, we use :class:`easy.execution.IterativeExecution` to map the tasks, together
with the filter introduced earlier in the section `TaskSet`_: only the tasks that are not yet done,
and whose *parent* tasks completed successfully, are considered for execution (there is no point in
trying to execute a task if its parent has not been successful).


.. code-block:: python

   from railroadtracks.easy.processing import IterativeExecution
   # task mapper to run the tasks
   p = IterativeExecution()
   tsg.defaultmapper = p

   # task filter
   tsg.defaultfilter = easy.execution.exec_taskfilter

   # execute all tasks that pass the filter
   tsg.execute()
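
The interplay between execution order, task filter, and task mapper can be sketched
with the standard library alone. The example below is an illustration under stated
assumptions, not the :class:`TaskSetGraph` implementation: it uses
``graphlib.TopologicalSorter`` (Python 3.9 or later) to order the sets, and
``task_filter``, ``mapper``, and the data are hypothetical stand-ins.

.. code-block:: python

   from graphlib import TopologicalSorter  # standard library, Python 3.9+

   # Hypothetical set-level dependencies: task set name -> sets it depends on.
   deps = {'indexing': set(),
           'alignment': {'indexing'},
           'counting': {'alignment'}}
   tasksets = {'indexing': ['index'],
               'alignment': ['align_1', 'align_2'],
               'counting': ['count_1', 'count_2']}
   done = {'index'}  # tasks already completed


   def task_filter(tasks):
       """Stand-in for a task filter: skip tasks that are already done."""
       return [t for t in tasks if t not in done]


   def mapper(tasks):
       """Stand-in for a task mapper: 'run' the tasks one after the other."""
       for t in tasks:
           done.add(t)
       return list(tasks)


   executed = []
   # Sets are visited so that every set comes after the sets it depends on.
   for name in TopologicalSorter(deps).static_order():
       executed.extend(mapper(task_filter(tasksets[name])))

Because ``index`` is already done, the filter drops it and only the alignment and
counting tasks are passed to the mapper, in dependency order.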