5. Workload

The Troy Workload...

class troy.workload.ComputeUnitDescription(descr={})

Bases: troy.utils.properties.Properties

The ComputeUnitDescription class is a simple container for properties which describe a ComputeUnit, i.e. a workload element. ComputeUnitDescriptions are submitted to WorkloadManager instances on add_task, and are internally used to create ComputeUnit instances.

FIXME: description of supported properties goes here

class troy.workload.ComputeUnit(param=None, _native_id=None, _task=None, _pilot_id=None)

Bases: troy.utils.properties.Properties

The ComputeUnit class represents the smallest element of work to be performed on behalf of an application, and is part of a workload managed by Troy. More specifically, Tasks are decomposed into ComputeUnits.

ComputeUnits are created according to a ComputeUnitDescription, i.e. a set of key-value pairs describing the represented workload element.

cancel()

Cancel the compute unit (CU).

class troy.workload.TaskDescription(descr={})

Bases: troy.utils.properties.Properties

The TaskDescription class is a simple container for properties which describe a Task, i.e. a workload element. TaskDescriptions are submitted to WorkloadManager instances on add_task, and are internally used to create Task instances.

FIXME: description of supported properties goes here

class troy.workload.Task(descr, _manager=None)

Bases: troy.utils.properties.Properties

The Task class represents an element of work to be performed on behalf of an application, and is part of a workload managed by Troy.

Task instances are created and owned by the Workload class they are part of – only that class should change its composition and state. Tasks are created according to a TaskDescription, i.e. a set of key-value pairs describing the represented workload element.

As tasks are components of a Workload, they are subject to the transformations the workload undergoes (see Workload documentation for details). During that process, tasks are enriched with additional information, which is kept as additional member properties:

FIXME: do we need states for tasks? Like: DESCRIBED, TRANSLATED, BOUND, DISPATCHED, DONE? Sounds useful on a first glance, but on re-bind etc (see comments in workload manager), the states quickly become meaningless... But related to that will be the workload state inspection from the upper layers, and from the overlay manager (which better not plan pilots for completed tasks :P).

cancel()

Cancel all units of the task.

get_state()

The task state is a wondrous thing – it is sometimes atomic, and sometimes it isn’t... It is derived as follows:

The initial stages of Troy cause atomic state transitions for the tasks – they are created as DESCRIBED, workload_manager.translate_workload() moves them to TRANSLATED, workload_manager.bind_workload() moves them to BOUND, and workload_manager.dispatch_workload() moves them to DISPATCHED.

Up to then, all state transitions are under full control of Troy, so we can make sure that the task states make sense – if any of the transitions cannot be performed for a task, we can raise an exception and not advance the state, or revert everything and move into FAILED state.

After dispatch, however, the units which make up the tasks have states which are managed by some backend, and have individual and uncorrelated state transitions. At that point, we make the task state dependent on the unit states, and define:

     if any unit  is  FAILED     : task.state = FAILED
else if any unit  is  CANCELED   : task.state = CANCELED
else if any unit  is  DISPATCHED : task.state = DISPATCHED
else if any unit  is  RUNNING    : task.state = RUNNING
else if all units are DONE       : task.state = DONE
else                             : task.state = UNKNOWN

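
The rules above can be read as a first-match-wins chain. The following is a minimal sketch of that reading, not Troy's actual implementation: the function name derive_task_state, the use of plain strings for states, and the treatment of a task without units as UNKNOWN are all assumptions made for illustration.

```python
# State names as used in the rules above; plain strings are an assumption.
FAILED, CANCELED, DISPATCHED, RUNNING, DONE, UNKNOWN = (
    "FAILED", "CANCELED", "DISPATCHED", "RUNNING", "DONE", "UNKNOWN")


def derive_task_state(unit_states):
    """Derive a post-dispatch task state from the states of its units."""
    states = set(unit_states)
    # The "any unit is X" rules are checked in order; the first match wins.
    if FAILED in states:
        return FAILED
    if CANCELED in states:
        return CANCELED
    if DISPATCHED in states:
        return DISPATCHED
    if RUNNING in states:
        return RUNNING
    # "all units are DONE": the set of observed states is exactly {DONE}.
    # Assumption: an empty unit list does not match and falls through.
    if states == {DONE}:
        return DONE
    return UNKNOWN
```

With this reading, a single FAILED unit decides the task state regardless of how many other units are DONE or RUNNING.
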
class troy.workload.Workload

Bases: troy.utils.properties.Properties

The Workload class represents a workload which is managed by Troy. It contains a set of Tasks, and a set of Relations between those tasks.

Workload instances are owned by the WorkloadManager class – only that class should change its composition and state.

A workload undergoes a series of transformations before ending up on a specific resource (pilot). Those transformations are orchestrated by the workload manager. To support that orchestration, a workload will be lockable, and it will have a state attribute. The valid states are listed below.

Internally, a workload is represented in two parts: a dictionary of tasks (Task instances mapped to their task id), and a list of Relation instances. As the workload undergoes transformations, it is enriched by additional information, although that information is kept solely within the Task instances – see there for more details.

The workload transformations are:

  • Planning: A workload is inspected and its cardinal parameters are expanded, based on the overlay if it exists.
  • Translation: A workload is inspected, and its tasks are translated into compute units. A single task may result in one or more compute units. Multiple tasks may be combined into one compute unit.
  • Scheduling: A translated workload is mapped onto a resource overlay. More specifically, the compute units of a translated workload are scheduled on the compute pilots of a given overlay.
  • Dispatching: A scheduled workload is dispatched to the active entities (pilots) of an overlay.

A workload can be in different states, depending on the transformations performed on it. Specifically, it can be in DESCRIBED, PLANNED, TRANSLATED, SCHEDULED, DISPATCHED, DONE or FAILED. A workload enters the workload manager in DESCRIBED or PLANNED state, and all follow-up state transitions are kept within the workload manager.

Those states are ill-defined in case of partial transformations – if, for example, a translation step only derives compute units for some of the tasks, but not for others. As a general rule, a workload will remain in a state until the transformation has been performed on all applicable workload components (tasks and relations).

Even on fully transformed workloads, the actual workload state may not be trivial to determine – for example, a specific compute unit configuration derived in a translation step may turn out to be impossible to dispatch later on, and may require a re-translation into a different configuration; or a newly DESCRIBED task may be added to a SCHEDULED workload. Those feedback loops are considered out-of-scope for Troy at this point, so state transitions are considered irreversible.
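
To illustrate what irreversible transitions mean in practice, here is a small sketch that orders the transformation states and rejects any backward move. The WorkloadState enum and the advance helper are hypothetical names, not part of Troy. DONE and FAILED are left out for brevity, and whether intermediate states may be skipped is not specified here, so the sketch only checks direction.

```python
from enum import IntEnum


class WorkloadState(IntEnum):
    """Transformation states in the order a workload passes through them."""
    DESCRIBED = 1
    PLANNED = 2
    TRANSLATED = 3
    SCHEDULED = 4
    DISPATCHED = 5


def advance(current, target):
    """Return target if it lies after current; refuse to move backwards."""
    if target <= current:
        raise ValueError(
            "cannot move from %s to %s" % (current.name, target.name))
    return target
```

A re-translation of a SCHEDULED workload would correspond to advance(WorkloadState.SCHEDULED, WorkloadState.TRANSLATED), which raises ValueError in this sketch.
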

cancel()

Cancel all tasks of the workload.

add_task(descr)

Add a task (or a list of tasks) to the workload.

Tasks are expected to be of type TaskDescription.

add_relation(descr)

Add a relation for a pair of tasks, or a list of relations for a set of task pairs, to the workload.

Relations are expected to be of type Relation, and can only be added once. The related tasks must already be in the Workload – otherwise a ValueError is raised.
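
The checks described above could look roughly like the following toy model. It is a sketch under stated assumptions, not Troy's code: MiniWorkload, the namedtuple stand-in for Relation, and its head and tail fields are hypothetical, and rejecting a repeated relation with ValueError is an assumption, since only the error for missing tasks is documented.

```python
from collections import namedtuple

# Hypothetical stand-in for a Relation: the ids of the two related tasks.
Relation = namedtuple("Relation", ["head", "tail"])


class MiniWorkload:
    """Toy model of the documented add_relation checks."""

    def __init__(self):
        self.tasks = {}       # task id -> task object
        self.relations = []   # list of Relation instances

    def add_relation(self, relations):
        # Accept a single relation or a list of relations.
        if not isinstance(relations, list):
            relations = [relations]
        for rel in relations:
            # Both related tasks must already be part of the workload.
            if rel.head not in self.tasks or rel.tail not in self.tasks:
                raise ValueError("related task is not in the workload")
            # Assumption: adding the same relation twice is an error.
            if rel in self.relations:
                raise ValueError("relation was already added")
            self.relations.append(rel)
```
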

get_state()

The workload state is a wondrous thing – it is sometimes atomic, and sometimes it isn’t... It is derived as follows:

The initial stages of Troy cause atomic state transitions for the workload – it is created as DESCRIBED, planner.plan() moves it to PLANNED, workload_manager.translate_workload() moves it to TRANSLATED, workload_manager.bind_workload() moves it to BOUND, and workload_manager.dispatch_workload() moves it to DISPATCHED.

Up to then, all state transitions are under full control of Troy, so we can make sure that the global workload state makes sense – if any of the transitions cannot be performed for a task, we can raise an exception and not advance the state, or revert everything and move into FAILED state.

After dispatch, however, the tasks (and more precisely the units which make up the tasks) have a state which is managed by some backend, and have individual and uncorrelated state transitions. At that point, we make the workload state dependent on the task states, and define:

     if any task  is  FAILED     :  workload.state = FAILED
else if any task  is  CANCELED   :  workload.state = CANCELED
else if any task  is  DISPATCHED :  workload.state = DISPATCHED
else if all tasks are DONE       :  workload.state = DONE
else                             :  workload.state = UNKNOWN

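
As a sketch of how these rules fold the task states into one workload state, the "any task" rules can be kept in a priority-ordered tuple. The names derive_workload_state and ANY_RULES are hypothetical, states are modelled as plain strings, and reporting a workload without tasks as UNKNOWN is an assumption.

```python
# State names as used in the rules above; plain strings are an assumption.
FAILED, CANCELED, DISPATCHED, DONE, UNKNOWN = (
    "FAILED", "CANCELED", "DISPATCHED", "DONE", "UNKNOWN")

# The "any task is X" rules, in priority order.
ANY_RULES = (FAILED, CANCELED, DISPATCHED)


def derive_workload_state(task_states):
    """Fold the states of all tasks into a single workload state."""
    task_states = list(task_states)
    for state in ANY_RULES:
        if state in task_states:
            return state
    # Assumption: a workload without tasks is UNKNOWN rather than DONE.
    if task_states and all(s == DONE for s in task_states):
        return DONE
    return UNKNOWN
```

Since the rules list no RUNNING case for workloads, a workload whose tasks are a mix of RUNNING and DONE comes out as UNKNOWN in this sketch.
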
lock()

locked()

unlock(*args)
