Toil API

Job Methods

class toil.job.Job(memory=None, cores=None, disk=None)[source]

Represents a unit of work in toil. Jobs are composed into graphs which make up a workflow.

The public functions of this class and its nested classes are the API to toil.

addChild(childJob)[source]

Adds childJob as a child of this job. Returns childJob. Child jobs are run after the Job.run method has completed.

See Job.checkJobGraphAcylic for the formal definition of the allowed forms of job graph.

addChildFn(fn, *args, **kwargs)[source]

Adds a child fn. See FunctionWrappingJob. Returns the new child Job.

addChildJobFn(fn, *args, **kwargs)[source]

Adds a child job fn. See JobFunctionWrappingJob. Returns the new child Job.

addFollowOn(followOnJob)[source]

Adds a follow-on job. Follow-on jobs are run after the child jobs and their descendants have been run. Returns followOnJob.

See Job.checkJobGraphAcylic for the formal definition of the allowed forms of job graph.
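The ordering rules for children and follow-ons can be illustrated with a small, self-contained sketch. ToyJob and run_order are hypothetical stand-ins written for this example; they are not part of toil, and they ignore parallelism, showing only one serial order that is consistent with the rules described above.

```python
class ToyJob:
    """Hypothetical stand-in for a job; this is not toil.job.Job."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.follow_ons = []

    def add_child(self, job):
        self.children.append(job)
        return job

    def add_follow_on(self, job):
        self.follow_ons.append(job)
        return job


def run_order(job, order=None):
    """Return job names in one valid serial execution order."""
    if order is None:
        order = []
    order.append(job.name)            # the job's own run step comes first
    for child in job.children:        # then children and their descendants
        run_order(child, order)
    for follow_on in job.follow_ons:  # follow-ons run after all of those
        run_order(follow_on, order)
    return order


root = ToyJob("root")
child = root.add_child(ToyJob("child"))
child.add_child(ToyJob("grandchild"))
root.add_follow_on(ToyJob("cleanup"))
order = run_order(root)
print(order)
```

In this toy graph the follow-on "cleanup" is visited only after "child" and "grandchild", which mirrors the guarantee that follow-ons wait for the children and their descendants.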

addFollowOnFn(fn, *args, **kwargs)[source]

Adds a follow-on fn. See FunctionWrappingJob. Returns the new follow-on Job.

addFollowOnJobFn(fn, *args, **kwargs)[source]

Add a follow-on job fn. See JobFunctionWrappingJob. Returns the new follow-on Job.

addService(service)[source]

Add a service of type Job.Service. The Job.Service.start() method will be called after the run method has completed but before any successors are run. Its Job.Service.stop() method will be called once the successors of the job have been run.

:rtype : An instance of PromisedJobReturnValue which will be replaced with the return value from the service.start() in any successor of the job.

checkJobGraphAcylic()[source]

Raises a JobGraphDeadlockException exception if the connected component of jobs containing this job contains any cycles of child/followOn dependencies in the augmented job graph (see below). Such cycles are not allowed in valid job graphs. This function is run during execution.

A job B that is on a directed path of child/followOn edges from a job A in the job graph is a descendant of A, similarly A is an ancestor of B.

A follow-on edge (A, B) between two jobs A and B is equivalent to adding a child edge to B from (1) A, (2) from each child of A, and (3) from the descendants of each child of A. We call such an edge an “implied” edge. The augmented job graph is a job graph including all the implied edges.

For a job graph (V, E) the algorithm is O(|V|^2). It is O(|V| + |E|) for a graph with no follow-ons. The follow-on case could be improved.
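The definition of the augmented job graph can be made concrete with a short sketch. The functions below are written for illustration only and are not toil's implementation: they take plain dictionaries mapping a job name to a list of child or follow-on names (an assumed representation), build the implied edges as defined above, and then detect cycles with a standard topological sort.

```python
from collections import defaultdict, deque


def descendants(node, children, follow_ons):
    """All jobs reachable from node along child/follow-on edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for nxt in children.get(current, []) + follow_ons.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def augmented_edges(children, follow_ons):
    """Child edges plus the implied edges of every follow-on edge."""
    edges = defaultdict(set)
    for parent, kids in children.items():
        edges[parent].update(kids)
    for a, targets in follow_ons.items():
        sources = {a}                                   # (1) A itself
        for child in children.get(a, []):
            sources.add(child)                          # (2) each child of A
            sources |= descendants(child, children, follow_ons)  # (3)
        for b in targets:
            for src in sources:
                edges[src].add(b)
    return edges


def has_cycle(edges):
    """Kahn's algorithm: a cycle exists if some node is never released."""
    nodes = set(edges)
    for targets in edges.values():
        nodes |= targets
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for t in edges.get(n, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return visited != len(nodes)


# B is a follow-on of A and unrelated to A's children: acceptable.
ok = augmented_edges({"A": ["C"], "C": ["D"]}, {"A": ["B"]})
# B is a follow-on of A but also a descendant of A's child C: not acceptable.
bad = augmented_edges({"A": ["C"], "C": ["B"]}, {"A": ["B"]})
print(has_cycle(ok), has_cycle(bad))
```

In the second graph B is both a follow-on of A and a descendant of a child of A, so the implied edges include an edge from B to itself and the augmented graph is cyclic.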

checkJobGraphConnected()[source]

Raises a JobGraphDeadlockException exception if getRootJobs() does not contain exactly one root job. As execution always starts from one root job, having multiple root jobs will cause a deadlock to occur.

checkJobGraphForDeadlocks()[source]

Raises a JobGraphDeadlockException exception if the job graph is cyclic or contains multiple roots.

encapsulate()[source]

See EncapsulatedJob.

:rtype : A new EncapsulatedJob for this job.

getRootJobs()[source]

A root is a job with no predecessors.

:rtype : set, the roots of the connected component of jobs that contains this job.
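As a rough illustration of what "root" means here, the sketch below finds every node without an incoming edge in a graph given as a dictionary of successor lists. The function name root_jobs and the dictionary representation are assumptions made for this example, not toil's code.

```python
def root_jobs(edges):
    """Return the set of nodes that have no predecessor."""
    nodes = set(edges)
    targets = set()
    for successors in edges.values():
        nodes.update(successors)
        targets.update(successors)
    return nodes - targets


single_root = {"A": ["B", "C"], "B": ["D"]}
two_roots = {"A": ["C"], "B": ["C"]}
print(root_jobs(single_root))
print(sorted(root_jobs(two_roots)))
```

A graph like two_roots, where both A and B lack predecessors, is the situation that checkJobGraphConnected reports as an error.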

getUserScript()[source]

run(fileStore)[source]

Override this method to do the job's work, including creating any follow-on jobs.

The fileStore argument is an instance of Job.FileStore, and can be used to create temporary files which can be shared between jobs.

The return values of the function can be passed to other jobs by means of the rv() function.

Note: Return values may not be PromisedJobReturnValue instances (generated by the Job.rv() function - see below). A check is made that will result in a runtime error if you attempt to do this. Allowing PromisedJobReturnValue instances to be returned does not work because the mechanism to pass the promise uses a jobStoreFileID that will be deleted once the current job and its descendants have been completed. This is similar to scope rules in a language like C, where returning a reference to memory allocated on the stack within a function produces a dangling reference. Disallowing this also avoids nested promises (PromisedJobReturnValue instances that contain other PromisedJobReturnValue instances).

rv(argIndex=None)[source]

Gets a PromisedJobReturnValue, representing the argIndex return value of the run function (see run method for description). This PromisedJobReturnValue, if a class attribute of a Job instance, call it T, will be replaced by the actual return value when T is loaded. The function rv therefore allows the output from one Job to be wired as input to another Job before either is actually run.

Parameters:argIndex – If None, the complete return value will be returned. If argIndex is an integer, it is used to index the return value (a tuple, list, dictionary, or in general any object that implements __getitem__), so rv(i) refers to the ith (indexed from 0) member of the return value.
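The effect of argIndex can be shown with a toy placeholder object. ToyPromise below is a hypothetical class invented for this example; it is not PromisedJobReturnValue and says nothing about how toil stores or resolves promises. It only mirrors the indexing rule described above.

```python
class ToyPromise:
    """Hypothetical placeholder for a return value that is not known yet."""

    def __init__(self, arg_index=None):
        self.arg_index = arg_index

    def resolve(self, return_value):
        # None selects the whole return value; otherwise index into it.
        if self.arg_index is None:
            return return_value
        return return_value[self.arg_index]


# Pretend this is what a job's run method eventually returns.
result = ("aligned.bam", {"reads": 1200})

whole = ToyPromise().resolve(result)
first = ToyPromise(0).resolve(result)
second = ToyPromise(1).resolve(result)
print(whole, first, second)
```

The promise objects can be created and handed to other jobs before result exists; only resolve needs the real value.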

static wrapFn(fn, *args, **kwargs)[source]

Makes a Job out of a function.

Convenience function for constructor of FunctionWrappingJob

static wrapJobFn(fn, *args, **kwargs)[source]

Makes a Job out of a job function.

Convenience function for constructor of JobFunctionWrappingJob

Job.FileStore

The FileStore is an abstraction of a Toil run’s shared storage.

class Job.FileStore(jobStore, jobWrapper, localTempDir)[source]

Class used to manage temporary files and log messages, passed as argument to the Job.run method.

deleteGlobalFile(fileStoreID)[source]

Deletes the global file with the given fileStoreID. Returns true if the file exists, else false.

getEmptyFileStoreID()[source]

Returns the ID of a new, empty file.

getLocalTempDir()[source]

Get the local temporary directory. This directory will exist for the duration of the job only, and is guaranteed to be deleted once the job terminates.

globalFileExists(fileStoreID)[source]

:rtype : True if and only if the jobStore contains the given fileStoreID, else false.

logToMaster(string)[source]

Send a logging message to the leader. Will only be reported if logging is set to INFO level (or lower) in the leader.

readGlobalFile(fileStoreID, localFilePath=None)[source]

Returns a path to a local copy of the file keyed by fileStoreID. The version will be consistent with the last copy of the file written/updated to the global file store. If localFilePath is not None, the returned file path will be localFilePath.

readGlobalFileStream(fileStoreID)[source]

Similar to readGlobalFile, but returns a context manager yielding a file handle which can be read from. The yielded file handle does not need to and should not be closed explicitly.

updateGlobalFile(fileStoreID, localFileName)[source]

Replaces the existing version of a file in the global file store, keyed by the fileStoreID. Throws an exception if the file does not exist.

updateGlobalFileStream(fileStoreID)[source]

Similar to updateGlobalFile, but returns a context manager yielding a file handle which can be written to. The yielded file handle does not need to and should not be closed explicitly.

writeGlobalFile(localFileName)[source]

Takes a file (as a path) and uploads it to the global file store. Returns an ID that can be used to retrieve the file.

writeGlobalFileStream()[source]

Similar to writeGlobalFile, but returns a context manager yielding a tuple of 1) a file handle which can be written to and 2) the ID of the resulting file in the job store. The yielded file handle does not need to and should not be closed explicitly.
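The "context manager yielding a handle and an ID" pattern used by the stream methods can be sketched with the standard library alone. The function write_stream and the directory-backed store below are assumptions made for illustration; they are not toil's job store and do not reproduce its file ID format.

```python
import contextlib
import os
import tempfile
import uuid


@contextlib.contextmanager
def write_stream(store_dir):
    """Yield (handle, file_id); the handle is closed when the block exits."""
    file_id = uuid.uuid4().hex
    path = os.path.join(store_dir, file_id)
    with open(path, "w") as handle:
        yield handle, file_id


store_dir = tempfile.mkdtemp()

# The caller writes through the handle and never closes it explicitly.
with write_stream(store_dir) as (handle, file_id):
    handle.write("hello")

with open(os.path.join(store_dir, file_id)) as stored:
    content = stored.read()
print(content, handle.closed)
```

Because the context manager owns the handle, closing it inside the with block would be redundant at best, which is why the documentation advises against it.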

Job.Runner

The Runner contains the methods needed to configure and start a Toil run.

class Job.Runner[source]

Used to set up and run a graph of jobs.

static addToilOptions(parser)[source]

Adds the default toil options to an optparse or argparse parser object.

static getDefaultOptions(jobStore)[source]

Returns an optparse.Values object of the options used by toil.

static startToil(job, options)[source]

Runs the toil workflow using the given options (see Job.Runner.getDefaultOptions and Job.Runner.addToilOptions) starting with this job.

raises:toil.leader.FailedJobsException if failed jobs remain at the end of the function