The zc.async package provides a way to schedule jobs to be performed out-of-band from your current thread. The job might be done in another thread or another process, possibly on another machine. Here are some example core use cases.
Many of these core use cases involve end users starting potentially expensive processes on demand. The package also supports basic scheduled tasks, though you must arrange any recurrence yourself.
This is a second-generation design. The first generation was zasync, a mission-critical and successful Zope 2 product in use for a number of high-volume Zope 2 installations. [1] It is worth noting that zc.async has no backwards compatibility with zasync, and that it does not require Zope (although it can be used in conjunction with it).
Looking at the design from the perspective of regular usage, your code obtains a queue, which is a place to register jobs to be performed asynchronously.
Your application calls put on the queue to register a job. The job must be a pickleable, callable object. A global function, a callable persistent object, a method of a persistent object, or a special zc.async.job.Job object (discussed later) are all examples of suitable objects. By default, the job is registered to be performed as soon as possible, but it can instead be registered to be called at a certain time.
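The pickleability requirement can be checked with the standard library alone. The following is a minimal sketch, not part of zc.async: the helper name is_pickleable is an assumption made for illustration, and it only tests whether an object survives a pickle round trip.

```python
import functools
import math
import pickle


def is_pickleable(obj):
    """Hypothetical helper: True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A module-level function is pickled by reference (module name plus
# function name), so it is a suitable job.
print(is_pickleable(math.sqrt))

# A partial of an importable function also pickles, arguments included.
print(is_pickleable(functools.partial(math.pow, 2, 10)))

# A lambda has no importable name, so it cannot be pickled and would not
# be a suitable job.
print(is_pickleable(lambda: 42))
```

A check like this is useful in tests for code that registers jobs, because a callable that cannot be pickled cannot be stored for another process to pick up.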
The put call will return a zc.async.job.Job object. This object represents both the callable and its deferred result. It has information about the job requested, the current state of the job, and the result of performing the job.
An example spelling for registering a job might be self.pending_result = queue.put(self.performSpider). The returned object can be stored and polled to see when the job is complete; or the job can be configured to do additional work when it completes (such as storing the result in a data structure).
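The relationship between put, the returned job, polling, and completion callbacks can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ToyQueue, ToyJob, add_callback, and the status strings are hypothetical names invented here, nothing is persisted to a database, and the code does not use or reproduce the zc.async API.

```python
import threading

# Hypothetical status values for this toy model only.
PENDING, ACTIVE, COMPLETED = "pending", "active", "completed"


class ToyJob:
    """Hypothetical stand-in: a callable together with its deferred result."""

    def __init__(self, func, *args):
        self.func, self.args = func, args
        self.status = PENDING
        self.result = None
        self.callbacks = []
        self._done = threading.Event()

    def add_callback(self, callback):
        # Extra work to perform with the result once the job completes.
        self.callbacks.append(callback)

    def run(self):
        self.status = ACTIVE
        self.result = self.func(*self.args)
        for callback in self.callbacks:
            callback(self.result)
        self.status = COMPLETED
        self._done.set()

    def wait(self, timeout=None):
        # Block until the job has completed or the timeout expires.
        return self._done.wait(timeout)


class ToyQueue:
    """Hypothetical stand-in: a place to register jobs."""

    def __init__(self):
        self.jobs = []

    def put(self, func, *args):
        job = ToyJob(func, *args)
        self.jobs.append(job)
        return job


def count_words(text):
    return len(text.split())


queue = ToyQueue()
stored = []  # the "data structure" that receives the result

pending_result = queue.put(count_words, "jobs run out of band")
pending_result.add_callback(stored.append)
status_before = pending_result.status  # still pending: nothing has run yet

# Another thread claims and performs the job, out of band from the caller.
worker = threading.Thread(target=queue.jobs.pop(0).run)
worker.start()
pending_result.wait(timeout=5)
worker.join()

print(status_before, pending_result.status, pending_result.result, stored)
```

The caller keeps only the returned job object. It can either poll the status, as the last line does, or rely on the callback to store the result.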
Multiple processes, typically spread across multiple machines, can connect to the queue and claim and perform work. As with other collections of processes that share pickled objects, these processes generally should share the same software (though some variations on this constraint should be possible).
In addition to a database connection and the necessary software, a process that claims and performs work needs a dispatcher with a reactor to provide a heartbeat. The dispatcher relies on one or more persistent agents in the queue (in the database) to determine which jobs it should perform.
A dispatcher is in charge of dispatching queued work for a given process to worker threads. It works with one or more queues and a single reactor. It has a universally unique identifier (UUID), which is usually an identifier of the application instance in which it is running. The dispatcher starts jobs in dedicated threads.
A reactor is something that can provide an eternal loop, or heartbeat, to power the dispatcher. It can be the main Twisted reactor (in the main thread); another instance of a Twisted reactor (in a child thread); or any object that implements a small subset of the Twisted reactor interface (see the discussion in dispatcher.txt, and the example testing reactor in testing.py, used below).
An agent is a persistent object in a queue that is associated with a dispatcher and is responsible for picking jobs and keeping track of them. Zero or more agents within a queue can be associated with a dispatcher. Each agent for a given dispatcher in a given queue is identified uniquely with a name [2].
Generally, these work together as follows. The reactor calls the dispatcher. The dispatcher tries to find the mapping of queues in the database root under a key of zc.async (see the constant zc.async.interfaces.KEY). If it finds the mapping, it iterates over the queues (the mapping's values) and asks each queue for the agents associated with the dispatcher's UUID. The dispatcher is then responsible for seeing which jobs its agents want to do from the queue, and for providing threads and connections for the work to be done. Finally, the dispatcher asks the reactor to call the dispatcher again in a few seconds.
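The polling cycle described above can be sketched with plain Python objects. Everything here is an assumption made for illustration: ToyReactor, ToyDispatcher, ToyQueue, ToyAgent, call_later, claim_jobs, and poll are hypothetical names, a dict stands in for the database root, and no database connections are involved. It is a model of the control flow only, not the zc.async implementation.

```python
import functools
import threading
import uuid

KEY = "zc.async"  # the root key named in the text


class ToyAgent:
    """Hypothetical agent: picks up to `size` jobs from its queue."""

    def __init__(self, name, size=2):
        self.name, self.size = name, size

    def claim_jobs(self, queue):
        claimed = []
        while queue.pending and len(claimed) < self.size:
            claimed.append(queue.pending.pop(0))
        return claimed


class ToyQueue:
    def __init__(self):
        self.pending = []  # callables waiting to be performed
        self.agents = {}   # dispatcher UUID -> {agent name: agent}


class ToyReactor:
    """Hypothetical reactor: only records what it was asked to call later."""

    def __init__(self):
        self.scheduled = []

    def call_later(self, delay, func):
        self.scheduled.append((delay, func))


class ToyDispatcher:
    def __init__(self, root, reactor, poll_interval=5):
        self.root, self.reactor = root, reactor
        self.poll_interval = poll_interval
        self.uuid = uuid.uuid4()
        self.results = []

    def poll(self):
        queues = self.root.get(KEY)  # mapping of queue name -> queue
        threads = []
        if queues is not None:
            for queue in queues.values():
                for agent in queue.agents.get(self.uuid, {}).values():
                    for job in agent.claim_jobs(queue):
                        # Each claimed job runs in its own thread.
                        thread = threading.Thread(
                            target=lambda j=job: self.results.append(j()))
                        thread.start()
                        threads.append(thread)
        # Joining keeps this demo deterministic; a real dispatcher would
        # not block its polling loop on the work.
        for thread in threads:
            thread.join()
        # Ask the reactor to call the dispatcher again in a few seconds.
        self.reactor.call_later(self.poll_interval, self.poll)


root, reactor = {}, ToyReactor()
dispatcher = ToyDispatcher(root, reactor)

queue = ToyQueue()
queue.pending = [functools.partial(pow, 2, n) for n in (1, 2, 3)]
queue.agents[dispatcher.uuid] = {"main": ToyAgent("main", size=2)}
root[KEY] = {"": queue}

dispatcher.poll()                      # first heartbeat: two jobs claimed
first_results = sorted(dispatcher.results)
delay, callback = reactor.scheduled.pop(0)
callback()                             # second heartbeat: the last job
print(first_results, sorted(dispatcher.results))
```

The agent's size limit is what leaves one job in the queue after the first heartbeat. In this model the agent, not the dispatcher, decides how much work to claim, which mirrors the division of responsibility described above.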
Footnotes
[1] The first generation, zasync, had the following goals:
It met its goals well in some areas and adequately in others. Based on experience with the first generation, this second generation identifies several areas of improvement from the first design, and adds several goals.
[2] The combination of a queue name plus a dispatcher UUID plus an agent name uniquely identifies an agent.