Batch process scheduling and service monitoring on a single UNIX machines has historically been managed by cron and its derivatives. But if you have many batches, complex dependencies between batches, or many machines, maintaining config files across them may be difficult. Tron solves this problem by centralizing the configuration and scheduling of jobs to a single daemon.
The Tron system is split into four commands:
The config file uses YAML syntax, and is further described in Configuration.
trond is given access (via public key SSH) to one or more nodes on which to run jobs and services. For example, this configuration has two nodes, each of which is responsible for a single job:
nodes: hostname: 'localhost' - name: node1 hostname: 'batch1' - name: node2 hostname: 'batch2' jobs: - name: "job0" node: node1 schedule: "interval 20s" actions: - name: "batch1action" command: "sleep 3; echo asdfasdf" - name: "job1" node: node2 schedule: "interval 20s" actions: - name: "batch2action" command: "cat big.txt; sleep 10"
How the nodes are set up and assigned to jobs is entirely up to you. They may have different operating systems, access to different databases, different privileges for the Tron user, etc.
Nodes can be grouped into pools. To continue the previous example:
node_pools: - name:pool nodes: [node1, node2] jobs: # ... - name: "job2" node: pool schedule: "interval 5s" actions: - name: "pool_action" command: "ls /; sleep 1" cleanup_action: command: "echo 'all done'"
job2‘s action will be run on a random node from pool every 5 seconds. (Services behave slightly differently.) When pool_action is complete, cleanup_action will run on the same node.
For more information, see Jobs.
The job model is not appropriate for tasks that provide services to other tasks perhaps with more than one instance at once. For example, you might have a set of worker processes that send emails by continuously popping messages from a work queue:
# ... services: - name: "email_worker" node: pool count: 4 monitor_interval: 60 restart_interval: 120 pid_file: "/var/run/batch/%(name)s-%(instance_number)s.pid" command: "/usr/local/bin/start_email_worker --pid_file=%(pid_file)s"
This configuration will cause start_email_worker to be run on the nodes in the pool in the order node1, node2, node1, node2 (round robin scheduling).
The start_email_worker script (written by you) starts the worker and writes its pid to %(pid_file)s. Every 60 seconds, trond will see if pid in %(pid_file)s is still running on its node. If not, the service will be in a DEGRADED state and a new service instance will be started on the same node after 120 seconds.
In a system containing this example, you might have yet another service representing the work queue itself.
For more information, see Services.
While Tron solves many scheduling-related problems, there are a few things to watch out for.
Tron keeps an SSH connection open for the entire lifespan of a process. This means that to upgrade trond, you have to either wait until no jobs are running, or accept an inconsistent state. This limitation is being worked on, and should be improved in later releases.
Tron is under active development. This means that some things will change. Whenever possible these changes will be backwards compatible, but in some cases there may be non-backwards compatible changes.
Tron does not support unicode. Tron is built using twisted which does not support unicode.