2. Function reference (“API”)¶
2.1. Setup¶
-
doloop.
create
(dbconn, table, id_type='int', engine='InnoDB')¶ Create a task loop table. It has a schema like this:
CREATE TABLE `...` ( `id` INT NOT NULL, `last_updated` INT DEFAULT NULL, `lock_until` INT DEFAULT NULL, PRIMARY KEY (`id`), KEY `lock_until` (`lock_until`, `last_updated`) ) ENGINE=InnoDB
- id is the ID of the thing you want to update. It can refer to anything
that has a unique ID (doesn’t need to be another table in this database).
It also need not be an
INT
; see id_type, below. - last_updated: a unix timestamp; when the thing was last updated, or
NULL
if it never was - lock_until is also a unix timestamp. It’s used to keep workers from
grabbing the same IDs, and prioritization. See
get()
for details.
Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table. Something ending in
_loop
is recommended. - id_type (str) – alternate type for the
id
field (e.g.'VARCHAR(64)'
) - engine (str) – alternate storage engine to use (e.g.
'MyISAM'
)
There is no
drop()
function because programmatically dropping tables is risky. The relevant SQL is justDROP TABLE `...`
.- id is the ID of the thing you want to update. It can refer to anything
that has a unique ID (doesn’t need to be another table in this database).
It also need not be an
2.2. Adding and removing IDs¶
-
doloop.
add
(dbconn, table, id_or_ids, updated=False, test=False)¶ Add IDs to this task loop.
Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- id_or_ids – ID or list of IDs to add
- updated – Set this to true if these IDs have already been updated;
this will
last_updated
to the current time rather thanNULL
. - test – If
True
, don’t actually write to the database
Returns: number of IDs that are new
Runs this query with a write lock on table:
INSERT IGNORE INTO `...` (`id`, `last_updated`) VALUES (...), ...
(last_updated is omitted if updated is
False
.)
-
doloop.
remove
(dbconn, table, id_or_ids, test=False)¶ Remove IDs from this task loop.
Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- id_or_ids – ID or list of IDs to add
- test – If
True
, don’t actually write to the database
Returns: number of IDs removed
Runs this query with a write lock on table:
DELETE FROM `...` WHERE `id` IN (...)
2.3. Doing updates¶
-
doloop.
get
(dbconn, table, limit, lock_for=3600, min_loop_time=3600, test=False)¶ Get some IDs of things to update, and lock them.
Generally, after you’ve updated IDs, you’ll want to pass them to
did()
.The rules for fetching IDs are:
- First, fetch IDs which are locked but whose locks have expired. Start with the ones that have been locked the longest.
- Then, fetch unlocked IDs. Start with those that have never been updated, then fetch the ones that have gone the longest without being updated.
Ties (e.g. for newly inserted IDs) are broken arbitrarily by the database.
Note that because IDs whose locks have expired are selected first, the
lock_until
column can also be used to prioritize IDs; seebump()
.Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- limit (int) – max number of IDs to fetch
- lock_for – a conservative upper bound for how long we expect to take to update this ID, in seconds. Default is one hour. Must be positive.
- min_loop_time – If a job is unlocked, make sure it was last updated at least this many seconds ago, so that we don’t spin on the same IDs.
- test – If
True
, don’t actually write to the database
Returns: list of IDs
Runs this query with a write lock on table:
SELECT `id` FROM `...` WHERE `lock_until` <= UNIX_TIMESTAMP() ORDER BY `lock_until`, `last_updated` LIMIT ... SELECT `id` FROM `...` WHERE `lock_until` IS NULL AND (`last_updated` IS NULL OR `last_updated` <= UNIX_TIMESTAMP() - ...) ORDER BY `last_updated` LIMIT ... UPDATE `...` SET `lock_until` = UNIX_TIMESTAMP() + ... WHERE `id` IN (...)
-
doloop.
did
(dbconn, table, id_or_ids, auto_add=True, test=False)¶ Mark IDs as updated and unlock them.
Usually, these will be IDs that you grabbed using
get()
, but it’s perfectly fine to update arbitrary IDs on your own initiative, and mark them as done.Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- id_or_ids – ID or list of IDs that we just updated
- auto_add (bool) – Add any IDs that are not already in the table.
- test – If
True
, don’t actually write to the database
Returns: number of rows updated (mostly useful as a sanity check)
Runs this query with a write lock on table:
INSERT IGNORE INTO `...` (`id`) VALUES (...), ... UPDATE `...` SET `last_updated` = UNIX_TIMESTAMP(), `lock_until` = NULL WHERE `id` IN (...)
(
INSERT IGNORE
is only run if auto_add isTrue
.)
-
doloop.
unlock
(dbconn, table, id_or_ids, auto_add=True, test=False)¶ Unlock IDs without marking them updated.
Useful if you
get()
IDs, but are then unable or unwilling to update them.Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- id_or_ids – ID or list of IDs
- auto_add (bool) – Add any IDs that are not already in the table.
- test – If
True
, don’t actually write to the database
Returns: Either: number of rows updated/added or number of IDs that correspond to real rows. (MySQL unfortunately returns different row counts for
UPDATE
statements depending on how connections are configured.) Don’t use this for anything more critical than sanity checks and logging.Runs this query with a write lock on table:
INSERT IGNORE INTO `...` (`id`) VALUES (...), ... UPDATE `...` SET `lock_until` = NULL WHERE `id` IN (...)
(
INSERT IGNORE
is only run if auto_add isTrue
)
2.4. Prioritization¶
-
doloop.
bump
(dbconn, table, id_or_ids, lock_for=0, auto_add=True, test=False)¶ Bump priority of IDs.
Normally we set
lock_until
to the current time, which gives them priority without actually locking them (seeget()
for why this works).You can make IDs super-high-priority by setting lock_for to a negative value. For example, bumping an ID with
lock_for=-600
will give it the same priority as an ID that was bumped 600 seconds ago.You can also lock IDs for a little while, then prioritize them, by setting lock_for to a positive value. This can be useful in situations where you expect IDs might be bumped again in the near future, and you only want to run your update function once.
This function will only ever decrease
lock_until
; it’s not possible to keep something locked forever by continually bumping it.Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- id_or_ids – ID or list of IDs
- lock_for – Number of seconds that the IDs should stay locked.
- auto_add (bool) – Add any IDs that are not already in the table.
- test – If
True
, don’t actually write to the database
Returns: number of IDs bumped (mostly useful as a sanity check)
Runs this query with a write lock on table:
INSERT IGNORE INTO `...` (`id`) VALUES (...), ... UPDATE `...` SET `lock_until` = UNIX_TIMESTAMP() + ... WHERE (`lock_until` IS NULL OR `lock_until` > UNIX_TIMESTAMP() + ...) AND `id` IN (...)
(
INSERT IGNORE
is only run if auto_add isTrue
)
2.5. Auditing¶
-
doloop.
check
(dbconn, table, id_or_ids)¶ Check the status of particular IDs.
Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- id_or_ids – ID or list of IDs
Returns a dictionary mapping ID to a tuple of
(since_updated, locked_for)
, that is, the current time minuslast_updated
, andlock_for
minus the current time (both of these in seconds).This function does not require write access to your database and does not lock tables.
Runs this query:
SELECT `id`, UNIX_TIMESTAMP() - `last_updated`, `lock_until` - UNIX_TIMESTAMP()` FROM `...` WHERE `id` IN (...)
-
doloop.
stats
(dbconn, table)¶ Get stats on the performance of the task loop as a whole.
Parameters: - dbconn – any DBI-compliant MySQL connection object
- table (str) – name of your task loop table
- delay_thresholds – enables the delayed stat; see below
It returns a dictionary containing these keys:
- bumped: number of IDs where
lock_until
is now or in the past. (These IDs have the highest priority; seeget()
.) - locked: number of IDs where
lock_until
is in the future - min_bump_time/max_bump_time: min/max number of seconds that any
ID has been prioritized (
lock_until
now or in the past) - min_id/max_id: min and max IDs (or
None
if table is empty) - min_lock_time/max_lock_time: min/max number of seconds that any ID is locked for
- min_update_time/max_update_time: min/max number of seconds that an ID has gone since being updated
For convenience and readability, all times will be floating point numbers.
Only min_id and max_id can be
None
(when the table is empty). Everything else defaults to zero.This function does not require write access to your database and does not lock tables.
stats()
only scans locked/bumped rows and use indexes for everything else, so it should be very fast except in pathological cases. It runs these queries in a single transaction:SELECT MIN(`id`), MAX(`id`), UNIX_TIMESTAMP() FROM `...` SELECT MIN(`last_updated`), MAX(`last_updated`), FROM `...` WHERE `lock_until` IS NULL; SELECT COUNT(*), MIN(`last_updated`), MAX(`last_updated`), MIN(`lock_until`), MAX(`lock_until`), FROM `...` WHERE `lock_until` > ... SELECT COUNT(*), MIN(`last_updated`), MAX(`last_updated`), MIN(`lock_until`), MAX(`lock_until`), FROM `...` WHERE `lock_until` <= ...