1. Tutorial
1.1. Setup
If you don’t have a MySQL server, you’ll need to install and run one. doloop
uses fairly basic SQL, and should work on MySQL versions as early as 5.0,
if not earlier.
You’ll also want to install a Python MySQL driver, such as PyMySQL.
Next, you’ll want to create at least one doloop table:
create-doloop-table user_loop | mysql -D test # or a db of your choice
This table keeps track of which IDs we care about and how recently they’ve been updated. You’ll want one table for each kind of update on each kind of thing.
For example, if you want to separately update users’ profile pages and their
friend recommendations, you’d want two tables, named something like
user_profile_loop and user_friend_loop.
By default, doloop assumes IDs are INTs, but you can use any
column type that can be a primary key. For example, if your IDs are
64-character ASCII strings:
create-doloop-table -i 'CHAR(64) CHARSET ascii' user_loop | mysql -D test
You can also create tables programmatically using doloop.create() and
doloop.sql_for_create().
1.2. Adding and removing IDs
Next, you’ll want to make sure the IDs of the things you want to keep updated
are in your doloop table. Use doloop.add() to add them:
import doloop
import MySQLdb

dbconn = MySQLdb.connect(...)  # or another DB-API driver, such as PyMySQL
for user_id in ...:  # your function to stream all user IDs
    doloop.add(dbconn, 'user_loop', user_id)
You’ll also want to add a call to doloop.add() to your user creation
code. doloop.add() uses INSERT IGNORE, so it’s fine to call
it several times for the same ID.
Each call to doloop.add() gets a write lock on user_loop, so it’s
much more efficient to add chunks of several IDs at a time:
for list_of_user_ids in ...:
    doloop.add(dbconn, 'user_loop', list_of_user_ids)
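If your IDs arrive one at a time, one way to build those chunks is a small batching helper. This is only a sketch: chunked is a hypothetical helper name, not part of doloop, and the chunk size of 1000 in the usage comment is an arbitrary assumption.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`.

    Hypothetical helper for illustration; it is not part of doloop.
    """
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Possible usage, assuming `dbconn` and `stream_of_user_ids` already exist:
#
#     for list_of_user_ids in chunked(stream_of_user_ids, 1000):
#         doloop.add(dbconn, 'user_loop', list_of_user_ids)
```

Larger chunks mean fewer write locks on the table, at the cost of holding more IDs in memory at once.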
If something no longer needs to be updated (e.g. the user closes their
account), you can remove the ID with doloop.remove().
1.3. Doing updates
The basic workflow is to use doloop.get() to grab the IDs of the
things that have gone the longest without being updated, perform your updates,
and then mark them as done with doloop.did():
user_ids = doloop.get(dbconn, 'user_loop', 1000)
for user_id in user_ids:
    ...  # run your update logic
doloop.did(dbconn, 'user_loop', user_ids)
A good, low-effort way to set up workers is to write a script that runs in a
crontab. It’s perfectly safe (and encouraged) to run several workers
concurrently; doloop.get() will lock the IDs it grabs so that other
workers don’t try to update the same things.
You should make sure that your update logic can be safely called
twice concurrently for the same ID. In fact, it’s perfectly fine for code that
has never called doloop.get() to update arbitrary things and then call
did() on their IDs to let the workers know. It’s also a
good idea for your update code to gracefully handle nonexistent IDs.
How many workers you want and when they run is up to you. If
there turn out not to be enough workers, things will simply be updated less
often than you’d like. You can set a limit on how frequently the same ID
will be updated using the min_loop_time argument to get(); by default,
this is one hour.
Also, don’t worry too much about your workers crashing. By default, IDs are
locked for an hour (also configurable, with the lock_for argument to
get()), so they’ll eventually get unlocked and fetched by
another worker. Conversely, if there is a problem ID that always causes a
crash, that problem ID won’t bother your workers for another hour.
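One possible way to structure a worker so that a single bad ID doesn’t discard the rest of its batch is sketched below. The function name run_worker_pass and its callable parameters are assumptions made for illustration, not part of doloop; the callables stand in for doloop.get(), your own update logic, and doloop.did(), so the sketch runs without a database.

```python
def run_worker_pass(get_ids, update_one, mark_done, batch_size=1000):
    """Fetch one batch of IDs, update each, and mark the successes as done.

    Hypothetical helper, not part of doloop. IDs whose update raises are
    not marked as done, so they simply stay locked until the lock expires.
    Returns a (done_ids, failed_ids) tuple.
    """
    ids = get_ids(batch_size)
    done, failed = [], []
    for id_ in ids:
        try:
            update_one(id_)
        except Exception:
            failed.append(id_)
        else:
            done.append(id_)
    if done:
        mark_done(done)
    return done, failed


# Possible wiring, assuming `dbconn` exists and `update_user` is your own
# (hypothetical) update function:
#
#     run_worker_pass(
#         get_ids=lambda n: doloop.get(dbconn, 'user_loop', n),
#         update_one=update_user,
#         mark_done=lambda ids: doloop.did(dbconn, 'user_loop', ids),
#     )
```

In a real worker you would probably also log the failed IDs, since they will come back once their locks expire.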
You can also explicitly unlock IDs, without marking them as updated, using
doloop.unlock().
1.4. Prioritization
So, this is a great system for making sure every user gets updated eventually,
but some users are more active than others. You can use doloop.bump()
to prioritize certain ID(s):
def user_do_something_noteworthy(user_id):
    ...  # your logic for the user doing something noteworthy
    doloop.bump(dbconn, 'user_loop', user_id)
doloop has an elegant (or, depending on how you look at it, too-magical)
rule that IDs which are locked get highest priority once the lock expires.
By default, bump() sets the lock to expire immediately, so
we get priority without any waiting.
However, in real life, users are likely to do several noteworthy things in one session (well, depending on your users). You can avoid updating the same user several times by setting lock_for. For example, the first time a user does something noteworthy, this code will keep them locked for an hour, after which they’ll be prioritized:
def user_do_something_noteworthy(user_id):
    ...
    doloop.bump(dbconn, 'user_loop', user_id, lock_for=60*60)
If a particularly special user did noteworthy things continuously, they’d
still get updated more or less hourly; you can’t repeatedly
bump() things into the future.
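To make the arithmetic concrete, here is a toy model of that behavior. It is not doloop’s actual implementation: the function name bumped_lock_until is hypothetical, and the rule it encodes (a bump keeps whichever lock expiry is sooner) is an assumption inferred from the description above.

```python
def bumped_lock_until(current_lock_until, now, lock_for):
    """Return the lock expiry time after a bump, in this toy model.

    Assumption (not taken from doloop's source): a bump never pushes an
    existing lock later; it keeps whichever expiry is sooner.
    Times are plain numbers of seconds; None means "not locked".
    """
    requested = now + lock_for
    if current_lock_until is None:
        return requested
    return min(current_lock_until, requested)


ONE_HOUR = 60 * 60

# First noteworthy action at t=0: locked until the one-hour mark.
lock = bumped_lock_until(None, now=0, lock_for=ONE_HOUR)

# Another action half an hour later asks for t=5400, but the earlier
# expiry is kept, so the user is still due at the one-hour mark.
lock = bumped_lock_until(lock, now=30 * 60, lock_for=ONE_HOUR)
```

Under this model, a continuously active user still becomes eligible for an update about an hour after their first bump.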
If for some reason you forgot to add a user, bump() will
automatically add them before bumping them (as will did()
and unlock()). An alternate way to use doloop is to bump()
every time something changes, secure in the
knowledge that if you forgot to add a call to bump()
somewhere, things will still get updated eventually.
Also, due to doloop’s elegant/too-magical semantics, you can give
ID(s) super-high priority by setting lock_for to a negative number. At a
certain point, though, you should just do the update immediately and call
did().
1.5. Auditing
If you want to check on a particular ID or set of IDs, for example to see how
long they’ve gone without being updated, you can use doloop.check().
To check on the status of the task loop as a whole, use
doloop.stats(). Among other things, this can tell you how many IDs
have gone more than a day/week without being updated.