mw.lib.sessions – event clustering

mw.lib.sessions.cluster(user_events, cutoff=3600)

Clusters user sessions from a sequence of user events. Event data is passed through unchanged: each Session's events list simply contains the data supplied with each event.

This function serves as a convenience wrapper around calls to Cache's process() method.

Parameters:
user_events : iter( (user, timestamp, event) )

an iterable over tuples of user, timestamp and event data.

cutoff : int

the maximum time, in seconds, between consecutive events within a user session
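The example below supplies user_events already in chronological order, and cutoff-based expiry appears to rely on that ordering. Here is a minimal sketch of preparing such input from unsorted revision rows. The row layout (dicts with "user", "timestamp" and "rev_id" keys) and the helper name to_user_events are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical revision rows; the key names are assumptions, not an mw format.
rows = [
    {"user": "Walter", "timestamp": 100035, "rev_id": 4},
    {"user": "Willy on wheels", "timestamp": 100000, "rev_id": 1},
    {"user": "Walter", "timestamp": 100001, "rev_id": 2},
]


def to_user_events(
    rows: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[str, int, Dict[str, Any]]]:
    """Yield (user, timestamp, event) tuples in chronological order."""
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        # Keep only the event payload; user and timestamp become tuple fields.
        yield row["user"], row["timestamp"], {"rev_id": row["rev_id"]}


user_events = list(to_user_events(rows))
# [('Willy on wheels', 100000, {'rev_id': 1}),
#  ('Walter', 100001, {'rev_id': 2}),
#  ('Walter', 100035, {'rev_id': 4})]
```

Because to_user_events is itself a generator, it can be passed straight to a consumer without materializing the list.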

Returns:

an iterator over Session

Example:
>>> from mw.lib import sessions
>>>
>>> user_events = [
...     ("Willy on wheels", 100000, {'rev_id': 1}),
...     ("Walter", 100001, {'rev_id': 2}),
...     ("Willy on wheels", 100001, {'rev_id': 3}),
...     ("Walter", 100035, {'rev_id': 4}),
...     ("Willy on wheels", 103602, {'rev_id': 5})
... ]
>>>
>>> for user, events in sessions.cluster(user_events):
...     (user, events)
...
('Willy on wheels', [{'rev_id': 1}, {'rev_id': 3}])
('Walter', [{'rev_id': 2}, {'rev_id': 4}])
('Willy on wheels', [{'rev_id': 5}])
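To make the wrapper behaviour concrete, here is a self-contained sketch that reproduces the example output above without importing mw. It assumes chronological input and treats an idle gap of at least cutoff seconds as expiry, following the Cache description on this page. cluster_sketch and the local Session stand-in are hypothetical names; this is not the library's implementation.

```python
from collections import OrderedDict, namedtuple

# Stand-in mirroring the documented Session members (user, events).
Session = namedtuple("Session", ["user", "events"])


def cluster_sketch(user_events, cutoff=3600):
    """Hypothetical re-implementation of cutoff-based session clustering.

    Assumes events arrive in non-decreasing timestamp order.
    """
    # user -> (last_timestamp, events), ordered from least to most recently active
    active = OrderedDict()
    for user, timestamp, event in user_events:
        # Expire sessions idle for at least `cutoff` seconds, oldest first.
        while active:
            oldest_user, (last, events) = next(iter(active.items()))
            if timestamp - last < cutoff:
                break
            del active[oldest_user]
            yield Session(oldest_user, events)
        # Continue the user's open session, or start a new one.
        _, events = active.pop(user, (None, []))
        events.append(event)
        active[user] = (timestamp, events)  # re-insert as most recently active
    # End of input: flush the sessions that never expired.
    for user, (_, events) in active.items():
        yield Session(user, events)


user_events = [
    ("Willy on wheels", 100000, {'rev_id': 1}),
    ("Walter", 100001, {'rev_id': 2}),
    ("Willy on wheels", 100001, {'rev_id': 3}),
    ("Walter", 100035, {'rev_id': 4}),
    ("Willy on wheels", 103602, {'rev_id': 5}),
]
for session in cluster_sketch(user_events):
    print(session)
# Session(user='Willy on wheels', events=[{'rev_id': 1}, {'rev_id': 3}])
# Session(user='Walter', events=[{'rev_id': 2}, {'rev_id': 4}])
# Session(user='Willy on wheels', events=[{'rev_id': 5}])
```

Keeping users ordered by last activity means expiry only ever inspects the front of the ordered dict. The real Cache may use a different data structure.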

class mw.lib.sessions.Session

Represents a user session (a cluster of events for a single user). This class behaves like collections.namedtuple. Note that the data type of events is not specified, since it depends on the event data provided to cluster() or Cache.process().
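Since Session behaves like collections.namedtuple, it supports both attribute access and tuple unpacking; the latter is what `for user, events in sessions.cluster(...)` relies on. The sketch below uses a local namedtuple with the two documented members as a stand-in (an assumption; the real class in mw.lib.sessions may differ in details).

```python
from collections import namedtuple

# Stand-in with the documented members; not imported from mw.
Session = namedtuple("Session", ["user", "events"])

session = Session(user="Walter", events=[{'rev_id': 2}, {'rev_id': 4}])

# Attribute access, as for any namedtuple
print(session.user)  # Walter

# Tuple unpacking
user, events = session
print(len(events))  # 2

# _replace returns a modified copy; the original is unchanged
trimmed = session._replace(events=session.events[:1])
print(trimmed)  # Session(user='Walter', events=[{'rev_id': 2}])
```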

Members:
user

A hashable user identifier : hashable

events

A list of event data : list( mixed )

class mw.lib.sessions.Cache(cutoff=3600)

A cache of recent user sessions. Since a session expires once its user's activity stops for at least cutoff seconds, this class keeps track of the sessions that are still active.
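The expiry rule can be seen in isolation for a single user. The sketch below splits one user's sorted timestamps into sessions, starting a new session whenever the gap since the previous event is at least cutoff seconds. split_sessions is a hypothetical helper, and the `>=` boundary is an interpretation of the "at least cutoff seconds" wording above rather than something confirmed against the library.

```python
from typing import List


def split_sessions(timestamps: List[int], cutoff: int = 3600) -> List[List[int]]:
    """Split one user's sorted timestamps into sessions (illustrative only)."""
    sessions: List[List[int]] = []
    for ts in timestamps:
        # Stay in the current session only while the gap is below the cutoff.
        if sessions and ts - sessions[-1][-1] < cutoff:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# "Willy on wheels" from the example: 103602 - 100001 = 3601 >= 3600
print(split_sessions([100000, 100001, 103602]))
# [[100000, 100001], [103602]]
```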

Parameters:
cutoff : int

Maximum amount of time in seconds between session events

Example:
>>> from mw.lib import sessions
>>>
>>> cache = sessions.Cache(cutoff=3600)
>>>
>>> list(cache.process("Willy on wheels", 100000, {'rev_id': 1}))
[]
>>> list(cache.process("Walter", 100001, {'rev_id': 2}))
[]
>>> list(cache.process("Willy on wheels", 100001, {'rev_id': 3}))
[]
>>> list(cache.process("Walter", 100035, {'rev_id': 4}))
[]
>>> list(cache.process("Willy on wheels", 103602, {'rev_id': 5}))
[Session(user='Willy on wheels', events=[{'rev_id': 1}, {'rev_id': 3}])]
>>> list(cache.get_active_sessions())
[Session(user='Walter', events=[{'rev_id': 2}, {'rev_id': 4}]), Session(user='Willy on wheels', events=[{'rev_id': 5}])]

get_active_sessions()

Retrieves the active, unexpired sessions.

Returns:

A generator of Session

process(user, timestamp, data=None)

Processes a user event.

Parameters:
user : hashable

A hashable value to identify a user (int or str are OK)

timestamp : mw.Timestamp

The timestamp of the event (the examples on this page pass plain integer seconds)

data : mixed

Event metadata

Returns:

A generator of the Sessions that expired as a result of processing this event.
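The examples on this page wrap every process() call in list(...). If process() is implemented as a generator function (an assumption; the documentation only says it returns a generator), then discarding its return value would mean the event is never recorded, because a generator function's body runs only when the generator is iterated. The toy class below is hypothetical and not part of mw; it only demonstrates that Python behaviour.

```python
class ToyCache:
    """Hypothetical stand-in whose process() is a generator function."""

    def __init__(self):
        self.seen = []

    def process(self, user, timestamp, data=None):
        # None of this body runs until the returned generator is iterated.
        self.seen.append((user, timestamp, data))
        yield from ()  # this toy never expires any sessions


cache = ToyCache()
cache.process("Walter", 100001, {'rev_id': 2})  # creates a generator, records nothing
print(len(cache.seen))  # 0
list(cache.process("Walter", 100035, {'rev_id': 4}))  # drains the generator
print(len(cache.seen))  # 1
```

Under that assumption, the safe pattern is to always consume the result, for example `for session in cache.process(user, timestamp, data): ...`.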

Constants

mw.lib.sessions.defaults.CUTOFF = 3600

The default cutoff, in seconds (3600 seconds is one hour). TODO: Better documentation here. For the time being, see https://meta.wikimedia.org/wiki/Research:Edit_session
