Sessionization

The primary purpose of this library is to group chronological sequences of user activity into sessions, each represented as a mwsessions.Session. Two interfaces are provided. mwsessions.sessionize() takes an iterable of (user, timestamp, event_data) triples and returns an iterator of mwsessions.Session. mwsessions.Sessionizer, on the other hand, provides a process() method that lets you process events one at a time.

mwsessions.sessionize(user_events, cutoff=3600)

Clusters user sessions from a sequence of user events. Note that event data is not inspected; it is simply returned inside the resulting sessions.

This function serves as a convenience wrapper around calls to Sessionizer's process() method.

Parameters:
user_events : iter( (user, timestamp, event) )

an iterable over tuples of user, timestamp and event data.

  • user : hashable
  • timestamp : mw.Timestamp
  • event : mixed
cutoff : int

the maximum time in seconds between events within a user session

Returns:

an iterator over Session

Example:
>>> import mwsessions
>>>
>>> user_events = [
...     ("Willy on wheels", 100000, {'rev_id': 1}),
...     ("Walter", 100001, {'rev_id': 2}),
...     ("Willy on wheels", 100001, {'rev_id': 3}),
...     ("Walter", 100035, {'rev_id': 4}),
...     ("Willy on wheels", 103602, {'rev_id': 5})
... ]
>>>
>>> for user, events in mwsessions.sessionize(user_events):
...     (user, events)
...
('Willy on wheels', [{'rev_id': 1}, {'rev_id': 3}])
('Walter', [{'rev_id': 2}, {'rev_id': 4}])
('Willy on wheels', [{'rev_id': 5}])
class mwsessions.Sessionizer(cutoff=3600)

Constructs an object that manages state for sessionization. Since sessions expire once activities stop for at least cutoff seconds, this class manages a cache of active sessions and uses that to process new events.

Parameters:
cutoff : int

Maximum amount of time in seconds between within-session events

Example:
>>> import mwsessions
>>>
>>> sessionizer = mwsessions.Sessionizer(cutoff=3600)
>>>
>>> list(sessionizer.process("Willy on wheels", 100000, {'rev_id': 1}))
[]
>>> list(sessionizer.process("Walter", 100001, {'rev_id': 2}))
[]
>>> list(sessionizer.process("Willy on wheels", 100001, {'rev_id': 3}))
[]
>>> list(sessionizer.process("Walter", 100035, {'rev_id': 4}))
[]
>>> list(sessionizer.process("Willy on wheels", 103602, {'rev_id': 5}))
[Session(user='Willy on wheels',
         events=[{'rev_id': 1}, {'rev_id': 3}])]
>>> list(sessionizer.get_active_sessions())
[Session(user='Walter', events=[{'rev_id': 2}, {'rev_id': 4}]),
 Session(user='Willy on wheels', events=[{'rev_id': 5}])]
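
The expiry behaviour described above can be illustrated with a small pure-Python sketch. This is not the library's implementation: SketchSessionizer, sketch_sessionize and the local Session tuple are hypothetical names used only for illustration, timestamps are assumed to be plain integers in seconds, and a session is assumed to expire once the gap since the user's latest event is at least cutoff seconds.

```python
from collections import namedtuple

# Hypothetical stand-in for mwsessions.Session, for illustration only.
Session = namedtuple("Session", ["user", "events"])


class SketchSessionizer:
    """Illustrative cache of active sessions; not the library's code."""

    def __init__(self, cutoff=3600):
        self.cutoff = cutoff
        # user -> (timestamp of the user's latest event, list of event data)
        self.active = {}

    def process(self, user, timestamp, event=None):
        """Record one event and yield sessions that expired by `timestamp`."""
        # Assumption: a session expires when the gap is >= cutoff seconds.
        expired = [u for u, (last, _) in self.active.items()
                   if timestamp - last >= self.cutoff]
        for u in expired:
            _, events = self.active.pop(u)
            yield Session(u, events)

        # Append to the user's active session, or start a new one.
        _, events = self.active.get(user, (None, []))
        events.append(event)
        self.active[user] = (timestamp, events)

    def get_active_sessions(self):
        """Yield every session that has not expired yet."""
        for user, (_, events) in self.active.items():
            yield Session(user, events)


def sketch_sessionize(user_events, cutoff=3600):
    """Wrapper in the spirit of sessionize(): process all, then flush."""
    sessionizer = SketchSessionizer(cutoff)
    for user, timestamp, event in user_events:
        yield from sessionizer.process(user, timestamp, event)
    yield from sessionizer.get_active_sessions()
```

Because process() is a generator in this sketch, an event is only recorded once the returned iterator is consumed, which is why the examples wrap each call in list().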
class mwsessions.Session

Represents a user session (a cluster of events for a user). This class behaves like collections.namedtuple. Note that the datatype of events is not specified, since it depends on the event data provided during sessionization.

Attributes:
user

A hashable user identifier : hashable

events

A list of event data : list( mixed )
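
To illustrate what behaving like collections.namedtuple implies in practice, here is a minimal sketch. The Session defined below is a hypothetical stand-in with the same two fields, not the library's own class.

```python
from collections import namedtuple

# Hypothetical stand-in with the same two fields as mwsessions.Session.
Session = namedtuple("Session", ["user", "events"])

session = Session("Walter", [{'rev_id': 2}, {'rev_id': 4}])

# Fields can be read by name ...
first_event = session.events[0]

# ... or by position, so a session unpacks like a plain 2-tuple.
user, events = session
```

Tuple unpacking is what allows a loop such as "for user, events in ..." to iterate directly over sessions.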
