Data Model
==========

A timeline is a chronological list of site activity that is relevant to the
viewer. Users indirectly control the content of their timelines by "following"
other users or by "watching" projects.

References:

* http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
* http://activitystrea.ms/

The primary constructs required for timelines are:

* A graph of connections between nodes in the network. In our case, the nodes
  will be Users and Projects (at least at first).
* An Activity object, representing something that happens on the site.

The Graph
---------

The graph is a list of nodes, each node having inbound edges (from other nodes
that are watching or following it) and outbound edges (to other nodes that it
is watching or following).

The graph will be represented in a MongoDB collection::

    [{"node_id": "user:4f9f4cbf0594ca4156000a5e",
      "followers": [... list of node ids ...],
      "following": [... list of node ids ...]},
     ...
    ]

According to this scheme, each graph edge is stored twice. So, for example,
when a user watches a project, the project node is added to the user's
`following` list, and the user node is added to the project's `followers`
list.

In Python code, any object that implements the `NodeBase` interface can
participate in the graph, and can therefore follow or be followed by other
nodes. The easiest way to do this is to subclass `Node` (or use it as a
mixin) and override the `node_id` property::

    class User(MappedClass, Node):
        @property
        def node_id(self):
            """A string that uniquely identifies this node in the network."""
            return "user:%s" % self._id

Activities
----------

An activity consists of an actor, a verb, an object, and optionally, a target.
This is best illustrated with an example::

    John posted a comment on ticket #42.
    ---- ------   -------    ----------
    actor verb    object     target

The datatypes of these are as follows:

* actor - Node, ActivityObject
* verb - str
* object - ActivityObject
* target - ActivityObject

To use an object in an Activity, subclass `ActivityObject` (or use it as a
mixin), and override the required properties::

    class Ticket(VersionedArtifact, ActivityObject):
        @property
        def activity_name(self):
            """Display name for this object, in the context of an Activity."""
            return "Ticket #%s" % self.ticket_num

        @property
        def activity_url(self):
            """URL of this object."""
            return self.url()

Note that the actor must be both an ActivityObject and a Node, since it must
be a participant in the network in order to perform an activity.

.. note:: ActivityObject + Node might be combined into an `Actor` interface.

Activities will be represented in a MongoDB collection::

    [{"node_id": "user:4f9f4cbf0594ca4156000a5e",
      "actor": {
        "activity_name": "John",
        "activity_url": "http://..."},
      "verb": "posted",
      "object": {
        "activity_name": "comment",
        "activity_url": "http://..."},
      "target": {
        "activity_name": "ticket #42",
        "activity_url": "http://..."},
      "published": ISODate("2011-10-12T14:54:02.069Z")
     },
     ...
    ]

As you can see, a stored activity is related to a Node via its `node_id`. A
copy of the activity is stored for every Node involved in the activity.

Timelines
---------

Timelines (or "activity streams") are ordered lists of activities. We want to
be able to see a timeline of activity for a Node in our graph. For example,
when I look at *my* timeline, I want to see my recent activities as well as
the activities of the people I am following. When someone else looks at my
timeline, they probably want to see only those activities that I performed
(where I was the actor). When they look at the timeline of a project, they
probably want to see all recent activity on that project, regardless of actor.

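
The three views described above can be sketched with plain Python data
structures. This is only an illustration, not the library's implementation:
the in-memory `GRAPH` and `ACTIVITIES` stand in for the two collections, the
helper names `make_copy` and `naive_timeline` are hypothetical, and the
`node_id` field inside `actor` is an assumption added here so that the actor
of a stored copy can be identified::

    from datetime import datetime

    # Hypothetical in-memory stand-in for the graph collection.
    GRAPH = {
        "user:me": {"followers": [],
                    "following": ["user:john", "project:demo"]},
        "user:john": {"followers": ["user:me"], "following": []},
        "project:demo": {"followers": ["user:me"], "following": []},
    }


    def make_copy(node_id, actor_id, verb, obj, when):
        """Build one stored copy of an activity, filed under node_id."""
        return {"node_id": node_id,
                # Assumption: the actor carries its own node_id.
                "actor": {"node_id": actor_id, "activity_name": actor_id},
                "verb": verb,
                "object": {"activity_name": obj},
                "published": when}


    # John comments inside the demo project, so one copy is stored for
    # each node involved; the last activity is my own.
    ACTIVITIES = [
        make_copy("user:john", "user:john", "posted", "comment",
                  datetime(2011, 10, 12, 14, 54)),
        make_copy("project:demo", "user:john", "posted", "comment",
                  datetime(2011, 10, 12, 14, 54)),
        make_copy("user:me", "user:me", "created", "ticket #43",
                  datetime(2011, 10, 13, 9, 0)),
    ]


    def naive_timeline(node_id, actor_only=False, limit=100):
        """Return the newest activities visible on node_id's timeline."""
        sources = {node_id}
        if not actor_only:
            # Include everything filed under the nodes this node follows.
            sources |= set(GRAPH[node_id]["following"])
        seen = set()
        timeline = []
        for activity in ACTIVITIES:
            if activity["node_id"] not in sources:
                continue
            if actor_only and activity["actor"]["node_id"] != node_id:
                continue
            # Copies of one activity share these fields; keep the first.
            key = (activity["published"], activity["actor"]["node_id"],
                   activity["verb"], activity["object"]["activity_name"])
            if key in seen:
                continue
            seen.add(key)
            timeline.append(activity)
        timeline.sort(key=lambda a: a["published"], reverse=True)
        return timeline[:limit]

With this data, `naive_timeline("user:me")` returns my own activity followed
by John's comment (once, even though two copies are stored),
`naive_timeline("user:me", actor_only=True)` returns only my own activity, and
`naive_timeline("project:demo")` returns John's comment on the project.
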
Timelines are requested from the `Aggregator`::

    Aggregator.get_timeline(node, page=0, limit=100, actor_only=False)

As you can see, timelines are requested for a particular node, and can be
paged and limited in size. `actor_only=True` means, "only give me activities
that this node performed."

The `Aggregator` *aggregates* activities together from nodes in the network
that are connected. If I am following someone, I want to see their activity in
my timeline. It's the `Aggregator`'s job to combine these activities together
into one timeline.

In general we want the aggregation to be done out-of-band so that when a
timeline is requested, it's already prepared. This can be achieved by using a
task processing system. The `Aggregator` can also be subclassed to customize
the way timelines are created. More on these topics later.

Future Goals
============

* Get rid of the global `_director` variable.
* The `Aggregator` is meant to be subclassed and changed/extended, but there
  is no way to tell the library to use an external Aggregator. Provide a hook
  for doing so.
* When this library was started, we were having performance issues with Ming,
  so raw pymongo was used. With Ming performing well now, it may be worthwhile
  to revisit that decision. Using Ming could arguably make the code easier to
  understand and maintain.
* It would be really nice to neatly decouple the persistence code from the
  rest of the library so that alternate storage backends could be implemented
  and used.
* Timeline aggregation may be costly, and should be performed in a separate
  process. Provide a hook point that will be called when an aggregation is
  needed, so client code can use whatever task processing system is desired.
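
As a rough illustration of the last goal, the hook point could be a callable
that client code supplies and that the library calls whenever a node's
timeline needs to be rebuilt. The sketch below is an assumption about one
possible shape, not an existing API: `AggregationDispatcher`, `aggregate` and
`on_needed` are hypothetical names, and a `deque` stands in for a real task
queue::

    from collections import deque


    class AggregationDispatcher:
        """Hypothetical hook point; not part of the library's API."""

        def __init__(self, aggregate, on_needed=None):
            # aggregate: does the (possibly costly) work for one node_id.
            self.aggregate = aggregate
            # on_needed: optional client hook that schedules the work.
            self.on_needed = on_needed

        def aggregation_needed(self, node_id):
            if self.on_needed is None:
                # No hook supplied: aggregate inline, in this process.
                self.aggregate(node_id)
            else:
                # The client decides when and where the work runs.
                self.on_needed(node_id)


    # Usage: enqueue the request now, let a "worker" drain the queue later.
    done = []
    pending = deque()
    dispatcher = AggregationDispatcher(aggregate=done.append,
                                       on_needed=pending.append)
    dispatcher.aggregation_needed("user:4f9f4cbf0594ca4156000a5e")
    while pending:
        dispatcher.aggregate(pending.popleft())

Keeping the hook on an object rather than in module state also fits the first
goal of avoiding a global variable.
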