Implements a set of datasources oriented around a single revision. These are useful for extracting features of edit and article quality.

revscoring.datasources.revision_oriented.revision = {revision}

Represents the base revision of interest. It implements the structure described by the supporting classes below.

Supporting classes

class revscoring.datasources.revision_oriented.Revision(name, include_parent=True, include_user=True, include_user_info=True, include_user_last_revision=False, include_page=True, include_page_creation=False, include_content=False)

Represents a revision

id = None

int : Revision ID

timestamp = None

mwtypes.Timestamp : The timestamp at which the revision was saved

comment = None

str : The comment saved with the revision

byte_len = None

int : The length of the revision content in bytes

minor = None

bool : Was the revision flagged as minor?

content_model = None

str : Describes the format of revision content

text = None

str : The decoded (Unicode) text of the revision content

parent = None

Revision : The parent (aka “previous”) revision of the page.

page = None

Page : The page in which the revision was saved.

user = None

User : The user who saved the revision.

diff = None

Diff : The difference between this revision and the parent revision.
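The nesting described above can be pictured with a small stand-in. This is a minimal sketch, not revscoring's implementation: `Node`, `SketchUser`, and `SketchRevision` are hypothetical names, only two of the `include_*` flags are modeled, and the dotted naming scheme and the rule that a parent gets no parent of its own are assumptions made for illustration.

```python
# Simplified stand-in for illustration only; this is NOT revscoring's code.
class Node:
    """A named placeholder standing in for a datasource."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Mirrors the "{revision}" style shown above (assumed format).
        return "{" + self.name + "}"


class SketchUser(Node):
    def __init__(self, name):
        super().__init__(name)
        self.id = Node(name + ".id")
        self.text = Node(name + ".text")


class SketchRevision(Node):
    # Optional sub-trees default to None, as in the attribute list above.
    parent = None
    user = None

    def __init__(self, name, include_parent=True, include_user=True):
        super().__init__(name)
        self.id = Node(name + ".id")
        self.text = Node(name + ".text")
        if include_user:
            self.user = SketchUser(name + ".user")
        if include_parent:
            # Assumption: the parent does not get a parent of its own.
            self.parent = SketchRevision(
                name + ".parent",
                include_parent=False,
                include_user=include_user,
            )


revision = SketchRevision("revision")
print(revision)                # {revision}
print(revision.parent.text)    # {revision.parent.text}
print(revision.parent.parent)  # None
```

In this sketch, an attribute whose `include_*` flag is off simply stays `None`, which is one plausible reading of the `= None` defaults listed for each attribute.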

class revscoring.datasources.revision_oriented.Diff(name)

Represents the difference between two sequential revisions.

class revscoring.datasources.revision_oriented.Page(name, include_creation=False)

Represents a revision’s page

id = None

int : The page’s ID

title = None

str : The page’s title (namespace stripped)

namespace = None

Namespace : The namespace information.

creation = None

Revision : The first revision to the page.

class revscoring.datasources.revision_oriented.Namespace(name)

Represents a page’s namespace

id = None

int : The namespace’s ID

name = None

str : The name of the namespace
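To illustrate how a page's title (namespace stripped) relates to its namespace ID and name, here is a minimal sketch. The `split_title` helper and the `NAMESPACES` table are hypothetical and not part of revscoring; the table is a small assumed subset of a wiki's namespaces, and a real wiki's configuration may differ.

```python
# Hypothetical subset of a wiki's namespace table (ID -> name).
# ID 0 is the main (article) namespace, which has an empty name.
NAMESPACES = {0: "", 1: "Talk", 2: "User"}


def split_title(full_title, namespaces=NAMESPACES):
    """Split a full page title into (namespace ID, namespace name, title)."""
    prefix, sep, rest = full_title.partition(":")
    if sep:
        for ns_id, ns_name in namespaces.items():
            # Only treat the prefix as a namespace if it is a known one.
            if ns_name and ns_name.lower() == prefix.strip().lower():
                return ns_id, ns_name, rest
    # No recognized prefix: the page lives in the main namespace.
    return 0, namespaces[0], full_title


print(split_title("Talk:Anarchism"))         # (1, 'Talk', 'Anarchism')
print(split_title("2001: A Space Odyssey"))  # (0, '', '2001: A Space Odyssey')
```

The second call shows why a colon alone is not enough to identify a namespace: the prefix has to match a known namespace name.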

class revscoring.datasources.revision_oriented.User(name, include_info=True, include_last_revision=False)

Represents a user’s ID and name or IP address

id = None

int : The ID of the user who saved the edit; 0 for anonymous (IP) editors.

text = None

str : The user’s name or IP address

info = None

UserInfo : Information about the user.

last_revision = None

Revision : The last revision the user saved before the revision of reference.
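Because the user ID is 0 for IP editors and the text field holds either a name or an IP address, the two values together can separate anonymous from registered editors. The following is a minimal sketch using plain values rather than revscoring datasources; `looks_like_ip` and `describe_user` are hypothetical helpers, and the sample IDs and names are made up.

```python
import ipaddress


def looks_like_ip(user_text):
    """Return True if the user text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user_text)
    except ValueError:
        return False
    return True


def describe_user(user_id, user_text):
    """Summarize a user from the id and text values described above."""
    kind = "anonymous" if user_id == 0 else "registered"
    return {"kind": kind, "text_is_ip": looks_like_ip(user_text)}


print(describe_user(0, "192.0.2.17"))
print(describe_user(12345, "ExampleUser"))
```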

class revscoring.datasources.revision_oriented.UserInfo(name)

Represents a user’s information

editcount = None

int : The total number of edits the user has saved

registration = None

mwtypes.Timestamp : The date the user registered

groups = None

set ( str ) : The groups the user is a member of

emailable = None

bool : True if the user is emailable, False otherwise

gender = None

str : A string representing the user’s gender preference.
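As an illustration of how these fields might feed into edit-quality features, the sketch below derives a few values from a plain dictionary. It rests on several assumptions: `user_features` is a hypothetical helper, `datetime` stands in for `mwtypes.Timestamp`, and the sample values are invented. It does not use revscoring's feature API.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def user_features(info, saved):
    """Derive simple features from user info and the revision's save time."""
    age_days = (saved - info["registration"]).total_seconds() / SECONDS_PER_DAY
    return {
        "account_age_days": age_days,
        # Guard against very new accounts to avoid dividing by ~0.
        "edits_per_day": info["editcount"] / max(age_days, 1.0),
        "is_bot": "bot" in info["groups"],
        "emailable": info["emailable"],
    }


# Invented sample values, shaped like the attributes listed above.
example_info = {
    "editcount": 1520,
    "registration": datetime(2015, 3, 1, tzinfo=timezone.utc),
    "groups": {"autoconfirmed", "rollbacker"},
    "emailable": True,
}
saved = datetime(2015, 3, 11, tzinfo=timezone.utc)

features = user_features(example_info, saved)
print(features["account_age_days"])  # 10.0
print(features["edits_per_day"])     # 152.0
print(features["is_bot"])            # False
```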

