Change Sets

Changesets are implemented using an extension of the RDF ChangeSet Vocabulary. The Talis wiki page gives a good overview of the vocabulary as it originally stood. Unfortunately, the changeset vocabulary as defined by Talis has some limitations. In particular,

  • a changeset is limited to statements about a single subject
  • there is no way to specify which graph the changes pertain to

The second problem is more a limitation of RDF itself as originally specified, though there has been talk of fixing this in the next version.

An example changeset that keeps to the original spec might look something like this:

:csid a cs:ChangeSet ;
    cs:createdDate "1970-01-01"^^xsd:date ;
    cs:creatorName "Some Body" ;
    cs:changeReason "A change must be made" ;
    cs:precedingChangeSet :previousid ;
    cs:subjectOfChange <http://example.org/> ;
    cs:addition [
        rdf:subject <http://example.org/> ;
        rdf:predicate rdfs:label ;
        rdf:object "Example"
    ] .

In other words, this adds a label to http://example.org/. A corresponding predicate, cs:removal, removes statements. The cs:precedingChangeSet link points to the previous change involving that resource, which gives a way to walk the change history.
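The history walk can be sketched in plain Python. This is an illustration only: changesets are modelled as a dictionary from a changeset identifier to the identifier of the change before it, and the names `preceding` and `walk_history` are hypothetical, not part of any library.

```python
# Hypothetical model: each changeset id maps to the id of the changeset
# that preceded it (None for the first change to the resource).
preceding = {
    "cs3": "cs2",
    "cs2": "cs1",
    "cs1": None,
}

def walk_history(latest, preceding):
    """Yield changeset ids from the most recent back to the first."""
    seen = set()
    current = latest
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles in the links
        yield current
        current = preceding.get(current)

history = list(walk_history("cs3", preceding))
```

Starting from the most recent changeset, each step follows one link backwards until a changeset with no predecessor is reached.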

It is not clear why there can be only one cs:subjectOfChange, nor why every cs:addition and cs:removal must concern only that subject. Most changes concern more than one resource, and it is natural to treat them together as a single atomic unit.

The first extension that we have implemented is to allow multiple cs:subjectOfChange values and remove the restriction on the resources that are added or removed.

In our implementation, though this is not true of RDF generally, we have a quite clear notion of what constitutes an RDF graph: it is a collection of triples. It corresponds naturally to the notion of a document, because it is the collection of triples that you receive when dereferencing a particular URI (i.e. the identifier of that graph). When we make changes, we don’t make changes to triples in the abstract sense; rather, we make changes to the triples that exist in a particular graph.

This does not require changes to the range of cs:subjectOfChange since its range is rdfs:Resource, and from the specification,

All things described by RDF are called resources, and are instances of the class rdfs:Resource. This is the class of everything.

That said, the range could be narrowed if there were such an entity as rdf:Graph though the utility of such a class is debatable.

What is required is a way to specify the graph concerned in the reified triples. What we really want to write is:

cs:addition [
    rdf:subject <http://example.org/> ;
    rdf:predicate rdfs:label ;
    rdf:object "Example" ;
    rdf:graph <http://example.org/data/>
] .

but we are prevented from doing so by the lack of a suitable rdf:graph predicate. (N.B. there is no reason why the objects of rdf:subject and rdf:graph could not be the same; in most instances they probably would be.)

The second extension is the definition of a predicate for use in reification to indicate the graph in question. The predicate is, for the time being:

http://bibliographica.org/schema/graph#graph

though we are hopeful a suitable replacement will be included in RDF in the future and, failing that, we will try to obtain a http://purl.org namespace for it.
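As a rough illustration of the second extension, the sketch below describes one reified statement together with the graph it belongs to. It uses plain Python dictionaries and strings rather than an RDF library; the `reify` helper and the constant names are hypothetical, and only the graph predicate URI is taken from the text above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Interim graph predicate named in the text above.
GRAPH = "http://bibliographica.org/schema/graph#graph"

def reify(subject, predicate, obj, graph):
    """Return the property/value pairs describing one statement in one graph."""
    return {
        RDF + "subject": subject,
        RDF + "predicate": predicate,
        RDF + "object": obj,
        GRAPH: graph,
    }

statement = reify(
    "http://example.org/",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "Example",
    "http://example.org/data/",
)
```

The fourth entry is what distinguishes this from ordinary RDF reification: it records which graph the statement is added to or removed from.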

We also require a way to indicate which changeset is the most recent for a particular graph and so we add a triple:

:graphid ordf:changeSet :csid

to graphs so modified.

The third extension is to introduce a property ordf:changeSet into the changeset vocabulary for referring to instances of the cs:ChangeSet class.

The fourth extension is concerned less with structure than with summarising information about a changeset. We introduce ordf:additionCount and ordf:removalCount to record the number of additions and removals in a particular changeset.
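The two counts are simple tallies. A minimal sketch, assuming a changeset is represented as a list of (kind, statement) pairs; this representation and the `summarise` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical changeset contents: (kind, (subject, predicate, object, graph)).
changes = [
    ("addition", ("http://example.org/", "rdfs:label", "Example", "http://example.org/data/")),
    ("addition", ("http://example.org/", "rdfs:comment", "A sample", "http://example.org/data/")),
    ("removal", ("http://example.org/", "rdfs:label", "Exmaple", "http://example.org/data/")),
]

def summarise(changes):
    """Count additions and removals, the values the two count properties would hold."""
    addition_count = sum(1 for kind, _ in changes if kind == "addition")
    removal_count = sum(1 for kind, _ in changes if kind == "removal")
    return addition_count, removal_count

addition_count, removal_count = summarise(changes)
```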

The ChangeSet Class

class ordf.vocab.changeset.ChangeSet(name=None, reason=None, store='IOMemory', identifier=None, namespace=Namespace('urn:uuid:'))

Bases: ordf.graph.Graph

ChangeSet Graph. Typically one does something like,

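from ordf.vocab.changeset import ChangeSet

# g1_orig/g1_new and g2_orig/g2_new are the before and after versions of two graphs
cs = ChangeSet("some name", "some reason")
cs.diff(g1_orig, g1_new)
cs.diff(g2_orig, g2_new)
cs.commit()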

There are two instantiation paths. The usual one, where the name and reason parameters are supplied, is intended for constructing new changesets. The other, where store and identifier are supplied, is intended for accessing previously stored changesets.

Parameters:
  • name – The name of the person or entity creating this changeset. This may be a string or an rdflib datatype (e.g. URIRef, Literal)
  • reason – A description of the change. This may be a string or an instance of Literal
  • store – When obtaining an existing changeset, the rdflib.store.Store which contains it.
  • identifier – When obtaining an existing changeset, the graph identifier that should be used to find it in the store
  • namespace – When changesets are created they are assigned a name. The name is generated using the uuid.uuid1() function. It is then appended to the provided namespace.
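
The naming scheme can be illustrated with the standard library alone. This sketch assumes the generated UUID is joined to the namespace by plain string concatenation; how the library combines them internally is not shown here, and the variable names are illustrative.

```python
import uuid

# Default namespace from the class signature above.
namespace = "urn:uuid:"

# uuid1() builds a UUID from the host identifier and the current time.
changeset_name = namespace + str(uuid.uuid1())
```

The result is a URN of the form urn:uuid: followed by the 36-character textual form of the UUID.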

metadata

A representation of the metadata of this changeset graph. This excludes any cs:addition and cs:removal properties

diff(orig, new)

Populate the ChangeSet with the differences between orig and new.

Parameters:
  • orig – original graph
  • new – new graph
Returns:

number of distinct changes (additions + removals)
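
Conceptually, the difference between two graphs is a pair of set differences. The following is a minimal sketch over plain Python sets of triples, not the library's implementation; the standalone `diff` function and the sample data are assumptions made for illustration.

```python
def diff(orig, new):
    """Return (additions, removals, count) between two collections of triples."""
    orig, new = set(orig), set(new)
    additions = new - orig   # triples present only in the new graph
    removals = orig - new    # triples present only in the original graph
    return additions, removals, len(additions) + len(removals)

g_orig = {("http://example.org/", "rdfs:label", "Exmaple")}
g_new = {
    ("http://example.org/", "rdfs:label", "Example"),
    ("http://example.org/", "rdfs:comment", "A sample resource"),
}
additions, removals, count = diff(g_orig, g_new)
```

Correcting the misspelled label shows up as one removal and one addition, and the new comment as a second addition.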

commit()

Commit the changes, mark the changeset read-only.

rollback()

Empty the changeset. Fails if commit() has already been called.
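
The interaction between commit and rollback can be sketched with a small stand-in class. `ChangeBuffer` is hypothetical and says nothing about how the library stores changes or which exception it raises; it only mirrors the rule that a committed changeset is read-only and can no longer be emptied.

```python
class ChangeBuffer:
    """Hypothetical stand-in for a changeset's commit/rollback state."""

    def __init__(self):
        self.changes = []
        self.committed = False

    def add(self, change):
        if self.committed:
            raise RuntimeError("changeset is read-only after commit")
        self.changes.append(change)

    def commit(self):
        # After this point the buffer may no longer be modified.
        self.committed = True

    def rollback(self):
        if self.committed:
            raise RuntimeError("cannot roll back a committed changeset")
        self.changes.clear()

buf = ChangeBuffer()
buf.add("first change")
buf.rollback()            # allowed: nothing has been committed yet
buf.add("second change")
buf.commit()
```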

apply(graph)

Apply the changeset to a graph

undo(orig)

Undo the changes in the changeset on a graph.
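
Applying and undoing a changeset are mirror images of each other. The sketch below works on plain Python sets of triples; the function names and data are hypothetical, and the library methods operate on graph objects rather than sets.

```python
def apply_changes(graph, additions, removals):
    """Return the graph with removals taken out and additions put in."""
    return (set(graph) - set(removals)) | set(additions)

def undo_changes(graph, additions, removals):
    """Reverse a change: take the additions out and put the removals back."""
    return (set(graph) - set(additions)) | set(removals)

original = {("http://example.org/", "rdfs:label", "Exmaple")}
additions = {("http://example.org/", "rdfs:label", "Example")}
removals = {("http://example.org/", "rdfs:label", "Exmaple")}

updated = apply_changes(original, additions, removals)
restored = undo_changes(updated, additions, removals)
```

Undoing a change that has just been applied returns the graph to its original contents.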
