Usage

Invenio module that implements an OAI-PMH server.

Invenio-OAIServer exposes records via the OAI-PMH protocol. Its core is responsible for managing OAI sets, which are defined using search queries.

OAIServer consists of:

  • OAI-PMH 2.0 compatible endpoint.
  • Persistent identifier minters, fetchers and providers.
  • Backend for formatting Elasticsearch results.

Initialization

First create a Flask application (Flask-CLI is not needed for Flask version 1.0+):

>>> from flask import Flask
>>> app = Flask('myapp')
>>> app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite://'
>>> app.config['CELERY_ALWAYS_EAGER'] = True
>>> if not hasattr(app, 'cli'):
...     from flask_cli import FlaskCLI
...     ext_cli = FlaskCLI(app)

There are several dependencies that should be initialized in order to make OAIServer work correctly.

>>> from invenio_db import InvenioDB
>>> from invenio_indexer import InvenioIndexer
>>> from invenio_pidstore import InvenioPIDStore
>>> from invenio_records import InvenioRecords
>>> from invenio_search import InvenioSearch
>>> from flask_celeryext import FlaskCeleryExt
>>> ext_db = InvenioDB(app)
>>> ext_indexer = InvenioIndexer(app)
>>> ext_pidstore = InvenioPIDStore(app)
>>> ext_records = InvenioRecords(app)
>>> ext_search = InvenioSearch(app)
>>> ext_celery = FlaskCeleryExt(app)

Then you can initialize OAIServer like a normal Flask extension. However, you need to set the following configuration options first:

>>> app.config['OAISERVER_RECORD_INDEX'] = 'marc21'
>>> app.config['OAISERVER_ID_PREFIX'] = 'oai:example:'
>>> from invenio_oaiserver import InvenioOAIServer
>>> ext_oaiserver = InvenioOAIServer(app)

Register the blueprint for OAIServer. If you use InvenioOAIServer as part of the invenio-base setup, the blueprint will be registered automatically through an entry point.

>>> from invenio_oaiserver.views.server import blueprint
>>> app.register_blueprint(blueprint)

In order for the following examples to work, you need to work within a Flask application context, so let’s push one:

>>> ctx = app.app_context()
>>> ctx.push()

Also, for the examples to work we need to create the database and tables (note, in this example we use an in-memory SQLite database):

>>> from invenio_db import db
>>> db.create_all()

Then create the indices in Elasticsearch:

>>> indices = list(ext_search.create(ignore=[400]))
>>> from time import sleep
>>> sleep(5)

Creating OAI sets

“A set is an optional construct for grouping records for the purpose of selective harvesting” [OAISet]. The easiest way to create a new OAI set is to use the database model.

>>> from invenio_oaiserver.models import OAISet
>>> oaiset = OAISet(spec='higgs', name='Higgs', description='...')
>>> oaiset.search_pattern = 'title:higgs'
>>> db.session.add(oaiset)
>>> db.session.commit()

The above set groups all records that contain the word “higgs” in the title.

We can now see the set by using the ListSets verb:

>>> with app.test_client() as client:
...     res = client.get('/oai2d?verb=ListSets')
>>> res.status_code
200
>>> b'Higgs' in res.data
True

[OAISet] https://www.openarchives.org/OAI/openarchivesprotocol.html#Set
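
To inspect a ListSets response beyond a substring check, the XML can be parsed with the standard library. The following is a minimal sketch: the SAMPLE document is hand-written to resemble an OAI-PMH 2.0 response (it was not captured from a running server), and list_set_specs is a hypothetical helper, not part of invenio-oaiserver.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {'oai': 'http://www.openarchives.org/OAI/2.0/'}

# Hand-written sample shaped like a ListSets response (an assumption,
# not output captured from the example application).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>higgs</setSpec>
      <setName>Higgs</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def list_set_specs(xml_bytes):
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(xml_bytes)
    return [
        (
            node.findtext('oai:setSpec', namespaces=OAI_NS),
            node.findtext('oai:setName', namespaces=OAI_NS),
        )
        for node in root.iterfind('.//oai:set', OAI_NS)
    ]


sets = list_set_specs(SAMPLE)
print(sets)
```

In the running example, res.data could be passed to the helper in place of SAMPLE.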

Data model

The response serializer, indexer and search expect an _oai key in the record data with the following structure:

{
    "_oai": {
        "id": "oai:example:1",
        "sets": ["higgs", "demo"],
        "updated": "2012-07-04T15:00:00Z"
    }
}

There must be an id key with a non-null value; otherwise the record is not exposed via the OAI-PMH interface (ListIdentifiers, ListRecords). The value of this field should be registered in the PID store. We provide a default oaiid_minter() that can register an existing value or mint a new one by concatenating the configuration option OAISERVER_ID_PREFIX and the record value from the control_number field.

All values in sets must exist in the spec column of the oaiserver_set table, or they will be removed when the record updater is executed. The last field, updated, contains the ISO8601 datetime of the last record metadata modification, according to the OAI-PMH rules for selective harvesting.
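
The two rules above (identifier construction and pruning of unknown sets) can be illustrated with plain dictionaries. This is only a sketch: build_oai_id and prune_sets are hypothetical helpers that mimic the described behaviour, and unlike oaiid_minter() they do not register anything in the PID store.

```python
# Hypothetical helpers for illustration; not part of invenio-oaiserver.
OAISERVER_ID_PREFIX = 'oai:example:'  # same value as in the initialization example


def build_oai_id(record, prefix=OAISERVER_ID_PREFIX):
    """Return the existing ``_oai.id`` or build one from ``control_number``."""
    existing = record.get('_oai', {}).get('id')
    if existing:
        return existing
    return '{0}{1}'.format(prefix, record['control_number'])


def prune_sets(record, known_specs):
    """Keep only the set specs that are defined (mimics the record updater)."""
    sets = record.get('_oai', {}).get('sets', [])
    return [spec for spec in sets if spec in known_specs]


record = {'control_number': '1', '_oai': {'sets': ['higgs', 'demo', 'gone']}}
oai_id = build_oai_id(record)
kept = prune_sets(record, known_specs={'higgs', 'demo'})
print(oai_id, kept)
```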

Configuration

Configuration options for the OAI-PMH server.

invenio_oaiserver.config.OAISERVER_ADMIN_EMAILS = ['info@inveniosoftware.org']

The e-mail addresses of administrators of the repository.

The list must contain at least one address.

invenio_oaiserver.config.OAISERVER_CACHE_KEY = 'DynamicOAISets::'

Key prefix added before all keys in cache server.

invenio_oaiserver.config.OAISERVER_CELERY_TASK_CHUNK_SIZE = 100

Specify the maximum number of records each task will update.

invenio_oaiserver.config.OAISERVER_CONTROL_NUMBER_FETCHER = 'recid'

PIDStore fetcher for the OAI ID control number.

invenio_oaiserver.config.OAISERVER_GRANULARITY = 'YYYY-MM-DDThh:mm:ssZ'

The finest harvesting granularity supported by the repository.

The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ with meanings as defined in ISO8601.
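
As an illustration of what the two granularities mean for datestamp handling, the sketch below parses a datestamp with the standard library. The GRANULARITY_FORMATS mapping and parse_datestamp are assumptions made for this example; they do not describe how the module parses request arguments internally.

```python
from datetime import datetime

# Map each granularity named above to a strptime pattern (illustrative only).
GRANULARITY_FORMATS = {
    'YYYY-MM-DD': '%Y-%m-%d',
    'YYYY-MM-DDThh:mm:ssZ': '%Y-%m-%dT%H:%M:%SZ',
}


def parse_datestamp(value, granularity='YYYY-MM-DDThh:mm:ssZ'):
    """Parse ``value`` at the configured granularity or the coarser day form."""
    allowed = ['YYYY-MM-DD']
    if granularity == 'YYYY-MM-DDThh:mm:ssZ':
        allowed.append(granularity)
    for name in allowed:
        try:
            return datetime.strptime(value, GRANULARITY_FORMATS[name])
        except ValueError:
            continue
    raise ValueError('Unsupported datestamp: {0!r}'.format(value))


print(parse_datestamp('2012-07-04T15:00:00Z'))
print(parse_datestamp('2012-07-04'))
```

With the granularity set to YYYY-MM-DD, a datestamp that includes a time component is rejected in this sketch.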

invenio_oaiserver.config.OAISERVER_METADATA_FORMATS = {'oai_dc': {'namespace': 'http://www.openarchives.org/OAI/2.0/oai_dc/', 'serializer': ('invenio_oaiserver.utils:dumps_etree', {'xslt_filename': '/home/travis/build/inveniosoftware/invenio-oaiserver/invenio_oaiserver/static/xsl/MARC21slim2OAIDC.xsl'}), 'schema': 'http://www.openarchives.org/OAI/2.0/oai_dc.xsd'}, 'marc21': {'namespace': 'http://www.loc.gov/MARC21/slim', 'serializer': ('invenio_oaiserver.utils:dumps_etree', {'prefix': 'marc'}), 'schema': 'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd'}}

Define the metadata formats available from a repository.

Every key represents a metadataPrefix, and its value has the following structure:

  • schema - the location of an XML Schema describing the format;
  • namespace - the namespace of the serialized document;
  • serializer - an importable string, or a tuple with the importable string and keyword arguments.
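
The two accepted shapes of the serializer entry can be sketched as follows. resolve_serializer is a hypothetical helper written for this example, and it assumes the 'module:attribute' form seen in the default value; json:dumps is used only as a stand-in so the sketch runs without Invenio installed.

```python
from importlib import import_module


def resolve_serializer(spec):
    """Turn 'module:attr' or ('module:attr', kwargs) into (callable, kwargs)."""
    if isinstance(spec, (tuple, list)):
        import_string, kwargs = spec
    else:
        import_string, kwargs = spec, {}
    module_name, _, attr = import_string.partition(':')
    return getattr(import_module(module_name), attr), dict(kwargs)


# Standard-library stand-in for a real serializer entry.
func, kwargs = resolve_serializer(('json:dumps', {'sort_keys': True}))
output = func({'b': 1, 'a': 2}, **kwargs)
print(output)
```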

Note

If you are migrating an instance running Invenio 2.1 or older, you might want to copy the settings from the 'marc21' key to 'marcxml' to ensure compatibility for all your OAI-PMH clients.
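
One possible way to apply the note above in an instance configuration is shown below. The dictionary is a trimmed copy of the default that keeps only the 'marc21' entry; a real configuration would normally keep the other formats as well.

```python
import copy

# Trimmed copy of the default shown above; only 'marc21' is kept for brevity.
OAISERVER_METADATA_FORMATS = {
    'marc21': {
        'namespace': 'http://www.loc.gov/MARC21/slim',
        'serializer': ('invenio_oaiserver.utils:dumps_etree', {'prefix': 'marc'}),
        'schema': 'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd',
    },
}

# Expose the same format under the legacy 'marcxml' metadataPrefix.
OAISERVER_METADATA_FORMATS['marcxml'] = copy.deepcopy(
    OAISERVER_METADATA_FORMATS['marc21']
)
```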

invenio_oaiserver.config.OAISERVER_PAGE_SIZE = 10

Define the maximum length of list responses.

Requests with the verbs ListRecords, ListIdentifiers, and ListSets are affected by this option.
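
The effect of a page-size limit on a long list response can be sketched in a few lines. The paginate helper and its offset-based cursor are assumptions for illustration; the server itself hands out resumption tokens, whose internal format is not described here.

```python
def paginate(items, page_size=10):
    """Yield (page, next_offset); next_offset is None on the last page."""
    for start in range(0, len(items), page_size):
        end = start + page_size
        yield items[start:end], (end if end < len(items) else None)


# 25 made-up identifiers split with the default page size of 10.
identifiers = ['oai:example:{0}'.format(i) for i in range(1, 26)]
pages = list(paginate(identifiers, page_size=10))
for page, next_offset in pages:
    print(len(page), next_offset)
```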

invenio_oaiserver.config.OAISERVER_QUERY_PARSER = 'invenio_query_parser.parser:Main'

Define the query parser for OAISet definitions.

invenio_oaiserver.config.OAISERVER_QUERY_WALKERS = ['invenio_query_parser.walkers.pypeg_to_ast:PypegConverter']

List of query AST walkers.

invenio_oaiserver.config.OAISERVER_RECORD_INDEX = 'records'

Specify the Elasticsearch index with records that should be exposed via OAI-PMH.

invenio_oaiserver.config.OAISERVER_REGISTER_RECORD_SIGNALS = True

Catch record/set insert/update/delete signals and update the _oai field.

invenio_oaiserver.config.OAISERVER_REGISTER_SET_SIGNALS = True

Catch set insert/update/delete signals and update the _oai record field.

invenio_oaiserver.config.OAISERVER_RESUMPTION_TOKEN_EXPIRE_TIME = 60

The expiration time of a resumption token in seconds.

Default: 60 seconds = 1 minute.
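
How a lifetime in seconds translates into an expiration timestamp can be sketched with the standard library. expiration_date and is_expired are hypothetical helpers for this example and say nothing about how the module stores or validates tokens.

```python
from datetime import datetime, timedelta

OAISERVER_RESUMPTION_TOKEN_EXPIRE_TIME = 60  # seconds, the default above


def expiration_date(issued_at, lifetime=OAISERVER_RESUMPTION_TOKEN_EXPIRE_TIME):
    """Return the moment at which a token issued at ``issued_at`` expires."""
    return issued_at + timedelta(seconds=lifetime)


def is_expired(issued_at, now, lifetime=OAISERVER_RESUMPTION_TOKEN_EXPIRE_TIME):
    """Tell whether the token lifetime has elapsed at ``now``."""
    return now >= expiration_date(issued_at, lifetime)


issued = datetime(2012, 7, 4, 15, 0, 0)
expires = expiration_date(issued).strftime('%Y-%m-%dT%H:%M:%SZ')
print(expires)
```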

Note

Setting a longer expiration time may have a negative impact on your Elasticsearch cluster, as it might need to keep cursors open.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html