Usage¶
Invenio module that implements an OAI-PMH server.
Invenio-OAIServer exposes records via the OAI-PMH protocol. The core part is responsible for managing OAI sets, which are defined using queries.
OAIServer consists of:
- OAI-PMH 2.0 compatible endpoint.
- Persistent identifier minters, fetchers and providers.
- Backend for formatting Elasticsearch results.
Initialization¶
First create a Flask application (Flask-CLI is not needed for Flask version 1.0+):
>>> from flask import Flask
>>> app = Flask('myapp')
>>> app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite://'
>>> app.config['CELERY_ALWAYS_EAGER'] = True
>>> if not hasattr(app, 'cli'):
...     from flask_cli import FlaskCLI
...     ext_cli = FlaskCLI(app)
There are several dependencies that should be initialized in order to make OAIServer work correctly.
>>> from invenio_db import InvenioDB
>>> from invenio_indexer import InvenioIndexer
>>> from invenio_pidstore import InvenioPIDStore
>>> from invenio_records import InvenioRecords
>>> from invenio_search import InvenioSearch
>>> from flask_celeryext import FlaskCeleryExt
>>> ext_db = InvenioDB(app)
>>> ext_indexer = InvenioIndexer(app)
>>> ext_pidstore = InvenioPIDStore(app)
>>> ext_records = InvenioRecords(app)
>>> ext_search = InvenioSearch(app)
>>> ext_celery = FlaskCeleryExt(app)
Then you can initialize OAIServer like a normal Flask extension. However, you need to set the following configuration options first:
>>> app.config['OAISERVER_RECORD_INDEX'] = 'marc21'
>>> app.config['OAISERVER_ID_PREFIX'] = 'oai:example:'
>>> from invenio_oaiserver import InvenioOAIServer
>>> ext_oaiserver = InvenioOAIServer(app)
Register the blueprint for OAIServer. If you use InvenioOAIServer as part of the invenio-base setup, the blueprint will be registered automatically through an entry point.
>>> from invenio_oaiserver.views.server import blueprint
>>> app.register_blueprint(blueprint)
In order for the following examples to work, you need to work within a Flask application context, so let’s push one:
>>> ctx = app.app_context()
>>> ctx.push()
Also, for the examples to work we need to create the database and tables (note, in this example we use an in-memory SQLite database):
>>> from invenio_db import db
>>> db.create_all()
Then create the indices in Elasticsearch:
>>> indices = list(ext_search.create(ignore=[400]))
>>> from time import sleep
>>> sleep(5)
Creating OAI sets¶
“A set is an optional construct for grouping records for the purpose of selective harvesting” [OAISet]. The easiest way to create a new OAI set is to use the database model:
>>> from invenio_oaiserver.models import OAISet
>>> oaiset = OAISet(spec='higgs', name='Higgs', description='...')
>>> oaiset.search_pattern = 'title:higgs'
>>> db.session.add(oaiset)
>>> db.session.commit()
The above set groups all records that contain the word “higgs” in the title.
We can now see the set by using the verb ListSets:
>>> with app.test_client() as client:
...     res = client.get('/oai2d?verb=ListSets')
>>> res.status_code
200
>>> b'Higgs' in res.data
True
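To inspect a ListSets response in more detail than a substring check, the XML can be parsed with the standard library. The following is a minimal sketch: the sample document is hand-written in the general shape of an OAI-PMH ListSets response (it is not captured server output), and list_set_specs is a hypothetical helper name, not part of Invenio-OAIServer.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample for illustration only (not real server output).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>higgs</setSpec>
      <setName>Higgs</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def list_set_specs(xml_bytes):
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(xml_bytes)
    return [
        (
            node.findtext("oai:setSpec", namespaces=OAI_NS),
            node.findtext("oai:setName", namespaces=OAI_NS),
        )
        for node in root.iterfind(".//oai:set", OAI_NS)
    ]


pairs = list_set_specs(SAMPLE)
```

In the test-client example above, the same helper could presumably be called as list_set_specs(res.data).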
[OAISet] https://www.openarchives.org/OAI/openarchivesprotocol.html#Set
Data model¶
The response serializer, indexer, and search expect an _oai key in the record data with the following structure:
{
    "_oai": {
        "id": "oai:example:1",
        "sets": ["higgs", "demo"],
        "updated": "2012-07-04T15:00:00Z"
    }
}
There must be an id key with a non-null value; otherwise the record is not exposed via the OAI-PMH interface (ListIdentifiers, ListRecords). The value of this field should be registered in the PID store. We provide a default oaiid_minter() that can register an existing value or mint a new one by concatenating the configuration option OAISERVER_ID_PREFIX and the record value from the control_number field.
All values in sets must exist in the spec column of the oaiserver_set table, or they will be removed when the record updater is executed. The last field, updated, contains the ISO 8601 datetime of the last record metadata modification, according to the OAI-PMH rules for selective harvesting.
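The rules above can be sketched in plain Python. This is only an illustration of the data model, assuming a record is a plain dict: sketch_oai_id, is_exposed, and prune_sets are hypothetical helpers, and unlike oaiid_minter() nothing here is registered in the PID store.

```python
def sketch_oai_id(record, prefix="oai:example:"):
    """Return the existing _oai.id, or build one from control_number.

    Hypothetical helper: it only mirrors the concatenation of
    OAISERVER_ID_PREFIX and control_number described in the text.
    """
    existing = record.get("_oai", {}).get("id")
    if existing is not None:
        return existing
    return prefix + str(record["control_number"])


def is_exposed(record):
    """A record is exposed only if _oai.id exists and is not null."""
    return record.get("_oai", {}).get("id") is not None


def prune_sets(record, known_specs):
    """Keep only the set specs that exist among the known specs."""
    sets = record.get("_oai", {}).get("sets", [])
    return [spec for spec in sets if spec in known_specs]


record = {"control_number": "1", "_oai": {"sets": ["higgs", "unknown"]}}
record["_oai"]["id"] = sketch_oai_id(record)
record["_oai"]["sets"] = prune_sets(record, known_specs={"higgs", "demo"})
```

Here known_specs stands in for the values of the spec column in the oaiserver_set table.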
Configuration¶
Details of the configuration options for the OAI-PMH server.
invenio_oaiserver.config.OAISERVER_ADMIN_EMAILS = ['info@inveniosoftware.org']
    The e-mail addresses of administrators of the repository. It must include one or more instances.
invenio_oaiserver.config.OAISERVER_CACHE_KEY = 'DynamicOAISets::'
    Key prefix added before all keys in cache server.
invenio_oaiserver.config.OAISERVER_CELERY_TASK_CHUNK_SIZE = 100
    Specify the maximum number of records each task will update.
invenio_oaiserver.config.OAISERVER_CONTROL_NUMBER_FETCHER = 'recid'
    PIDStore fetcher for the OAI ID control number.
invenio_oaiserver.config.OAISERVER_GRANULARITY = 'YYYY-MM-DDThh:mm:ssZ'
    The finest harvesting granularity supported by the repository. The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ, with meanings as defined in ISO 8601.
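As an illustration of the two OAISERVER_GRANULARITY values, the sketch below maps each one to a strftime pattern and formats a UTC datestamp. The mapping and the format_datestamp helper are assumptions made for this example, not the module's own code.

```python
from datetime import datetime, timezone

# Assumed mapping from granularity values to strftime patterns.
STRFTIME_BY_GRANULARITY = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DDThh:mm:ssZ": "%Y-%m-%dT%H:%M:%SZ",
}


def format_datestamp(dt, granularity="YYYY-MM-DDThh:mm:ssZ"):
    """Format a timezone-aware datetime as a UTC datestamp."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    pattern = STRFTIME_BY_GRANULARITY[granularity]
    return dt.astimezone(timezone.utc).strftime(pattern)


updated = datetime(2012, 7, 4, 15, 0, 0, tzinfo=timezone.utc)
fine = format_datestamp(updated)
coarse = format_datestamp(updated, "YYYY-MM-DD")
```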
invenio_oaiserver.config.OAISERVER_METADATA_FORMATS = {
    'oai_dc': {
        'namespace': 'http://www.openarchives.org/OAI/2.0/oai_dc/',
        'serializer': ('invenio_oaiserver.utils:dumps_etree', {'xslt_filename': '/home/travis/build/inveniosoftware/invenio-oaiserver/invenio_oaiserver/static/xsl/MARC21slim2OAIDC.xsl'}),
        'schema': 'http://www.openarchives.org/OAI/2.0/oai_dc.xsd'},
    'marc21': {
        'namespace': 'http://www.loc.gov/MARC21/slim',
        'serializer': ('invenio_oaiserver.utils:dumps_etree', {'prefix': 'marc'}),
        'schema': 'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd'}}
    Define the metadata formats available from a repository. Every key represents a metadataPrefix and its value has the following structure:
    - schema: the location of an XML Schema describing the format;
    - namespace: the namespace of the serialized document;
    - serializer: the importable string, or a tuple with the importable string and keyword arguments.
    Note
    If you are migrating an instance running an older version of Invenio (<=2.1), you might want to copy the settings from the 'marc21' key to 'marcxml' in order to ensure compatibility for all your OAI-PMH clients.
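The migration note for OAISERVER_METADATA_FORMATS could be applied roughly as follows. This sketch only builds the dictionary; the 'marc21' entry is copied from the default shown above, and assigning the result to app.config is left as a comment because it assumes an application object like the one created in the Initialization section.

```python
import copy

# The 'marc21' entry as listed in the default configuration.
metadata_formats = {
    "marc21": {
        "namespace": "http://www.loc.gov/MARC21/slim",
        "serializer": (
            "invenio_oaiserver.utils:dumps_etree",
            {"prefix": "marc"},
        ),
        "schema": "http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd",
    },
}

# Expose the same settings under the legacy 'marcxml' prefix.
# A deep copy keeps later edits to one entry from leaking into the other.
metadata_formats["marcxml"] = copy.deepcopy(metadata_formats["marc21"])

# With a Flask app at hand, the option would then be set like this:
# app.config['OAISERVER_METADATA_FORMATS'] = metadata_formats
```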
invenio_oaiserver.config.OAISERVER_PAGE_SIZE = 10
    Define the maximum length of list responses. Requests with the verbs ListRecords, ListIdentifiers, and ListSets are affected by this option.
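To make the effect of OAISERVER_PAGE_SIZE concrete, the sketch below splits a result list into pages of at most ten items. It illustrates the general idea only: paginate is a hypothetical helper, and how the server actually pages through search results is not shown here.

```python
def paginate(items, page_size=10):
    """Yield successive pages holding at most page_size items each."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


# 25 hypothetical identifiers split with the default page size of 10.
identifiers = ["oai:example:%d" % n for n in range(1, 26)]
pages = list(paginate(identifiers))
page_lengths = [len(page) for page in pages]
```

With 25 matching records and the default page size, a harvester would need three requests to retrieve the full list.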
invenio_oaiserver.config.OAISERVER_QUERY_PARSER = 'invenio_query_parser.parser:Main'
    Define the query parser for OAISet definitions.
invenio_oaiserver.config.OAISERVER_QUERY_WALKERS = ['invenio_query_parser.walkers.pypeg_to_ast:PypegConverter']
    List of query AST walkers.
invenio_oaiserver.config.OAISERVER_RECORD_INDEX = 'records'
    Specify an Elasticsearch index with records that should be exposed via OAI-PMH.
invenio_oaiserver.config.OAISERVER_REGISTER_RECORD_SIGNALS = True
    Catch record/set insert/update/delete signals and update the _oai field.
invenio_oaiserver.config.OAISERVER_REGISTER_SET_SIGNALS = True
    Catch set insert/update/delete signals and update the _oai record field.
invenio_oaiserver.config.OAISERVER_RESUMPTION_TOKEN_EXPIRE_TIME = 60
    The expiration time of a resumption token in seconds. Default: 60 seconds = 1 minute.
    Note
    Setting a longer expiration time may have a negative impact on your Elasticsearch cluster, as it might need to keep cursors open. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html