Usage

Invenio module for OAI-PMH metadata harvesting between repositories.

Harvesting is simple

youroverlay oaiharvester harvest -u http://export.arxiv.org/oai2 \
    -i oai:arXiv.org:1507.07286 > my_record.xml

This will harvest the repository for a specific record and print the records to stdout - which in this case will save it to a file called my_record.xml.

If you want to have your harvested records saved in a directory automatically, its easy:

youroverlay oaiharvester harvest -u http://export.arxiv.org/oai2 \
    -i oai:arXiv.org:1507.07286 -d /tmp

Note the directory -d parameter that specifies a directory to save harvested XML files.

Integration with your application

If you want to integrate invenio-oaiharvester into your application, you could hook into the signals sent by the harvester after a completed harvest.

See invenio_oaiharvester.signals:oaiharvest_finished.

Check also the defined Celery tasks under invenio_oaiharvester.tasks.

Managing OAI-PMH sources

If you want to store configuration for an OAI repository, you can use the SQLAlchemy model invenio_oaiharvester.models:OAIHarvestConfig.

This is useful if you regularly need to query a server.

Here you can add information about the server URL, metadataPrefix to use etc. This information is also available when scheduling and running tasks:

youroverlay oaiharvester get -n somerepo -i oai:example.org:1234

Here we are using the -n, –name parameter to specify which configured OAI-PMH source to query, using the name property.