API Docs

Invenio-OAIHarvester API to harvest items from OAI-PMH servers.

To schedule or run harvests from Python code, use the API:

from invenio_oaiharvester.api import get_records

request, records = get_records(identifiers=["oai:arXiv.org:1207.7214"],
                               url="http://export.arxiv.org/oai2")
for record in records:
    print(record.raw)

invenio_oaiharvester.api.get_info_by_oai_name(name)

Get basic OAI request data from the OAIHarvestConfig model.

Parameters: name – name of the source (OAIHarvestConfig.name)
Returns: (url, metadataprefix, lastrun as YYYY-MM-DD, setspecs)
invenio_oaiharvester.api.get_records(identifiers, metadata_prefix=None, url=None, name=None)

Harvest specific records from an OAI repo via OAI-PMH identifiers.

Parameters:
  • metadata_prefix – The metadata prefix to request for the returned records (defaults to ‘oai_dc’).
  • identifiers – List of unique identifiers of the records to be harvested.
  • url – The URL used to create the endpoint.
  • name – The name of the OAIHarvestConfig to use instead of passing specific parameters.
Returns:

request object, list of harvested records
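
To make the parameters above concrete, the sketch below builds the kind of URL that an OAI-PMH GetRecord request uses on the wire. It relies only on the standard library and is not taken from the package: `build_get_record_url` is a hypothetical helper, and how `get_records` issues the request internally is not shown in this reference.

```python
from urllib.parse import urlencode


def build_get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Return an OAI-PMH GetRecord URL for one identifier (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Same endpoint and identifier as in the example at the top of this page.
url = build_get_record_url("http://export.arxiv.org/oai2",
                           "oai:arXiv.org:1207.7214")
print(url)
```

A list of identifiers would presumably translate into one such request per identifier, since GetRecord addresses a single record at a time.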

invenio_oaiharvester.api.list_records(metadata_prefix=None, from_date=None, until_date=None, url=None, name=None, setspecs=None)

Harvest multiple records from an OAI repo.

Parameters:
  • metadata_prefix – The metadata prefix to request for the returned records (defaults to ‘oai_dc’).
  • from_date – The lower bound date for the harvesting (optional).
  • until_date – The upper bound date for the harvesting (optional).
  • url – The URL used to create the endpoint.
  • name – The name of the OAIHarvestConfig to use instead of passing specific parameters.
  • setspecs – The ‘set’ criteria for the harvesting (optional).
Returns:

request object, list of harvested records
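
The optional parameters above correspond to the `from`, `until` and `set` arguments of an OAI-PMH ListRecords request. The following sketch shows one way such a query could be assembled so that unset options are left out. `list_records_params` is a hypothetical helper, the set name is illustrative, and the sketch handles a single set spec per request; it is not the package's own implementation.

```python
from datetime import date


def list_records_params(metadata_prefix="oai_dc", from_date=None,
                        until_date=None, setspec=None):
    """Build ListRecords query arguments, omitting options that are unset."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date.isoformat()  # YYYY-MM-DD
    if until_date is not None:
        params["until"] = until_date.isoformat()
    if setspec is not None:
        params["set"] = setspec
    return params


# Selective harvest: everything in one set since a given day.
params = list_records_params(from_date=date(2015, 1, 1), setspec="physics")
print(params)
```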

Models

OAI harvest database models.

class invenio_oaiharvester.models.OAIHarvestConfig(**kwargs)

Represents an OAIHarvestConfig record.

baseurl
comment
id
lastrun
metadataprefix
name
save()

Save object to persistent storage.

setspecs
update_lastrun(new_date=None)

Update the ‘lastrun’ attribute of the object to now.

Configuration

OAI harvest config.

invenio_oaiharvester.config.OAIHARVESTER_DEFAULT_NAMESPACE_MAP = {'OAI-PMH': 'http://www.openarchives.org/OAI/2.0/'}

The default namespace used when handling OAI-PMH results.
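
As an illustration of why such a map is needed: elements in an OAI-PMH response live in the OAI-PMH namespace, so lookups must be namespace-qualified. The sketch below uses the standard library's ElementTree with a copy of the mapping above. The sample response is hand-written and abbreviated, and how the package itself applies the map is not shown here.

```python
import xml.etree.ElementTree as ET

# Copy of OAIHARVESTER_DEFAULT_NAMESPACE_MAP so the sketch runs standalone.
NAMESPACE_MAP = {"OAI-PMH": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, abbreviated response used only for illustration.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:1207.7214</identifier>
        <datestamp>2012-07-31</datestamp>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>
"""

root = ET.fromstring(SAMPLE_RESPONSE)

# The "OAI-PMH:" prefix is resolved through NAMESPACE_MAP.
identifier = root.findtext(".//OAI-PMH:identifier", namespaces=NAMESPACE_MAP)
datestamp = root.findtext(".//OAI-PMH:datestamp", namespaces=NAMESPACE_MAP)
print(identifier, datestamp)
```

Without the prefix and map, a lookup such as `root.findtext(".//identifier")` finds nothing, because the unqualified tag name does not match the namespaced element.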

invenio_oaiharvester.config.OAIHARVESTER_WORKDIR = None

Path to the directory for oaiharvester-related files. Defaults to instance_path.
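
The fallback described above can be pictured with a minimal sketch. `resolve_workdir` is a hypothetical helper and the paths are placeholders; the package's actual lookup code is not part of this reference.

```python
def resolve_workdir(config, instance_path):
    """Return the configured work directory, or instance_path if it is unset."""
    return config.get("OAIHARVESTER_WORKDIR") or instance_path


# Default configuration: the value is None, so instance_path is used.
default_dir = resolve_workdir({"OAIHARVESTER_WORKDIR": None}, "/srv/instance")

# Explicit configuration: the configured path wins.
custom_dir = resolve_workdir({"OAIHARVESTER_WORKDIR": "/data/oai"}, "/srv/instance")

print(default_dir, custom_dir)
```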

Tasks

Celery tasks used by Invenio-OAIHarvester.

(task)invenio_oaiharvester.tasks.get_specific_records(identifiers, metadata_prefix=None, url=None, name=None, signals=True, **kwargs)

Harvest specific records from an OAI repo via OAI-PMH identifiers.

Parameters:
  • metadata_prefix – The metadata prefix to request for the returned records (e.g. ‘oai_dc’).
  • identifiers – List of unique identifiers of the records to be harvested.
  • url – The URL used to create the endpoint.
  • name – The name of the OAIHarvestConfig to use instead of passing specific parameters.
  • signals – Whether signals should be emitted about the results.
(task)invenio_oaiharvester.tasks.list_records_from_dates(metadata_prefix=None, from_date=None, until_date=None, url=None, name=None, setspecs=None, signals=True, **kwargs)

Harvest multiple records from an OAI repo.

Parameters:
  • metadata_prefix – The metadata prefix to request for the returned records (e.g. ‘oai_dc’).
  • from_date – The lower bound date for the harvesting (optional).
  • until_date – The upper bound date for the harvesting (optional).
  • url – The URL used to create the endpoint.
  • name – The name of the OAIHarvestConfig to use instead of passing specific parameters.
  • setspecs – The ‘set’ criteria for the harvesting (optional).
  • signals – Whether signals should be emitted about the results.

Exceptions

OAI harvester errors.

exception invenio_oaiharvester.errors.IdentifiersOrDates

Identifiers cannot be used in combination with dates.

exception invenio_oaiharvester.errors.InvenioOAIHarvesterError

Base exception for invenio-oaiharvester.

exception invenio_oaiharvester.errors.InvenioOAIRequestError

Error with the OAI-PMH request.

exception invenio_oaiharvester.errors.NameOrUrlMissing

Name or url for harvesting missing.

exception invenio_oaiharvester.errors.WrongDateCombination

‘Until’ date is larger than ‘from’ date.
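
The sketch below shows how argument checks of this kind could map onto the exceptions listed in this section. The classes are local stand-ins defined so the example runs without the package installed, and `check_harvest_arguments` is a hypothetical function. It also rests on two assumptions: that the specific errors derive from the base exception, and that WrongDateCombination signals an inverted range, i.e. a ‘from’ date later than the ‘until’ date.

```python
from datetime import date


class InvenioOAIHarvesterError(Exception):
    """Stand-in for the package's base exception."""


class NameOrUrlMissing(InvenioOAIHarvesterError):
    """Stand-in: name or url for harvesting missing."""


class IdentifiersOrDates(InvenioOAIHarvesterError):
    """Stand-in: identifiers cannot be combined with dates."""


class WrongDateCombination(InvenioOAIHarvesterError):
    """Stand-in: the date range is inverted (assumed meaning)."""


def check_harvest_arguments(name=None, url=None, identifiers=None,
                            from_date=None, until_date=None):
    """Raise the matching error for an invalid argument combination."""
    if not name and not url:
        raise NameOrUrlMissing("Provide a configuration name or a url.")
    if identifiers and (from_date or until_date):
        raise IdentifiersOrDates("Use identifiers or dates, not both.")
    if from_date and until_date and from_date > until_date:
        raise WrongDateCombination("'from' is later than 'until'.")


# Catching the base class handles every specific error in one place.
try:
    check_harvest_arguments(url="http://export.arxiv.org/oai2",
                            from_date=date(2015, 2, 1),
                            until_date=date(2015, 1, 1))
except InvenioOAIHarvesterError as error:
    caught = type(error).__name__
    print(caught)
```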