Extending Datapkg

Datapkg has been designed to be easily extensible. At the present time you can write your own implementations of:

  • Commands - extend the datapkg command line interface with new commands
  • Indexes - add new Indexes with which datapkg can communicate
  • Distributions - add new Distribution types (for reading, writing or both)
  • (Package) Resource downloaders - add support for downloading different types of resources
  • Uploaders (via OFS) - upload to different storage backends

Commands

It is easy to add your own custom commands to the set of commands available from the datapkg command line interface.

To provide a new command named ‘mycommand’:

  1. Create a new command class inheriting from datapkg.cli.Command. The class can have any name; here we assume it is called ‘MyNewCommand’ and lives in the module mynewpackage.command.

  2. In the setup.py of your new Python package (the one containing the new command), add an entry named ‘mycommand’ to the datapkg.cli entry point section:

    mycommand = mynewpackage.command:MyNewCommand
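For reference, the entry point from step 2 might look like this in setup.py. This is a minimal sketch using the dict form that setuptools accepts for entry_points (an ini-style string with a [datapkg.cli] header also works); mynewpackage and MyNewCommand are the placeholder names from the steps above.

```python
# setup.py (excerpt): register MyNewCommand under the datapkg.cli group.
# Pass this dict to setup(..., entry_points=entry_points).
entry_points = {
    'datapkg.cli': [
        # "<command name> = <module path>:<class name>"
        'mycommand = mynewpackage.command:MyNewCommand',
    ],
}
```

Once the package is installed (for example with pip install -e .), datapkg should be able to discover the command through the datapkg.cli entry point group.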

Command Base Class

class datapkg.cli.Command

Base command class that all datapkg Commands should inherit from.

An inheriting class should provide a run method and can define the following class-level attributes:

  • name - the name of the command as used on the command line and in help
  • summary - a one-line summary of this command (used in printing help)
  • usage - a multi-line detailed description of the command
  • min_args - minimum number of args to the command (not used if set to None)
  • max_args - maximum number of args to the command (not used if set to None)

run(options, args)

This is the method inheriting classes should override to implement their command functionality.

Inheriting classes should not call super on this method; they should just override it.
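A minimal sketch of such a command follows. To stay runnable without datapkg installed, it defines a small stand-in for datapkg.cli.Command carrying the attributes listed above; in a real plugin you would import the real class instead. The greeting behaviour, the attribute values and the no-argument constructor are illustrative assumptions, and the sketch does not show how datapkg parses options or enforces min_args/max_args.

```python
# Stand-in for datapkg.cli.Command so this sketch runs on its own.
# In a real plugin: from datapkg.cli import Command
class Command(object):
    name = None
    summary = None
    usage = None
    min_args = None
    max_args = None

    def run(self, options, args):
        raise NotImplementedError


class MyNewCommand(Command):
    """Hypothetical example command that greets each name it is given."""

    name = 'mycommand'
    summary = 'Print a greeting for each NAME given'
    usage = '''mycommand NAME [NAME ...]

Print a greeting for each NAME (illustrative example only).
'''
    min_args = 1
    max_args = None  # None: no upper limit on the number of args

    def run(self, options, args):
        # Override run() directly; per the docs, do not call super().
        for name in args:
            print('Hello, %s' % name)
```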

Indexes

To provide a new Index for datapkg to use (e.g. in datapkg search and datapkg download commands) you must:

  1. Create a new Index class inheriting from datapkg.index.IndexBase (see below)
  2. Add an entry point for your Index class in the [datapkg.index] section of your setup.py entry_points.
  • NB: the index will be available in datapkg commands (such as search) via the entry point name. E.g. if the entry point section looks like:

    mynewindex = mypackage.customindex:CustomIndex

    then the index can be used in datapkg commands as follows:

    $ datapkg search mynewindex:// {search-term}

Index Base class

class datapkg.index.base.IndexBase

Base class for Index objects. All Index implementations should implement the API defined here.

Get package with name name.
Check if package with name name is in Index.
Return an iterator over all items in the Index
Register package in the Index.
Return an iterator over search results corresponding to query.
Update package in the Index.
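The sketch below shows the general shape of a simple in-memory Index. The method names get, has, list, register, search and update are assumptions inferred from the descriptions above, so check them against datapkg.index.base.IndexBase before relying on them. Packages are represented by a small namedtuple with name and title fields, which is also an assumption; datapkg has its own Package class.

```python
from collections import namedtuple

# Hypothetical package record; datapkg has its own Package class.
Package = namedtuple('Package', ['name', 'title'])


class IndexBase(object):
    """Stand-in for datapkg.index.base.IndexBase (method names assumed)."""

    def get(self, name):
        raise NotImplementedError

    def has(self, name):
        raise NotImplementedError

    def list(self):
        raise NotImplementedError

    def register(self, package):
        raise NotImplementedError

    def search(self, query):
        raise NotImplementedError

    def update(self, package):
        raise NotImplementedError


class DictIndex(IndexBase):
    """Hypothetical Index that keeps packages in a dict keyed by name."""

    def __init__(self):
        self._packages = {}

    def get(self, name):
        # Get the package with this name (raises KeyError if absent).
        return self._packages[name]

    def has(self, name):
        # Check whether a package with this name is in the Index.
        return name in self._packages

    def list(self):
        # Iterator over all packages in the Index.
        return iter(self._packages.values())

    def register(self, package):
        # Add a new package; refuse to overwrite an existing one.
        if package.name in self._packages:
            raise ValueError('package %r already registered' % package.name)
        self._packages[package.name] = package

    def search(self, query):
        # Iterator over packages whose name or title contains the query
        # (case-insensitive substring match).
        q = query.lower()
        return (pkg for pkg in self._packages.values()
                if q in pkg.name.lower() or q in pkg.title.lower())

    def update(self, package):
        # Replace an already registered package.
        if package.name not in self._packages:
            raise KeyError(package.name)
        self._packages[package.name] = package
```

Registered under [datapkg.index] as, hypothetically, dictindex = mypackage.customindex:DictIndex, it would be addressed as dictindex:// in commands such as datapkg search, following the naming rule described earlier.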

Distributions

To provide a new Distribution (either for reading, writing or both) for datapkg to use you must:

  1. Create a new Distribution class inheriting from datapkg.distribution.DistributionBase (see below)
  2. Add an entry point for your Distribution class in the [datapkg.distribution] section of your setup.py entry_points.

Distribution Base class

class datapkg.distribution.DistributionBase(package=None)
classmethod load(path)

Load a Package object from a path to a package distribution.

Returns: the Distribution object.

Return a fileobj stream for material at path.
write(path, **kwargs)
Write this distribution to disk at path.
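A hypothetical Distribution that stores package metadata as a single JSON file might look like the sketch below. It uses a stand-in base class so it runs without datapkg; the metadata.json file name, the plain dict used in place of a datapkg Package object, and the indent keyword argument are all assumptions made for illustration.

```python
import json
import os


class DistributionBase(object):
    """Stand-in for datapkg.distribution.DistributionBase(package=None)."""

    def __init__(self, package=None):
        self.package = package

    @classmethod
    def load(cls, path):
        raise NotImplementedError

    def write(self, path, **kwargs):
        raise NotImplementedError


class JsonDistribution(DistributionBase):
    """Hypothetical distribution: package metadata kept in one JSON file."""

    filename = 'metadata.json'  # hypothetical file name

    @classmethod
    def load(cls, path):
        # Read the metadata written by write() and wrap it in a new instance.
        with open(os.path.join(path, cls.filename)) as fo:
            metadata = json.load(fo)
        return cls(package=metadata)

    def write(self, path, **kwargs):
        # Create the target directory if needed, then dump the metadata.
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, self.filename), 'w') as fo:
            json.dump(self.package, fo, indent=kwargs.get('indent', 2))
```

As with the other plugin types, the class would then be registered in the [datapkg.distribution] entry point group.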

Resource Downloader

class datapkg.download.ResourceDownloaderBase

Base class for (package) resource downloaders which handle the downloading or accessing of (package) resources (i.e. files containing package data, APIs to package data etc).

To create a new resource downloader and have it used by datapkg:

  1. Create a new class inheriting from datapkg.download.ResourceDownloaderBase.
  2. Add an entry point in the [datapkg.resource_downloader] entry_points section of your setup.py pointing to this class.

Many downloaders can be installed to handle different types of resources. Installed downloaders are called in turn, and the first one that matches is used. The calling order is the order in which pkg_resources.iter_entry_points returns entry points for the datapkg.resource_downloader group.
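The first-match rule can be pictured with the sketch below. In datapkg the ordered list of downloaders comes from pkg_resources.iter_entry_points('datapkg.resource_downloader'); here it is passed in explicitly. The download_resource function and the two toy downloaders are hypothetical, the resource is assumed to be a dict with a 'url' key, and the toy classes only claim resources rather than fetching anything.

```python
class ResourceDownloaderBase(object):
    """Stand-in for datapkg.download.ResourceDownloaderBase."""

    def download(self, resource, dest_path):
        raise NotImplementedError


def download_resource(downloaders, resource, dest_path):
    """Call each downloader in turn; the first to return True wins.

    Returns the downloader that handled the resource, or None if none did.
    """
    for downloader in downloaders:
        if downloader.download(resource, dest_path):
            return downloader
    return None


class FileOnlyDownloader(ResourceDownloaderBase):
    """Toy downloader: claims only resources whose URL starts with file://."""

    def download(self, resource, dest_path):
        return resource['url'].startswith('file://')


class CatchAllDownloader(ResourceDownloaderBase):
    """Toy downloader: claims every resource it is given."""

    def download(self, resource, dest_path):
        return True
```

Because the loop stops at the first True, a catch-all downloader placed early in the order would shadow every downloader after it.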

download(resource, dest_path)

Download the supplied resource.

Should be overridden (and not called) by inheriting classes.

This method should return True if and only if the class can handle (and therefore has handled) the supplied resource, and False otherwise (thereby allowing subsequent downloaders to be tried).
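As a minimal sketch of that contract, the hypothetical downloader below handles only file:// URLs: it copies the file to dest_path and returns True, and returns False for anything else so later downloaders get a turn. It assumes that resource is a dict with a 'url' key and that dest_path is the full destination file path; both are assumptions about datapkg's resource format.

```python
import shutil


class ResourceDownloaderBase(object):
    """Stand-in for datapkg.download.ResourceDownloaderBase."""

    def download(self, resource, dest_path):
        raise NotImplementedError


class LocalFileDownloader(ResourceDownloaderBase):
    """Hypothetical downloader that only handles file:// URLs."""

    prefix = 'file://'

    def download(self, resource, dest_path):
        url = resource.get('url', '')  # assumption: resource is a dict
        if not url.startswith(self.prefix):
            # Not a resource we handle: let the next downloader try.
            return False
        # Strip the scheme and copy the local file to the destination.
        shutil.copyfile(url[len(self.prefix):], dest_path)
        return True
```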

Uploaders (via OFS)

datapkg uses the pluggable blobstore library OFS (http://bitbucket.org/okfn/ofs).

To add a new storage backend, just extend OFS: the new backend will then automatically be available to datapkg.
