Use cases for DataPkg (or: reasons to use it in the first place)

These use cases are not necessarily all implemented, but they are a guide to what we are trying to do. The first two were the original use cases at the start of the project (and were heavily inspired by Debian).

1. Grabbing some data from an index

The steps involved:

$ datapkg index-add file:///....
$ datapkg update
$ datapkg search "military spending"

some-id Military Spending 1890-1914
some-id-2 Military Spending 1890-1914 (normalized)

$ datapkg install some-id
...
$ datapkg plot some-id
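
To make the search step concrete, here is a minimal sketch of a keyword search over a local index. It is not datapkg's implementation: the `IndexEntry` record, the `search` function and the third toy entry are all hypothetical, and a real index would be loaded from the location given to `index-add`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One package record in a local index (hypothetical structure)."""
    name: str
    title: str


def search(index, query):
    """Return entries whose name or title contains every word of the query."""
    words = query.lower().split()
    return [
        entry for entry in index
        if all(w in f"{entry.name} {entry.title}".lower() for w in words)
    ]


# Toy index mirroring the search results shown above; "other-id" is made up.
INDEX = [
    IndexEntry("some-id", "Military Spending 1890-1914"),
    IndexEntry("some-id-2", "Military Spending 1890-1914 (normalized)"),
    IndexEntry("other-id", "Exchange Rates"),
]

for entry in search(INDEX, "military spending"):
    print(entry.name, entry.title)
```

Matching is a plain case-insensitive substring test on each query word, which is enough to illustrate the workflow but would need ranking and tokenization in practice.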

2. Get two different datasets and use them together

What data?

  • Normalize data

    • Cross country and then convert to standard (e.g. US$, GBP)
    • Exchange rates
    • Cost of living
    • Changes across time and then do real present value
  • [Plot two different data sources against each other.]

    • [Government expenditure in different sectors?]

Example code:

$ datapkg install pkg-a
$ datapkg install pkg-b
$ datapkg create merged
  # manual merge
  # e.g. PPP, GDP
$ datapkg register my-merged-package
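
The "manual merge" step could look roughly like the following sketch, which converts one dataset to US$ with an exchange-rate table and joins it to a second dataset on (country, year). The function names, the table layouts and every number are assumptions made for illustration; none of them come from datapkg or from real data.

```python
def to_usd(rows, rates):
    """Convert (country, year, amount, currency) rows to US$.

    `rates` maps (currency, year) to the number of US$ per unit of currency.
    """
    converted = {}
    for country, year, amount, currency in rows:
        converted[(country, year)] = amount * rates[(currency, year)]
    return converted


def merge(a, b):
    """Join two {(country, year): value} tables on the keys they share."""
    return {key: (a[key], b[key]) for key in sorted(a.keys() & b.keys())}


# Made-up numbers, for illustration only.
spending = [("UK", 1900, 100.0, "GBP"), ("US", 1900, 300.0, "USD")]
rates = {("GBP", 1900): 5.0, ("USD", 1900): 1.0}
gdp_usd = {("UK", 1900): 10000.0, ("US", 1900): 20000.0, ("FR", 1900): 8000.0}

merged = merge(to_usd(spending, rates), gdp_usd)
for (country, year), (spend, gdp) in merged.items():
    print(country, year, f"{spend / gdp:.1%}")
```

Rows present in only one dataset (here the hypothetical "FR" entry) are dropped by the join, which is one of the decisions a merged package would have to document.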

Getting data v2

Revisit basic discovery and usage of data from above.

  1. Install datapkg

  2. Search remote registry/repo for a package

  3. Download the package onto local disk and unpack it:

    $ datapkg get [url|name] [path]

If a name is specified (using a Registry), then:

  • get metadata from registry
  • locate the distribution URL
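
The two lookups above can be sketched as a single resolution function. This is only an illustration: the in-memory `REGISTRY` dict, the `download_url` key and the example URL are hypothetical stand-ins for whatever a real Registry returns.

```python
from urllib.parse import urlparse

# Hypothetical registry contents: package name -> metadata record.
REGISTRY = {
    "some-id": {
        "title": "Military Spending 1890-1914",
        "download_url": "file:///tmp/some-id.tar.gz",
    },
}


def resolve(spec, registry=REGISTRY):
    """Return the distribution URL for a URL or a registered package name."""
    if urlparse(spec).scheme:      # already a URL such as file:// or http://
        return spec
    try:
        metadata = registry[spec]  # get metadata from the registry
    except KeyError:
        raise LookupError(f"package {spec!r} is not in the registry") from None
    return metadata["download_url"]  # locate the distribution URL
```

Treating anything with a URL scheme as a URL keeps the `[url|name]` argument unambiguous as long as package names never contain a colon.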

Basic steps:

  • Discover what is at the URL: a tar.gz/zip file, a version-controlled repo, or a page with links (ask the user which one)
  • Download the compressed distribution to a temp dir (with a progress bar)
  • Unpack it to the destination path

Future: we may need to build/compile data.
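
The download and unpack steps can be sketched with the Python standard library alone. The function name `get` simply echoes the command above and is not datapkg's actual code; the sketch assumes the URL points directly at a tar.gz or zip archive and leaves out discovery and the progress bar.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse


def get(url, dest):
    """Download the archive at `url` to a temp dir, then unpack it into `dest`."""
    filename = os.path.basename(urlparse(url).path)
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, filename)
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        # unpack_archive picks the format (.zip, .tar.gz, ...) from the file name.
        shutil.unpack_archive(archive, dest)
    return dest
```

Because the temporary directory is removed when the `with` block exits, only the unpacked files remain at the destination. Archives from untrusted sources should be checked before unpacking, since member paths can point outside the destination.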

  4. Explore package

Publisher user role

  1. Package a CSV file.
  2. Register the package to the remote repo.
  3. Upload the package distribution to the remote repo.
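
As an illustration of the first publisher step, the sketch below bundles a CSV file and a small metadata file into a tar.gz distribution. The `metadata.json` file name and its keys are placeholders chosen for this example and should not be read as datapkg's real package format; registering and uploading are not shown.

```python
import json
import shutil
import tarfile
from pathlib import Path


def package_csv(csv_path, name, title, out_dir):
    """Bundle one CSV file and a small metadata file into <name>.tar.gz."""
    csv_path = Path(csv_path)
    out_dir = Path(out_dir)
    pkg_dir = out_dir / name
    pkg_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(csv_path, pkg_dir / csv_path.name)
    # metadata.json and its keys are placeholders, not datapkg's real format.
    metadata = {"name": name, "title": title, "files": [csv_path.name]}
    (pkg_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    archive = out_dir / f"{name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(pkg_dir, arcname=name)
    return archive
```

The returned archive path is what a later upload step would send to the remote repo, while the metadata record is what a register step would submit.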
