Python API Documentation

Reference Information from the UK Government

Calendar values and time specifications

class ukgov.reference.time.Year(graph=None, load=False)[source]

Bases: ordf.vocab.owl.Class

This is an OWL Factory class. See the documentation for ordf.vocab.owl.Class for more information on precisely what this means.

This particular factory class has a get() method that fetches year data from the UK government namespace and returns an instance of the concrete individual _Year.

>>> factory = Year(load=True)
>>> year2000 = factory.get("2000")
>>> print year2000.start, year2000.end
2000-01-01 00:00:00 2001-01-01 00:00:00
>>>
Parameters:
  • graph – The graph that the OWL specification of the class is to live in, and the default graph for individuals created with the get() method.
  • load – If True, the get() method will fetch a year specification from the Internet if it doesn’t already exist in the local graph or in the cache.

class ukgov.reference.time._Year(identifier=None, graph=None, factoryGraph=None, **kw)[source]

Bases: ordf.vocab.owl.Individual

An individual representing a year.

Attribute start: an instance of datetime.datetime for the start of the year.
Attribute end: an instance of datetime.datetime for the end of the year.

class ukgov.reference.time.GovYear(graph=None, load=False)[source]

Bases: ukgov.reference.time.Year

The same as Year, but for the fiscal year, which starts on the first of April.

>>> factory = GovYear(load=True)
>>> year2000 = factory.get("2000-2001")
>>> print year2000.start, year2000.end
2000-04-01 00:00:00 2001-04-01 00:00:00
>>>
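
The intervals returned in the two examples above can be illustrated with a small standalone sketch. It does not reproduce how Year or GovYear obtain their data (they fetch it from the UK government namespace); it only computes the same start and end values locally. The function names calendar_year_bounds and fiscal_year_bounds are hypothetical and are not part of ukgov.reference.time.

```python
from datetime import datetime


def calendar_year_bounds(label):
    """Return (start, end) for a calendar year label such as "2000"."""
    year = int(label)
    # The end is exclusive: midnight on 1 January of the following year.
    return datetime(year, 1, 1), datetime(year + 1, 1, 1)


def fiscal_year_bounds(label):
    """Return (start, end) for a fiscal year label such as "2000-2001"."""
    first, second = (int(part) for part in label.split("-"))
    if second != first + 1:
        raise ValueError("fiscal year must span consecutive years: %r" % label)
    # The fiscal year runs from 1 April to the following 1 April.
    return datetime(first, 4, 1), datetime(second, 4, 1)


start, end = fiscal_year_bounds("2000-2001")
print("%s %s" % (start, end))
```

Note that in both cases the end value is the first instant of the following year, so the interval is best read as half-open.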

Country and Regional Analysis

CRAReader

class ukgov.treasury.cra.loader.CRAReader(cache_dir, dataset, datafile, year_col_start=10)[source]

A specialised CSV reader for the CRA that takes a datapkg dataset and datafile and yields dictionaries for each row with the values cleaned. Once initialised, this class is an iterable.

At the time of writing, this requires the current version of datapkg from the Mercurial repository. The package metadata is included in this Python module. You can see the available datasets by doing:

% datapkg list egg://ukgov_treasury_cra
cra2009 -- Country and Regional Analysis 2009 - CSV

Example usage of this class:

from pprint import pprint
reader = CRAReader(cache_dir, "cra2009", "cra_2009_db.csv")
for row in reader:
    pprint(row)
    ...

output:

{'body_type': u'CG',
 'cap_or_cur': u'CUR',
 'cofog_parts': [u'04', u'04.1', u'04.1.2'],
 'dept_code': u'Dept032',
 'dept_name': u'Department for Work and Pensions',
 'expenditures': [(u'2003-2004', 0.0),
                  (u'2004-2005', 0.0),
                  (u'2005-2006', 0.0),
                  (u'2006-2007', 0.0),
                  (u'2007-2008', 0.0),
                  (u'2008-2009', 12100000.0),
                  (u'2009-2010', 12100000.0),
                  (u'2010-2011', 12100000.0)],
  'pog_code': u'P37 S121211',
  'pog_name': u'ADMIN COSTS OF MEASURES TO HELP UNEMPL PEOPLE MOVE FROM WELFARE T...',
  'region': u'SCOTLAND'}
Parameters:
  • cache_dir – Cache directory where the downloaded data will live. A datapkg filesystem index is created here.
  • dataset – The name of the dataset to be downloaded, e.g. “cra2009”
  • datafile – The filename of the downloaded data within the dataset, e.g. “cra_2009_db.csv”
  • year_col_start – The column in the datafile where the actual expenditures begin, after the metadata columns.
getdata(source, cached, datafile)[source]

Returns an open file handle pointing at the data file that we want to read. This copies perhaps too much from datapkg’s “install” command implementation and should probably be part of the datapkg API.

Parameters:
  • source – source index and dataset specification, e.g. “egg://ukgov_treasury_cra/cra2009”
  • cached – destination to cache the data, also index and dataset specification, e.g. “file:///tmp/ukgov_treasury_cra/2009”
  • datafile – filename within the dataset to open. e.g. “cra_2009_db.csv”
header(header)[source]

Extracts fiscal years from the header columns. Also sets an instance property “year_col_start” that is used by the cleandata() method to find the column where actual expenditure data is kept.
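
As a rough illustration of what header processing of this kind might look like, the sketch below locates the first header cell that looks like a fiscal year and treats everything from there on as year columns. This is not the actual implementation: the function name find_fiscal_years, the "YYYY-YYYY" label pattern and the sample column names are all assumptions made for the example.

```python
import re

# Assumption: fiscal year labels in the header look like "2003-2004".
FISCAL_YEAR = re.compile(r"^\d{4}-\d{4}$")


def find_fiscal_years(header):
    """Return (year_col_start, years) for a list of header cells."""
    for index, cell in enumerate(header):
        if FISCAL_YEAR.match(cell.strip()):
            # Everything from the first matching cell onwards is a year column.
            return index, [c.strip() for c in header[index:]]
    raise ValueError("no fiscal year columns found in header")


# Hypothetical, shortened header; the real file has more metadata columns.
header = ["Dept Code", "Dept Name", "Function", "2003-2004", "2004-2005"]
year_col_start, years = find_fiscal_years(header)
```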

cleandata(row)[source]

Cleans and normalises an input row and turns it into a dictionary. This uses the various get_*() methods to extract individual parts for easy subclassing. This method starts out with an empty collector dictionary, looks for methods with names that begin with get_, and calls them with the row as an argument. The return value of each of these methods is expected to be a dictionary that is used to update the collector.

Returns:A dictionary representing the data in the row.
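
The dispatch pattern described for cleandata() can be sketched in isolation as follows. RowCleaner and RegionCleaner are hypothetical stand-ins rather than the real CRAReader, and the column positions used here are chosen for brevity and do not match the CRA file layout.

```python
class RowCleaner(object):
    """Hypothetical stand-in showing get_* method discovery."""

    def cleandata(self, row):
        collector = {}
        # Sorting makes the call order deterministic; the real class
        # may iterate in a different order.
        for name in sorted(dir(self)):
            if not name.startswith("get_"):
                continue
            method = getattr(self, name)
            if callable(method):
                # Each get_* method returns a dict merged into the collector.
                collector.update(method(row))
        return collector

    def get_department(self, row):
        return {"dept_code": row[0], "dept_name": row[1]}


class RegionCleaner(RowCleaner):
    # Adding a get_* method in a subclass is enough to extend the output.
    def get_region(self, row):
        return {"region": row[2].upper()}


cleaned = RegionCleaner().cleandata(
    ["Dept032", "Department for Work and Pensions", "Scotland"]
)
```

The appeal of this design is that a subclass can add or override a single field extractor without touching cleandata() itself.
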
make_cofog_mapper()[source]

Create a COFOG mapper used in cleaning the data. This is a method to provide for easy subclassing where a different mapper is required.

get_department(row)[source]

Extracts the department code and name from the first two columns of the row.

Returns:A dictionary with keys “dept_code” and “dept_name”
get_cofog_parts(row)[source]

Extracts the COFOG from the function and subfunction fields, columns 2 and 3 in the 2009 data. Passes them through the CofogMapper.

Returns:A dictionary with the key “cofog_parts” and value a list ordered from least to most specific classification.
get_pog(row)[source]

Extracts the Programme Object Group from columns 4 and 5.

Returns:A dictionary with keys “pog_code” and “pog_name”
get_cap_or_cur(row)[source]

Extracts the flag indicating capital or current expenditure from column 7.

Returns:A dictionary with the key “cap_or_cur” and the value either “CAP” or “CUR” as applicable.
get_body_type(row)[source]

Extracts the type of reporting entity from column 8.

Returns:A dictionary with the key “body_type” and the value can be one of:
  • “CG” – Central Government
  • “LA” – Local Authority
  • “PC” – Public Corporation
get_region(row)[source]

Extract the region from column 9 of the row.

Returns:A dictionary with the key “region”.
get_expenditures(row)[source]

Extract the yearly expenditures from the row. This makes use of the year_col_start class attribute to determine at which column to begin.

Returns:A dictionary with the key “expenditures” and value a list of two-tuples representing fiscal year and the reported expenditure value.
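
A minimal sketch of how such a list of two-tuples could be built is given below. The function expenditures_from_row is hypothetical, it takes the fiscal years and starting column as explicit arguments instead of reading them from the instance, and treating blank cells as 0.0 is an assumption, not documented behaviour.

```python
def expenditures_from_row(row, years, year_col_start):
    """Pair each fiscal year label with the value in its column."""
    values = row[year_col_start:year_col_start + len(years)]
    # Assumption: blank cells are treated as zero expenditure.
    amounts = [float(v) if v.strip() else 0.0 for v in values]
    return {"expenditures": list(zip(years, amounts))}


# Hypothetical, shortened row: two metadata cells, then two year columns.
row = ["Dept032", "SCOTLAND", "", "12100000"]
result = expenditures_from_row(row, ["2007-2008", "2008-2009"], 2)
```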

CRA2009

class ukgov.treasury.cra.loader.CRA2009[source]

Bases: ordf.command.Command

This class implements the cra2009 command that loads the CSV file from the treasury and transforms it to RDF, placing the results in a triplestore managed by ordf.handler.rdf. See the documentation in the command-line utilities section of the manual for usage instructions.

edition

This class attribute may be changed by subclasses implementing a command for other editions of the CRA data. The value here is the string “2009”. This is used both to derive the dataset and datafile that will be used as well as to set the RDF namespace for the generated data. If the 2010 data is made available in a file called ‘cra_2010_db.csv’ and has exactly the same form as the 2009 data, then creating a command for manipulating the new edition should be as simple as:

class CRA2010(CRA2009):
    edition = "2010"

and adding the appropriate entry points and ‘datapkg_sources’ to the ‘setup.py’ file.
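
To make the idea of deriving names from the edition concrete, here is a standalone sketch. EditionCommand and Edition2010 are hypothetical stand-ins, not the real CRA2009 or ordf.command.Command, and the naming scheme is inferred from the “cra2009” and “cra_2009_db.csv” examples in this document; the real class may derive its names differently.

```python
class EditionCommand(object):
    """Hypothetical stand-in; shows names derived from a class attribute."""

    edition = "2009"

    @property
    def dataset(self):
        # Assumed pattern, based on the "cra2009" dataset name.
        return "cra%s" % self.edition

    @property
    def datafile(self):
        # Assumed pattern, based on the "cra_2009_db.csv" file name.
        return "cra_%s_db.csv" % self.edition


class Edition2010(EditionCommand):
    # Overriding the attribute is all a new edition needs in this sketch.
    edition = "2010"


command = Edition2010()
names = (command.dataset, command.datafile)
```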

store

An RDFLib compatible store, gleaned from the ordf.handler.Handler that will be used for saving the data.

entries

This property is created by create_sdmx_dataset() and contains the Python object representing the sdmx:DataSet.

components

This property is also created by create_sdmx_dataset(). It is a list of the types for each dimension that should be included when constructing the URI for timeseries.

load_schema()[source]

Loads the CRA schema and supporting information into the RDF datastore.

create_sdmx_dataset()[source]

Defines the structure of the entries dataset, which sits at the most granular level and reflects the data found directly in the CRA CSV file. Calling this method creates the following graphs in the RDF store:

  • an sdmx:DataSet
  • an sdmx:DataStructureDefinition describing the dimensions and attributes of the data in the dataset

This method sets the entries and components properties on this class.

create_sdmx_timeseries(row)[source]

Given a row in the form of a dictionary from CRAReader, create a graph containing an sdmx:TimeSeries in the store.

CofogMapper

class ukgov.treasury.cra.loader.CofogMapper(mappings)[source]

Constructs a COFOG mapper from a mappings object (which is usually loaded from a JSON file).

In the published data, the “function” and “subfunction” columns are used inconsistently. This is partly because some departments continue to use a previous coding system, and partly because only two columns have been allowed for the three levels of the COFOG hierarchy.

This class uses a mapping provided by William Waites to work out the correct COFOG code, given the published data.

Parameters:
  • mappings – a list of triples. In each triple, the first element is the good code, and the second and third elements give the published values. If the first element (the good code) contains a non-numerical suffix, it will be removed.
fix(function, subfunction)[source]

Looks up the fixed COFOG code given the published values.

Returns a list giving all available COFOG levels, e.g. [u'01', u'01.1', u'01.1.1']

Returns an empty list if no COFOG mapping has been defined.

Parameters:
  • function – function as expressed by HMT
  • subfunction – subfunction as expressed by HMT
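
The lookup behaviour described above can be sketched as follows. SimpleCofogMapper is a hypothetical, simplified stand-in for CofogMapper: the suffix-stripping rule, the internal dictionary and the sample published values are assumptions made for illustration only.

```python
import re


class SimpleCofogMapper(object):
    """Hypothetical, simplified stand-in for CofogMapper."""

    def __init__(self, mappings):
        self.lookup = {}
        for good_code, function, subfunction in mappings:
            # Assumption: a trailing run of characters that are neither
            # digits nor dots is the "non-numerical suffix" to remove.
            cleaned = re.sub(r"[^0-9.]+$", "", good_code)
            self.lookup[(function, subfunction)] = cleaned

    def fix(self, function, subfunction):
        code = self.lookup.get((function, subfunction))
        if not code:
            return []
        parts = code.split(".")
        # Build every level, from least to most specific.
        return [".".join(parts[:n]) for n in range(1, len(parts) + 1)]


# Hypothetical published values; the real mapping is loaded from JSON.
mapper = SimpleCofogMapper([["04.1.2x", "4. Economic affairs", "4.1 General"]])
levels = mapper.fix("4. Economic affairs", "4.1 General")
missing = mapper.fix("unknown", "unknown")
```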