PyDelta documentation

This package contains utilities for loading data in the DELTA (Description Language for Taxonomy) format, and from the equivalent binary files used by the IntKey software.

Installing PyDelta

The easiest way to install PyDelta is from PyPI. If your system allows it, run easy_install pydelta or pip install pydelta at a command line. Otherwise, download and unzip the package archive, then run python setup.py install in the unzipped directory. The package on PyPI contains only the code needed to use PyDelta. You can also download PyDelta from SourceForge. That version also includes tests, examples and this documentation.

Loading data

Data can be loaded in the following ways:

DELTA files

This is a set of text files, including a characters file and an items file. A specifications (specs) file and a character notes (cnotes) file are also often present. They are loaded with the pydelta module:

import pydelta
Example = pydelta.load_delta("/path/to/charsfile", "/path/to/itemsfile",
                             "/path/to/specsfile", "/path/to/cnotesfile")

IntKey binary files

These binary files consist of a characters file and an items file. They are often distributed together as a zip file, referenced by a file with the .ink extension; many of the datasets listed on the DELTA website are distributed this way. They can be loaded using the intkey module:

from pydelta import intkey
grassgen = intkey.webload("http://delta-intkey.com/grass/webstart.ink")

If you want more control over the process, download the zip file yourself (its location is given in the .ink file), unzip it, and load the dataset like this:

grassgen = intkey.load("/path/to/ichars", "/path/to/iitems")
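If you take this route, the sketch below shows one way to unpack a downloaded zip file with Python's standard zipfile module and locate the two binary files. It assumes they are named ichars and iitems, as in the paths above; the helper name find_intkey_files is hypothetical, and the location of the zip file still has to be read from the .ink file.

```python
import os
import zipfile


def find_intkey_files(zip_path, dest_dir, names=("ichars", "iitems")):
    """Extract zip_path into dest_dir and return the paths of the IntKey
    characters and items files, searching subdirectories too.

    Hypothetical helper; assumes the files are named ichars and iitems."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    found = {}
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            # Case-insensitive match, keeping the first hit for each name.
            key = name.lower()
            if key in names and key not in found:
                found[key] = os.path.join(root, name)
    missing = [n for n in names if n not in found]
    if missing:
        raise IOError("Not found in archive: " + ", ".join(missing))
    return tuple(found[n] for n in names)

# Usage:
# ichars, iitems = find_intkey_files("/path/to/dataset.zip", "/path/to/unzipped")
# grassgen = intkey.load(ichars, iitems)
```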

Using the information

Loading data from either format returns a Delta object. This consists primarily of a list of characters (.chars) and a list of items (.items). Following the DELTA standard, both use 1-based indices, so index 1 refers to the first character or item, and referring to item or character 0 returns None. Negative indices, however, work as normal in Python, counting from the end of the list.
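To make these indexing rules concrete, here is a minimal stand-in class that behaves as described: index 0 gives None, positive indices count from 1, and negative indices count from the end as usual. It is a hypothetical illustration, not PyDelta's own implementation, and it handles integer indices only.

```python
class OneBasedList(object):
    """Read-only sequence with DELTA-style 1-based indexing (illustration only)."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        # Iteration yields the real entries only, starting with number 1.
        return iter(self._values)

    def __getitem__(self, index):
        if index == 0:
            return None  # there is no item or character 0
        if index > 0:
            return self._values[index - 1]  # 1 -> first entry
        return self._values[index]  # -1 -> last entry, as for a normal list


chars = OneBasedList(["leaf length", "leaf colour", "culm habit"])
# chars[1] is "leaf length", chars[0] is None, chars[-1] is "culm habit"
```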

A character has the following properties: .feature, .type, .states (a 1-based list), .unit and .implicit (a default value for the character). The type may be integer (IN), real number (RN), unordered or ordered multistate (UM or OM), or text (TE), and can be checked against pydelta.CT as shown below. The unit applies only to (some) numeric characters, and states apply only to multistate characters.

# Print the feature and unit of every real-number character
for char in Example.chars:
    if char.type == pydelta.CT['RN']:
        print char.feature, char.unit
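pydelta.CT is indexed by the two-letter type codes. Assuming it behaves like an ordinary dictionary (the values it maps the codes to are not documented here), it can be inverted to report each character's type code. The sketch uses a stand-in CT and character record so that it runs on its own; the integer values, the Char namedtuple and the count_by_type helper are all hypothetical.

```python
from collections import namedtuple

# Stand-ins for illustration: the real pydelta.CT values may differ.
CT = {"IN": 1, "RN": 2, "UM": 3, "OM": 4, "TE": 5}
Char = namedtuple("Char", "feature type unit")


def count_by_type(chars, ct=CT):
    """Count characters per two-letter type code by inverting the CT mapping."""
    code_for = dict((value, code) for code, value in ct.items())
    counts = {}
    for char in chars:
        code = code_for.get(char.type, "??")  # "??" marks an unknown type
        counts[code] = counts.get(code, 0) + 1
    return counts


chars = [Char("culm height", CT["RN"], "cm"),
         Char("leaf colour", CT["UM"], None),
         Char("spikelet length", CT["RN"], "mm")]
# count_by_type(chars) == {"RN": 2, "UM": 1}
# With real data (assuming CT is a dict): count_by_type(Example.chars, pydelta.CT)
```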

An item has the following properties: .name and .attributes. The attributes (the item's character states) are held in a Python dictionary keyed by character number, so accessing an attribute that is not defined raises a KeyError. The method .get_val_or_implicit(charnum) returns the item's state for that character or, if none is recorded, the implicit value defined for the character. If neither exists, it returns None.

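# Number of items in the dataset
print len(Example.items)

# For each item with character 22 recorded, print its number, name, that
# state, and its value (or implicit value) for character 39
for i, item in enumerate(Example.items, start=1):
    if 22 in item.attributes:
        print i, item.name, item.attributes[22], \
            item.get_val_or_implicit(39)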
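The lookup order described for get_val_or_implicit can be sketched with plain dictionaries: use the item's own attribute if it is recorded, otherwise fall back to the character's implicit value, otherwise return None. The value_or_implicit function and the implicit_values mapping below are hypothetical stand-ins for illustration; with PyDelta itself, call item.get_val_or_implicit(charnum).

```python
def value_or_implicit(attributes, implicit_values, charnum):
    """Return the item's recorded state for charnum, else the character's
    implicit value, else None (mirrors the documented behaviour)."""
    if charnum in attributes:
        return attributes[charnum]
    return implicit_values.get(charnum)  # None if there is no implicit value either


attributes = {22: "2", 39: "1"}  # states recorded for one item, keyed by character number
implicit_values = {40: "1"}      # character 40 has an implicit (default) state

# attributes[41] would raise KeyError; attributes.get(41) returns None instead.
```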

Storing for re-use

Storing a loaded dataset with Python's pickle mechanism is likely to be faster than re-loading it from the DELTA files each time, especially when intkey.webload has to download a zip file. Since PyDelta cannot yet save to either the DELTA or the IntKey format, pickling also lets you keep any changes you make to the data. To store:

from pydelta import intkey
import cPickle

grassgen = intkey.webload("http://delta-intkey.com/grass/webstart.ink")
# Use a with block so the file is flushed and closed after writing
with open("grassgen.pkl", "wb") as f:
    cPickle.dump(grassgen, f)

To load:

import pydelta  # pydelta must be installed to rebuild the pickled objects
import cPickle

with open("grassgen.pkl", "rb") as f:
    grassgen = cPickle.load(f)
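The two steps can be combined so that the download happens only on the first run. This is a minimal sketch under assumptions: load_cached is a hypothetical helper, loader is any zero-argument function that returns a dataset (for example a lambda wrapping intkey.webload), and a stale cache file has to be deleted by hand.

```python
import os

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def load_cached(cache_path, loader):
    """Return the object pickled at cache_path, or call loader(), pickle its
    result to cache_path and return it. Hypothetical helper."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = loader()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    return data

# Usage:
# grassgen = load_cached("grassgen.pkl", lambda: intkey.webload(
#     "http://delta-intkey.com/grass/webstart.ink"))
```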