Using Datat

A datat is an intuitive data table for Python, inspired by R's data frames. The module can be installed from PyPI, and development takes place on Bitbucket. Detailed module documentation is also available.

Basic use

To create a datat from scratch, give it the names of the columns you want. Each row is a dictionary.

from datat import Datat
peopledatat = Datat(["Name", "Age", "Smoker?"])

fred = {"Name": "Fred B.", "Age": 31, "Smoker?": False}
peopledatat.append(fred)  # Add the row; list-style append is assumed here

anne = {"Name": "Anne M."}  # Any missing fields will be set to None
peopledatat.append(anne)
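The way missing fields are filled in can be pictured in plain Python. The sketch below is not datat's own code: normalise_row is a hypothetical helper, used only to illustrate padding a partial row with None for every column it lacks.

```python
def normalise_row(row, columns):
    """Return a copy of row with every listed column present.

    Hypothetical helper for illustration; not part of the datat module.
    """
    # dict.get returns None for any key the row does not define
    return {col: row.get(col) for col in columns}


columns = ["Name", "Age", "Smoker?"]
anne = normalise_row({"Name": "Anne M."}, columns)
print(anne)
```

Building a new dictionary, rather than modifying the one passed in, is a design choice of this sketch; the source does not say which approach datat takes.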

In the interactive interpreter, you can get a preview of the first ten rows (or all of them, if there are ten or fewer) just by referring to the datat:

>>> peopledatat

You can refer to rows and fields by indexing:

anne = peopledatat[1]   # Anne's dictionary, now with all fields created
fred_age = peopledatat[0]["Age"]

Or loop over the records:

for mortal in peopledatat:
    print(mortal["Name"])  # Each row is a dictionary

for name, age, issmoker in peopledatat.tuples: # Get tuples instead of dictionaries
    print("{0} is {1} years old".format(name, age))

Alternatively, you can work with columns. The datat's .columns property behaves somewhat like a standard Python dict (although it isn't one):

peopledatat.columns["Name"] # --> ['Fred B.', 'Anne M.']
for colname in peopledatat.columns:
    print(colname)
peopledatat.columns["Smoker?"] = [False, True]  # Overwrite an existing column
peopledatat.columns["Eye colour"] = None  # New, blank column
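To see what a column view amounts to, the following sketch works on an ordinary list of row dictionaries. It does not use datat: get_column and set_column are hypothetical helpers, and treating None as "fill with blanks" is an assumption modelled on the example above.

```python
rows = [
    {"Name": "Fred B.", "Age": 31, "Smoker?": False},
    {"Name": "Anne M.", "Age": None, "Smoker?": None},
]


def get_column(rows, colname):
    """Collect one field from every row, in row order."""
    return [row[colname] for row in rows]


def set_column(rows, colname, values):
    """Write one value per row; None means a new, blank column."""
    if values is None:
        values = [None] * len(rows)
    if len(values) != len(rows):
        raise ValueError("Need exactly one value per row")
    for row, value in zip(rows, values):
        row[colname] = value


names = get_column(rows, "Name")
set_column(rows, "Smoker?", [False, True])  # Overwrite an existing column
set_column(rows, "Eye colour", None)        # New, blank column
print(names)
```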

The .filter() method of a datat returns a new datat filtered by the conditions you specify. You can specify exact values with keywords, or put expressions in strings:

twentythreeyearolds = peopledatat.filter(Age=23)
freds = peopledatat.filter("Name.lower().startswith('fred')")
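The two kinds of condition can be sketched in plain Python. This is not how datat implements .filter(): filter_rows is a hypothetical function, and evaluating the string with the row's fields as variable names is an assumption based on the example above.

```python
def filter_rows(rows, *expressions, **exact):
    """Hypothetical stand-in for a filter method; not datat's implementation."""
    selected = []
    for row in rows:
        # Keyword conditions: the field must equal the given value exactly
        if any(row.get(col) != value for col, value in exact.items()):
            continue
        # String conditions: evaluate with the row's fields as variable names
        if all(eval(expr, {"__builtins__": {}}, dict(row))
               for expr in expressions):
            selected.append(row)
    return selected


people = [
    {"Name": "Fred B.", "Age": 31},
    {"Name": "Anne M.", "Age": 23},
    {"Name": "Freddie K.", "Age": 23},
]

twentythreeyearolds = filter_rows(people, Age=23)
freds = filter_rows(people, "Name.lower().startswith('fred')")
young_freds = filter_rows(people, "Name.lower().startswith('fred')", Age=23)
print([p["Name"] for p in young_freds])
```

Because the sketch relies on eval, expression strings should come only from trusted code, never from user input. A field such as "Smoker?" is not a valid Python name, so in this sketch it could only be matched through a keyword dictionary, for example filter_rows(people, **{"Smoker?": True}).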

Named rows

To refer to rows by name, rather than in a sequence, use the DatatNamedRows class. This behaves like a Python dictionary, with the extra properties of a datat.

from datat import DatatNamedRows
metalsdatat = DatatNamedRows(["Melting point", "Alloy?", "Magnetic?"])
metalsdatat["Iron"] = {"Melting point": 1811, "Alloy?": False,
                        "Magnetic?": True}
metalsdatat["Bronze"] = {"Alloy?":True}
for rowname in metalsdatat:
    print(rowname, metalsdatat[rowname]["Alloy?"])

In Python 2.7/3.1 or later, or if you have the ordereddict module installed, rows stay in the order they were added. Otherwise, the row order is not guaranteed to be preserved.
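The ordering guarantee comes from the standard library's OrderedDict, which was added to the collections module in Python 2.7 and 3.1. The sketch below uses it directly, as a plain stand-in for named rows rather than a real DatatNamedRows object.

```python
from collections import OrderedDict

# Plain-Python stand-in for named rows; this is not a DatatNamedRows object
metals = OrderedDict()
metals["Iron"] = {"Melting point": 1811, "Alloy?": False}
metals["Bronze"] = {"Melting point": None, "Alloy?": True}
metals["Aluminium"] = {"Melting point": 933, "Alloy?": False}

# Iteration follows insertion order, not alphabetical or hash order
row_order = list(metals)
print(row_order)
```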

CSV files

CSV (comma-separated values) is a standard format for transferring tables of data between programs, including spreadsheets. It's simple to load a datat from CSV, and to save back to it:

from datat import load_csv
leaftable = load_csv("leaf_measurements_211009.csv")

To load a table with row names, specify the name of the column containing row names. Each row name should be unique.

leaftable = load_csv("leaf_measurements_211009.csv", namescol="Sample code")
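For comparison, here is roughly what loading named rows involves when done with only the standard csv module. This is a sketch under assumptions, not the body of load_csv: read_named_rows is a hypothetical function, and the sample data is made up so the example runs without a file on disk.

```python
import csv
import io
from collections import OrderedDict

# Made-up sample data standing in for a CSV file on disk
csv_text = """Sample code,Length,Width
A1,5.2,2.1
A2,4.8,1.9
"""


def read_named_rows(fileobj, namescol):
    """Hypothetical reader: key each row by the value in namescol."""
    named = OrderedDict()
    for record in csv.DictReader(fileobj):
        rowname = record.pop(namescol)
        if rowname in named:
            # Row names must be unique, as noted above
            raise ValueError("Duplicate row name: {0!r}".format(rowname))
        named[rowname] = dict(record)
    return named


leaves = read_named_rows(io.StringIO(csv_text), namescol="Sample code")
print(leaves["A2"]["Length"])
```

The csv module returns every value as a string. Whether load_csv converts numeric fields is not stated here, so the sketch leaves the values unconverted.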

Converting to R

R is a powerful language for statistics and for plotting data, and its 'data frames' inspired Datat for Python. A datat can be translated into an R data frame using the rpy2 module (also on PyPI), which must be installed for this to work:

leaf_rdataframe = leaftable.translate_to_R()

This attempts to work out the type of each column (integer, float, boolean, text), and will default to text (factor) columns if there's a mixture of types within a column. Python's None will be translated to NA (which represents missing values in R).
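The type-guessing rule described above can be illustrated without rpy2. The sketch below is not the logic inside translate_to_R(): infer_column_type is a hypothetical function, and treating a mix of integers and floats as text is an assumption that follows the description literally.

```python
def infer_column_type(values):
    """Guess one type for a column, ignoring None (missing) entries.

    Hypothetical illustration; not the code used by translate_to_R().
    """
    kinds = set(type(v) for v in values if v is not None)
    # type(True) is bool, so booleans are not mistaken for integers
    if kinds == {bool}:
        return "boolean"
    if kinds == {int}:
        return "integer"
    if kinds == {float}:
        return "float"
    # Mixed types, strings, or an entirely empty column fall back to text
    return "text"


table = {
    "Age": [31, None],
    "Smoker?": [False, True],
    "Length": [5.2, 4.8],
    "Notes": [12, "torn leaf"],
}
column_types = {name: infer_column_type(col) for name, col in table.items()}
print(column_types)
```

Missing values are skipped when guessing the type, so a column containing None still counts as integer, float, or boolean; those entries would then become NA on the R side.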