petl is a tentative Python package for extracting, transforming and loading tables of data.
For examples of petl in use, see the case studies below:
The tables below gives an overview of the main functions in the petl module.
See also the alphabetic Index of all functions in the package.
Functions for extracting tables from files and databases:
Extract | ||
---|---|---|
fromcsv() | fromtsv() | frompickle() |
fromsqlite3() | fromdb() | fromtext() |
fromxml() | fromjson() | fromdicts() |
Functions for transforming tables:
Functions for loading (writing) data to files and databases:
Load | ||
---|---|---|
tocsv() | appendcsv() | totsv() |
appendtsv() | topickle() | appendpickle() |
tosqlite3() | appendsqlite3() | todb() |
appenddb() | totext() | appendtext() |
tojson() |
Other utility functions:
This module is available from the Python Package Index. On Linux distributions you should be able to do easy_install petl or pip install petl. On Windows or Mac you can download manually, extract and run python setup.py install.
This package has been written with no dependencies other than the Python core modules, for ease of installation and maintenance. However, there are many third party packages which could usefuly be used with petl, e.g., providing access to data from Excel or other file types. Some extensions with these additional dependencies are provided by the petlx package, a companion package to petl.
This package defines the following convention for objects acting as containers of tabular data and supporting row-oriented iteration over the data.
A row container (also referred to here informally as a table) is any object which satisfies the following:
A row iterator is an iterator which satisfies the following:
So, for example, the list of lists shown below is a row container:
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
Note that, under this convention, an object returned by the csv.reader() function from the standard Python csv module is a row iterator and not a row container, because it can only be iterated over once, e.g.:
>>> from StringIO import StringIO
>>> import csv
>>> csvdata = """foo,bar
... a,1
... b,2
... """
>>> rowiterator = csv.reader(StringIO(csvdata))
>>> for row in rowiterator:
... print row
...
['foo', 'bar']
['a', '1']
['b', '2']
>>> for row in rowiterator:
... print row
...
>>> # can only iterate once
However, it is straightforward to define functions that support the above convention for row containers and provide access to data from CSV or other types of file or data source, see e.g. the fromcsv() function in this package.
The main reason for requiring that row containers support independent row iterators (point 3) is that data from a table may need to be iterated over several times within the same program or interactive session. E.g., when using petl in an interactive session to build up a sequence of data transformation steps, the user might want to examine outputs from several intermediate steps, before all of the steps are defined and the transformation is executed in full.
Note that this convention does not place any restrictions on the lengths of header and data rows. A table may return a header row and/or data rows of varying lengths.
Note that this package makes extensive use of lazy evaluation and iterators. This means, generally, that a transformation will not actually be executed until data is requested.
E.g., given the following data in a file at ‘example1.csv’ in the current working directory:
foo,bar,baz
a,1,3.4
b,2,7.4
c,6,2.2
d,9,8.1
...the following code does not actually read the file, nor does it load any of its contents into memory:
>>> from petl import *
>>> table1 = fromcsv('example1.csv')
Rather, table1 is a row container object, which can be iterated over.
Similarly, if one or more transformation functions are applied, e.g.::
>>> table2 = convert(table1, 'foo', 'upper')
>>> table3 = convert(table2, 'bar', int)
>>> table4 = convert(table3, 'baz', float)
>>> table5 = addfield(table4, 'quux', expr('{bar} * {baz}'))
...no actual transformation work will be done, until data are requested from table5 or any of the other row containers returned by the intermediate steps.
So in effect, a 5 step transformation pipeline has been set up, and rows will pass through the pipeline on demand, as they are pulled from the end of the pipeline via iteration.
A call to a function like look(), or any of the functions which write data to a file or database (e.g., tocsv(), totext(), tosqlite3(), todb()), will pull data through the pipeline and cause all of the transformation steps to be executed on the requested rows, e.g.:
>>> look(table5)
+-------+-------+-------+--------------------+
| 'foo' | 'bar' | 'baz' | 'quux' |
+=======+=======+=======+====================+
| 'A' | 1 | 3.4 | 3.4 |
+-------+-------+-------+--------------------+
| 'B' | 2 | 7.4 | 14.8 |
+-------+-------+-------+--------------------+
| 'C' | 6 | 2.2 | 13.200000000000001 |
+-------+-------+-------+--------------------+
| 'D' | 9 | 8.1 | 72.89999999999999 |
+-------+-------+-------+--------------------+
...although note that look() will by default only request the first 10 rows, and so at most only 10 rows will be processed. Calling look() to inspect the first few rows of a table is often an efficient way to examine the output of a transformation pipeline, without having to execute the transformation over all of the input data.
This package tries to make efficient use of memory by using iterators and lazy evaluation where possible. However, some transformations cannot be done without building data structures, either in memory or on disk.
An example is the sort() function, which will either sort a table entirely in memory, or will sort the table in memory in chunks, writing chunks to disk and performing a final merge sort on the chunks. Which strategy is used will depend on the arguments passed into the sort() function when it is called.
In either case, the sorting can take some time, and if the sorted data will be used more than once, it is obviously undesirable to throw away the sorted data and start again from scratch each time. It is better to cache (a.k.a., memoize) the sorted data, if possible, so it can be re-used. However, if a table upstream of the sort is mutable, there needs to be some way of discovering whether data have been changed since the last sort was performed, and hence whether the cached data are still fresh or not.
There are also cases where, even though data are generated in a purely iterative fashion, some programs may still want to cache some or all of the data. E.g., where data are calculated dynamically and are relatively expensive to compute, or require scanning many rows, or where data are being retrieved via a network and there is latency. Again, there needs to be some way of finding out whether cached data are fresh or not.
To support caching, it is recommended (but not required) that row container objects also implement the cachetag method. This method should return an integer which changes whenever the table’s fields or data changes (i.e., a row iterator would yield a different sequence of items). All petl row container objects implement this method, where applicable.
Note in particular that the functions fromcsv(), frompickle() and fromsqlite3() return tables that implement the cachetag method, and that the implementation of cachetag in these cases depends on a checksum function operating on the underlying data file. By default, statsum() is used as the checksum function, which is cheap to compute but crude because it relies only on file name, size and time of modification, and on some systems this will not reveal changes within the same second that preserve file size. If you need a finer level of granularity, use either adler32sum() or crc32sum() instead. These can be passed in as arguments to fromcsv(), frompickle() and fromsqlite3(), or can be set globally, e.g.:
>>> import petl.io
>>> petl.io.defaultsumfun = petl.io.adler32sum
Care must be taken to ensure correct implementation of the cachetag method where a table is generating data dynamically from another table or tables. In these cases, the state of the upstream tables must be considered when generating a cachetag value. One strategy is to construct cachetag values by hashing the internal configuration attributes of a table, along with the cachetag values from any tables immediately upstream.
In some cases, it may be difficult to determine whether data have changed, e.g., where data are being retrieved from a database. In these cases it is suggested that table objects use a user-configurable time to live (TTL) to generate cachetag values. E.g., where a database table is updated once a day at most, a TTL of a few hours would enable data to be cached during an interactive session, which might improve usability for someone exploring the data or developing a transformation script.
Wrapper for the standard csv.reader() function. Returns a table providing access to the data in the given delimited file. E.g.:
>>> import csv
>>> # set up a CSV file to demonstrate with
... with open('test.csv', 'wb') as f:
... writer = csv.writer(f)
... writer.writerow(['foo', 'bar'])
... writer.writerow(['a', 1])
... writer.writerow(['b', 2])
... writer.writerow(['c', 2])
...
>>> # now demonstrate the use of petl.fromcsv
... from petl import fromcsv, look
>>> testcsv = fromcsv('test.csv')
>>> look(testcsv)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '2' |
+-------+-------+
The filename argument is the path of the delimited file, all other keyword arguments are passed to csv.reader(). So, e.g., to override the delimiter from the default CSV dialect, provide the delimiter keyword argument.
Note that all data values are strings, and any intended numeric values will need to be converted, see also convert().
The returned table object implements the cachetag() method. If the checksumfun argument is not given, the default checksum function (whatever petl.io.defaultsumfun is currently set to) will be used to calculate cachetag values.
Convenience function, as fromcsv() but with different default dialect (tab delimited).
New in version 0.9.
Returns a table providing access to the data pickled in the given file. The rows in the table should have been pickled to the file one at a time. E.g.:
>>> import pickle
>>> # set up a file to demonstrate with
... with open('test.dat', 'wb') as f:
... pickle.dump(['foo', 'bar'], f)
... pickle.dump(['a', 1], f)
... pickle.dump(['b', 2], f)
... pickle.dump(['c', 2.5], f)
...
>>> # now demonstrate the use of petl.frompickle
... from petl import frompickle, look
>>> testdat = frompickle('test.dat')
>>> look(testdat)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2.5 |
+-------+-------+
The returned table object implements the cachetag() method. If the checksumfun argument is not given, the default checksum function (whatever petl.io.defaultsumfun is currently set to) will be used to calculate cachetag values.
Provides access to data from an sqlite3 database file via a given query. E.g.:
>>> import sqlite3
>>> from petl import look, fromsqlite3
>>> # set up a database to demonstrate with
>>> data = [['a', 1],
... ['b', 2],
... ['c', 2.0]]
>>> connection = sqlite3.connect('test.db')
>>> c = connection.cursor()
>>> c.execute('create table foobar (foo, bar)')
<sqlite3.Cursor object at 0x2240b90>
>>> for row in data:
... c.execute('insert into foobar values (?, ?)', row)
...
<sqlite3.Cursor object at 0x2240b90>
<sqlite3.Cursor object at 0x2240b90>
<sqlite3.Cursor object at 0x2240b90>
>>> connection.commit()
>>> c.close()
>>>
>>> # now demonstrate the petl.fromsqlite3 function
... foobar = fromsqlite3('test.db', 'select * from foobar')
>>> look(foobar)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| u'a' | 1 |
+-------+-------+
| u'b' | 2 |
+-------+-------+
| u'c' | 2.0 |
+-------+-------+
The returned table object implements the cachetag() method. If the checksumfun argument is not given, the default checksum function (whatever petl.io.defaultsumfun is currently set to) will be used to calculate cachetag values.
Changed in version 0.10.2.
Either a database file name or a connection object can be given as the first argument. (Note that cachetag() is only implemented if a file name is given.)
Provides access to data from any DB-API 2.0 connection via a given query. E.g., using sqlite3:
>>> import sqlite3
>>> from petl import look, fromdb
>>> connection = sqlite3.connect('test.db')
>>> table = fromdb(connection, 'select * from foobar')
>>> look(table)
E.g., using psycopg2 (assuming you’ve installed it first):
>>> import psycopg2
>>> from petl import look, fromdb
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> table = fromdb(connection, 'select * from test')
>>> look(table)
E.g., using MySQLdb (assuming you’ve installed it first):
>>> import MySQLdb
>>> from petl import look, fromdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> table = fromdb(connection, 'select * from test')
>>> look(table)
The returned table object does not implement the cachetag() method.
Changed in version 0.10.2.
The first argument may also be a function that creates a cursor. E.g.:
>>> import psycopg2
>>> from petl import look, fromdb
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> mkcursor = lambda: connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
>>> table = fromdb(mkcursor, 'select * from test')
>>> look(table)
N.B., each call to the function should return a new cursor.
Construct a table from lines in the given text file. E.g.:
>>> # example data
... with open('test.txt', 'w') as f:
... f.write('a\t1\n')
... f.write('b\t2\n')
... f.write('c\t3\n')
...
>>> from petl import fromtext, look
>>> table1 = fromtext('test.txt')
>>> look(table1)
+--------------+
| 'lines' |
+==============+
| 'a\t1' |
+--------------+
| 'b\t2' |
+--------------+
| 'c\t3' |
+--------------+
The fromtext() function provides a starting point for custom handling of text files. E.g., using capture():
>>> from petl import capture
>>> table2 = capture(table1, 'lines', '(.*)\\t(.*)$', ['foo', 'bar'])
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '3' |
+-------+-------+
Changed in version 0.4.
The strip() function is called on each line, which by default will remove leading and trailing whitespace, including the end-of-line character - use the strip keyword argument to specify alternative characters to strip.
Access data in an XML file. E.g.:
>>> from petl import fromxml, look
>>> data = """<table>
... <tr>
... <td>foo</td><td>bar</td>
... </tr>
... <tr>
... <td>a</td><td>1</td>
... </tr>
... <tr>
... <td>b</td><td>2</td>
... </tr>
... <tr>
... <td>c</td><td>2</td>
... </tr>
... </table>"""
>>> with open('example1.xml', 'w') as f:
... f.write(data)
... f.close()
...
>>> table1 = fromxml('example1.xml', 'tr', 'td')
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '2' |
+-------+-------+
If the data values are stored in an attribute, provide the attribute name as an extra positional argument, e.g.:
>>> data = """<table>
... <tr>
... <td v='foo'/><td v='bar'/>
... </tr>
... <tr>
... <td v='a'/><td v='1'/>
... </tr>
... <tr>
... <td v='b'/><td v='2'/>
... </tr>
... <tr>
... <td v='c'/><td v='2'/>
... </tr>
... </table>"""
>>> with open('example2.xml', 'w') as f:
... f.write(data)
... f.close()
...
>>> table2 = fromxml('example2.xml', 'tr', 'td', 'v')
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '2' |
+-------+-------+
Data values can also be extracted by providing a mapping of field names to element paths, e.g.:
>>> data = """<table>
... <row>
... <foo>a</foo><baz><bar v='1'/><bar v='3'/></baz>
... </row>
... <row>
... <foo>b</foo><baz><bar v='2'/></baz>
... </row>
... <row>
... <foo>c</foo><baz><bar v='2'/></baz>
... </row>
... </table>"""
>>> with open('example3.xml', 'w') as f:
... f.write(data)
... f.close()
...
>>> table3 = fromxml('example3.xml', 'row', {'foo': 'foo', 'bar': ('baz/bar', 'v')})
>>> look(table3)
+-------+------------+
| 'foo' | 'bar' |
+=======+============+
| 'a' | ('1', '3') |
+-------+------------+
| 'b' | '2' |
+-------+------------+
| 'c' | '2' |
+-------+------------+
Note that the implementation is currently not streaming, i.e., the whole document is loaded into memory.
New in version 0.4.
Changed in version 0.6: If multiple elements match a given field, all values are reported as a tuple.
Extract data from a JSON file. The file must contain a JSON array as the top level object, and each member of the array will be treated as a row of data. E.g.:
>>> from petl import fromjson, look
>>> data = '[{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}]'
>>> with open('example1.json', 'w') as f:
... f.write(data)
...
>>> table1 = fromjson('example1.json')
>>> look(table1)
+--------+--------+
| u'foo' | u'bar' |
+========+========+
| u'a' | 1 |
+--------+--------+
| u'b' | 2 |
+--------+--------+
| u'c' | 2 |
+--------+--------+
If your JSON file does not fit this structure, you will need to parse it via json.load() and select the array to treat as the data, see also fromdicts().
New in version 0.5.
View a sequence of Python dict as a table. E.g.:
>>> from petl import fromdicts, look
>>> dicts = [{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}]
>>> table = fromdicts(dicts)
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
See also fromjson().
New in version 0.5.
Replace one or more fields in the table’s header row. E.g.:
>>> from petl import look, rename
>>> look(table1)
+-------+-------+
| 'sex' | 'age' |
+=======+=======+
| 'M' | 12 |
+-------+-------+
| 'F' | 34 |
+-------+-------+
| '-' | 56 |
+-------+-------+
>>> # rename a single field
... table2 = rename(table1, 'sex', 'gender')
>>> look(table2)
+----------+-------+
| 'gender' | 'age' |
+==========+=======+
| 'M' | 12 |
+----------+-------+
| 'F' | 34 |
+----------+-------+
| '-' | 56 |
+----------+-------+
>>> # rename multiple fields by passing a dictionary as the second argument
... table3 = rename(table1, {'sex': 'gender', 'age': 'age_years'})
>>> look(table3)
+----------+-------------+
| 'gender' | 'age_years' |
+==========+=============+
| 'M' | 12 |
+----------+-------------+
| 'F' | 34 |
+----------+-------------+
| '-' | 56 |
+----------+-------------+
>>> # the returned table object can also be used to modify the field mapping using the suffix notation
... table4 = rename(table1)
>>> table4['sex'] = 'gender'
>>> table4['age'] = 'age_years'
>>> look(table4)
+----------+-------------+
| 'gender' | 'age_years' |
+==========+=============+
| 'M' | 12 |
+----------+-------------+
| 'F' | 34 |
+----------+-------------+
| '-' | 56 |
+----------+-------------+
Changed in version 0.4.
Function signature changed to support the simple 2 argument form when renaming a single field.
Override fields in the given table. E.g.:
>>> from petl import setheader, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
>>> table2 = setheader(table1, ['foofoo', 'barbar'])
>>> look(table2)
+----------+----------+
| 'foofoo' | 'barbar' |
+==========+==========+
| 'a' | 1 |
+----------+----------+
| 'b' | 2 |
+----------+----------+
Extend fields in the given table. E.g.:
>>> from petl import extendheader, look
>>> look(table1)
+-------+---+-------+
| 'foo' | | |
+=======+===+=======+
| 'a' | 1 | True |
+-------+---+-------+
| 'b' | 2 | False |
+-------+---+-------+
>>> table2 = extendheader(table1, ['bar', 'baz'])
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 1 | True |
+-------+-------+-------+
| 'b' | 2 | False |
+-------+-------+-------+
Push rows down and prepend a header row. E.g.:
>>> from petl import pushheader, look
>>> look(table1)
+-----+---+
| 'a' | 1 |
+=====+===+
| 'b' | 2 |
+-----+---+
>>> table2 = pushheader(table1, ['foo', 'bar'])
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
Useful, e.g., where data are from a CSV file that has not included a header row.
Skip n rows (including the header row). E.g.:
>>> from petl import skip, look
>>> look(table1)
+--------+-------+-------+
| '#aaa' | 'bbb' | 'ccc' |
+========+=======+=======+
| '#mmm' | | |
+--------+-------+-------+
| 'foo' | 'bar' | |
+--------+-------+-------+
| 'a' | 1 | |
+--------+-------+-------+
| 'b' | 2 | |
+--------+-------+-------+
>>> table2 = skip(table1, 2)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
Skip any row where the first value is a string and starts with prefix. E.g.:
>>> from petl import skipcomments, look
>>> look(table1)
+---------+-------+-------+
| '##aaa' | 'bbb' | 'ccc' |
+=========+=======+=======+
| '##mmm' | | |
+---------+-------+-------+
| '#foo' | 'bar' | |
+---------+-------+-------+
| '##nnn' | 1 | |
+---------+-------+-------+
| 'a' | 1 | |
+---------+-------+-------+
| 'b' | 2 | |
+---------+-------+-------+
>>> table2 = skipcomments(table1, '##')
>>> look(table2)
+--------+-------+
| '#foo' | 'bar' |
+========+=======+
| 'a' | 1 |
+--------+-------+
| 'b' | 2 |
+--------+-------+
New in version 0.4.
Choose a subsequence of data rows. E.g.:
>>> from petl import rowslice, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 5 |
+-------+-------+
| 'd' | 7 |
+-------+-------+
| 'f' | 42 |
+-------+-------+
>>> table2 = rowslice(table1, 2)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
>>> table3 = rowslice(table1, 1, 4)
>>> look(table3)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'b' | 2 |
+-------+-------+
| 'c' | 5 |
+-------+-------+
| 'd' | 7 |
+-------+-------+
>>> table4 = rowslice(table1, 0, 5, 2)
>>> look(table4)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'c' | 5 |
+-------+-------+
| 'f' | 42 |
+-------+-------+
Changed in version 0.3.
Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().
Choose the first n data rows. E.g.:
>>> from petl import head, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 5 |
+-------+-------+
| 'd' | 7 |
+-------+-------+
| 'f' | 42 |
+-------+-------+
| 'f' | 3 |
+-------+-------+
| 'h' | 90 |
+-------+-------+
>>> table2 = head(table1, 4)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 5 |
+-------+-------+
| 'd' | 7 |
+-------+-------+
Syntactic sugar, equivalent to rowslice(table, n).
Choose the last n data rows. E.g.:
>>> from petl import tail, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 5 |
+-------+-------+
| 'd' | 7 |
+-------+-------+
| 'f' | 42 |
+-------+-------+
| 'f' | 3 |
+-------+-------+
| 'h' | 90 |
+-------+-------+
| 'k' | 12 |
+-------+-------+
| 'l' | 77 |
+-------+-------+
| 'q' | 2 |
+-------+-------+
>>> table2 = tail(table1, 4)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'h' | 90 |
+-------+-------+
| 'k' | 12 |
+-------+-------+
| 'l' | 77 |
+-------+-------+
| 'q' | 2 |
+-------+-------+
Choose and/or re-order columns. E.g.:
>>> from petl import look, cut
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2.7 |
+-------+-------+-------+
| 'B' | 2 | 3.4 |
+-------+-------+-------+
| 'B' | 3 | 7.8 |
+-------+-------+-------+
| 'D' | 42 | 9.0 |
+-------+-------+-------+
| 'E' | 12 | |
+-------+-------+-------+
>>> table2 = cut(table1, 'foo', 'baz')
>>> look(table2)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'A' | 2.7 |
+-------+-------+
| 'B' | 3.4 |
+-------+-------+
| 'B' | 7.8 |
+-------+-------+
| 'D' | 9.0 |
+-------+-------+
| 'E' | None |
+-------+-------+
>>> # fields can also be specified by index, starting from zero
... table3 = cut(table1, 0, 2)
>>> look(table3)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'A' | 2.7 |
+-------+-------+
| 'B' | 3.4 |
+-------+-------+
| 'B' | 7.8 |
+-------+-------+
| 'D' | 9.0 |
+-------+-------+
| 'E' | None |
+-------+-------+
>>> # field names and indices can be mixed
... table4 = cut(table1, 'bar', 0)
>>> look(table4)
+-------+-------+
| 'bar' | 'foo' |
+=======+=======+
| 1 | 'A' |
+-------+-------+
| 2 | 'B' |
+-------+-------+
| 3 | 'B' |
+-------+-------+
| 42 | 'D' |
+-------+-------+
| 12 | 'E' |
+-------+-------+
>>> # select a range of fields
... table5 = cut(table1, *range(0, 2))
>>> look(table5)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 1 |
+-------+-------+
| 'B' | 2 |
+-------+-------+
| 'B' | 3 |
+-------+-------+
| 'D' | 42 |
+-------+-------+
| 'E' | 12 |
+-------+-------+
Note that any short rows will be padded with None values (or whatever is provided via the missing keyword argument).
See also cutout().
Remove fields. E.g.:
>>> from petl import cutout, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2.7 |
+-------+-------+-------+
| 'B' | 2 | 3.4 |
+-------+-------+-------+
| 'B' | 3 | 7.8 |
+-------+-------+-------+
| 'D' | 42 | 9.0 |
+-------+-------+-------+
| 'E' | 12 | |
+-------+-------+-------+
>>> table2 = cutout(table1, 'bar')
>>> look(table2)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'A' | 2.7 |
+-------+-------+
| 'B' | 3.4 |
+-------+-------+
| 'B' | 7.8 |
+-------+-------+
| 'D' | 9.0 |
+-------+-------+
| 'E' | None |
+-------+-------+
See also cut().
New in version 0.3.
Select rows meeting a condition. E.g.:
>>> from petl import select, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 4 | 9.3 |
+-------+-------+-------+
| 'a' | 2 | 88.2 |
+-------+-------+-------+
| 'b' | 1 | 23.3 |
+-------+-------+-------+
| 'c' | 8 | 42.0 |
+-------+-------+-------+
| 'd' | 7 | 100.9 |
+-------+-------+-------+
| 'c' | 2 | |
+-------+-------+-------+
>>> # the second positional argument can be a function accepting a row
... table2 = select(table1, lambda rec: rec['foo'] == 'a' and rec['baz'] > 88.1)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 2 | 88.2 |
+-------+-------+-------+
>>> # the second positional argument can also be an expression string, which
... # will be converted to a function using expr()
... table3 = select(table1, "{foo} == 'a' and {baz} > 88.1")
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 2 | 88.2 |
+-------+-------+-------+
>>> # the condition can also be applied to a single field
... table4 = select(table1, 'foo', lambda v: v == 'a')
>>> look(table4)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 4 | 9.3 |
+-------+-------+-------+
| 'a' | 2 | 88.2 |
+-------+-------+-------+
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the function op applied to the given field and the given value returns true.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field equals the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field does not equal the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is less than the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is less than or equal to the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is greater than the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is greater than or equal to the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is greater than or equal to minv and less than or equal to maxv.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is greater than or equal to minv and less than maxv.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is greater than minv and less than or equal to maxv.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is greater than minv and less than maxv.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field contains the given value.
New in version 0.10.
Select rows where the given field is a member of the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is not a member of the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is not the given value.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where the given field is an instance of the given type.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows where a regular expression search using the given pattern on the given field returns a match. E.g.:
>>> from petl import selectre, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'aa' | 4 | 9.3 |
+-------+-------+-------+
| 'aaa' | 2 | 88.2 |
+-------+-------+-------+
| 'b' | 1 | 23.3 |
+-------+-------+-------+
| 'ccc' | 8 | 42.0 |
+-------+-------+-------+
| 'bb' | 7 | 100.9 |
+-------+-------+-------+
| 'c' | 2 | |
+-------+-------+-------+
>>> table2 = selectre(table1, 'foo', '[ab]{2}')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'aa' | 4 | 9.3 |
+-------+-------+-------+
| 'aaa' | 2 | 88.2 |
+-------+-------+-------+
| 'bb' | 7 | 100.9 |
+-------+-------+-------+
See also re.search().
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows matching a condition. The where argument should be a function accepting a hybrid row object (supports accessing values either by position or by field name) as argument and returning True or False.
Deprecated since version 0.10.
Use select() instead, it supports the same signature.
Select rows matching a condition. The where argument should be a function accepting a record (row as dictionary of values indexed by field name) as argument and returning True or False.
Deprecated since version 0.9.
Use select() instead.
Select rows of length n.
Changed in version 0.4.
The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.
Select rows matching a condition. The where argument should be a function accepting a single data value as argument and returning True or False.
Deprecated since version 0.10.
Use select() instead, it supports the same signature.
Return a dictionary mapping field values to tables. E.g.:
>>> from petl import facet, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 4 | 9.3 |
+-------+-------+-------+
| 'a' | 2 | 88.2 |
+-------+-------+-------+
| 'b' | 1 | 23.3 |
+-------+-------+-------+
| 'c' | 8 | 42.0 |
+-------+-------+-------+
| 'd' | 7 | 100.9 |
+-------+-------+-------+
| 'c' | 2 | |
+-------+-------+-------+
>>> foo = facet(table1, 'foo')
>>> foo.keys()
['a', 'c', 'b', 'd']
>>> look(foo['a'])
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 4 | 9.3 |
+-------+-------+-------+
| 'a' | 2 | 88.2 |
+-------+-------+-------+
>>> look(foo['c'])
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'c' | 8 | 42.0 |
+-------+-------+-------+
| 'c' | 2 | |
+-------+-------+-------+
Return a dictionary mapping ranges to tables. E.g.:
>>> from petl import rangefacet, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 3 |
+-------+-------+
| 'a' | 7 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'b' | 1 |
+-------+-------+
| 'b' | 9 |
+-------+-------+
| 'c' | 4 |
+-------+-------+
| 'd' | 3 |
+-------+-------+
>>> rf = rangefacet(table1, 'bar', 2)
>>> rf.keys()
[(1, 3), (3, 5), (5, 7), (7, 9)]
>>> look(rf[(1, 3)])
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'b' | 2 |
+-------+-------+
| 'b' | 1 |
+-------+-------+
>>> look(rf[(7, 9)])
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 7 |
+-------+-------+
| 'b' | 9 |
+-------+-------+
Note that the last bin includes both edges.
Convenience function to replace all occurrences of a with b under the given field. See also convert().
New in version 0.5.
Convenience function to replace all instances of a with b under all fields. See also convertall().
New in version 0.5.
Transform values under the given field via an arbitrary function or method invocation. E.g.:
>>> from petl import convert, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | '2.4' |
+-------+-------+
| 'B' | '5.7' |
+-------+-------+
| 'C' | '1.2' |
+-------+-------+
| 'D' | '8.3' |
+-------+-------+
>>> # using the built-in float function:
... table2 = convert(table1, 'bar', float)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 2.4 |
+-------+-------+
| 'B' | 5.7 |
+-------+-------+
| 'C' | 1.2 |
+-------+-------+
| 'D' | 8.3 |
+-------+-------+
>>> # using a lambda function::
... table3 = convert(table2, 'bar', lambda v: v**2)
>>> look(table3)
+-------+-------------------+
| 'foo' | 'bar' |
+=======+===================+
| 'A' | 5.76 |
+-------+-------------------+
| 'B' | 32.49 |
+-------+-------------------+
| 'C' | 1.44 |
+-------+-------------------+
| 'D' | 68.89000000000001 |
+-------+-------------------+
>>> # a method of the data value can also be invoked by passing the method name
... table4 = convert(table1, 'foo', 'lower')
>>> look(table4)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '2.4' |
+-------+-------+
| 'b' | '5.7' |
+-------+-------+
| 'c' | '1.2' |
+-------+-------+
| 'd' | '8.3' |
+-------+-------+
>>> # arguments to the method invocation can also be given
... table5 = convert(table4, 'foo', 'replace', 'a', 'aa')
>>> look(table5)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'aa' | '2.4' |
+-------+-------+
| 'b' | '5.7' |
+-------+-------+
| 'c' | '1.2' |
+-------+-------+
| 'd' | '8.3' |
+-------+-------+
>>> # values can also be translated via a dictionary
... look(table6)
+----------+-------+
| 'gender' | 'age' |
+==========+=======+
| 'M' | 12 |
+----------+-------+
| 'F' | 34 |
+----------+-------+
| '-' | 56 |
+----------+-------+
>>> table7 = convert(table6, 'gender', {'M': 'male', 'F': 'female'})
>>> look(table7)
+----------+-------+
| 'gender' | 'age' |
+==========+=======+
| 'male' | 12 |
+----------+-------+
| 'female' | 34 |
+----------+-------+
| '-' | 56 |
+----------+-------+
Note that the field argument can be a list or tuple of fields, in which case the conversion will be applied to all of the fields given.
Useful for, among other things, string manipulation, see also the methods on the str type.
Convenience function to convert all fields in the table using a common function or mapping. See also convert().
New in version 0.4.
Transform values in one or more fields via functions or method invocations. E.g.:
>>> from petl import fieldconvert, look
>>> look(table1)
+-------+--------+
| 'foo' | 'bar' |
+=======+========+
| '1' | '2.4' |
+-------+--------+
| '3' | '7.9' |
+-------+--------+
| '7' | '2' |
+-------+--------+
| '8.3' | '42.0' |
+-------+--------+
| '2' | 'abc' |
+-------+--------+
>>> table2 = fieldconvert(table1, {'foo': int, 'bar': float})
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 1 | 2.4 |
+-------+-------+
| 3 | 7.9 |
+-------+-------+
| 7 | 2.0 |
+-------+-------+
| None | 42.0 |
+-------+-------+
| 2 | None |
+-------+-------+
>>> # converters can be added or updated using the suffix notation
... look(table3)
+-------+--------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+========+=======+
| '1' | '2.4' | 14 |
+-------+--------+-------+
| '3' | '7.9' | 47 |
+-------+--------+-------+
| '7' | '2' | 11 |
+-------+--------+-------+
| '8.3' | '42.0' | 33 |
+-------+--------+-------+
| '2' | 'abc' | 'xyz' |
+-------+--------+-------+
>>> table4 = fieldconvert(table3)
>>> table4['foo'] = int
>>> table4['bar'] = float
>>> table4['baz'] = lambda v: v**2
>>> look(table4)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1 | 2.4 | 196 |
+-------+-------+-------+
| 3 | 7.9 | 2209 |
+-------+-------+-------+
| 7 | 2.0 | 121 |
+-------+-------+-------+
| None | 42.0 | 1089 |
+-------+-------+-------+
| 2 | None | None |
+-------+-------+-------+
>>> # converters can be functions, method names, or method names with arguments
... look(table5)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| '2' | 'A' | 'x' |
+-------+-------+-------+
| '5' | 'B' | 'y' |
+-------+-------+-------+
| '1' | 'C' | 'y' |
+-------+-------+-------+
| '8.3' | 'D' | 'z' |
+-------+-------+-------+
>>> table6 = fieldconvert(table5)
>>> table6['foo'] = int
>>> table6['bar'] = 'lower'
>>> table6['baz'] = 'replace', 'y', 'yy'
>>> look(table6)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 2 | 'a' | 'x' |
+-------+-------+-------+
| 5 | 'b' | 'yy' |
+-------+-------+-------+
| 1 | 'c' | 'yy' |
+-------+-------+-------+
| None | 'd' | 'z' |
+-------+-------+-------+
Converters can also be dictionaries, which will be used to translate values under the specified field.
Convenience function to convert all field values to numbers where possible. E.g.:
>>> from petl import convertnumbers, look
>>> look(table1)
+-------+-------+--------+--------+
| 'foo' | 'bar' | 'baz' | 'quux' |
+=======+=======+========+========+
| '1' | '3.0' | '9+3j' | 'aaa' |
+-------+-------+--------+--------+
| '2' | '1.3' | '7+2j' | None |
+-------+-------+--------+--------+
>>> table2 = convertnumbers(table1)
>>> look(table2)
+-------+-------+--------+--------+
| 'foo' | 'bar' | 'baz' | 'quux' |
+=======+=======+========+========+
| 1 | 3.0 | (9+3j) | 'aaa' |
+-------+-------+--------+--------+
| 2 | 1.3 | (7+2j) | None |
+-------+-------+--------+--------+
New in version 0.4.
Perform a regular expression search, returning rows that match a given pattern, either anywhere in the row or within a specific field. E.g.:
>>> from petl import search, look
>>> look(table1)
+------------+-------+--------------------------+
| 'foo' | 'bar' | 'baz' |
+============+=======+==========================+
| 'orange' | 12 | 'oranges are nice fruit' |
+------------+-------+--------------------------+
| 'mango' | 42 | 'I like them' |
+------------+-------+--------------------------+
| 'banana' | 74 | 'lovely too' |
+------------+-------+--------------------------+
| 'cucumber' | 41 | 'better than mango' |
+------------+-------+--------------------------+
>>> # search any field
... table2 = search(table1, '.g.')
>>> look(table2)
+------------+-------+--------------------------+
| 'foo' | 'bar' | 'baz' |
+============+=======+==========================+
| 'orange' | 12 | 'oranges are nice fruit' |
+------------+-------+--------------------------+
| 'mango' | 42 | 'I like them' |
+------------+-------+--------------------------+
| 'cucumber' | 41 | 'better than mango' |
+------------+-------+--------------------------+
>>> # search a specific field
... table3 = search(table1, 'foo', '.g.')
>>> look(table3)
+----------+-------+--------------------------+
| 'foo' | 'bar' | 'baz' |
+==========+=======+==========================+
| 'orange' | 12 | 'oranges are nice fruit' |
+----------+-------+--------------------------+
| 'mango' | 42 | 'I like them' |
+----------+-------+--------------------------+
New in version 0.10.
Convenience function to convert values under the given field using a regular expression substitution. See also re.sub().
New in version 0.5.
Changed in version 0.10.
Renamed ‘resub’ to ‘sub’.
Add one or more new fields with values generated by splitting an existing value around occurrences of a regular expression. E.g.:
>>> from petl import split, look
>>> look(table1)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| '1' | 'parad1' | '12' |
+------+------------+---------+
| '2' | 'parad2' | '15' |
+------+------------+---------+
| '3' | 'tempd1' | '18' |
+------+------------+---------+
| '4' | 'tempd2' | '19' |
+------+------------+---------+
>>> table2 = split(table1, 'variable', 'd', ['variable', 'day'])
>>> look(table2)
+------+---------+------------+-------+
| 'id' | 'value' | 'variable' | 'day' |
+======+=========+============+=======+
| '1' | '12' | 'para' | '1' |
+------+---------+------------+-------+
| '2' | '15' | 'para' | '2' |
+------+---------+------------+-------+
| '3' | '18' | 'temp' | '1' |
+------+---------+------------+-------+
| '4' | '19' | 'temp' | '2' |
+------+---------+------------+-------+
See also re.split().
Add one or more new fields with values captured from an existing field searched via a regular expression.
By default the field on which the capture is performed is omitted. It can be included using the include_original argument.
E.g.:
>>> from petl import capture, look
>>> look(table1)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| '1' | 'A1' | '12' |
+------+------------+---------+
| '2' | 'A2' | '15' |
+------+------------+---------+
| '3' | 'B1' | '18' |
+------+------------+---------+
| '4' | 'C12' | '19' |
+------+------------+---------+
>>> table2 = capture(table1, 'variable', '(\w)(\d+)', ['treat', 'time'])
>>> look(table2)
+------+---------+---------+--------+
| 'id' | 'value' | 'treat' | 'time' |
+======+=========+=========+========+
| '1' | '12' | 'A' | '1' |
+------+---------+---------+--------+
| '2' | '15' | 'A' | '2' |
+------+---------+---------+--------+
| '3' | '18' | 'B' | '1' |
+------+---------+---------+--------+
| '4' | '19' | 'C' | '12' |
+------+---------+---------+--------+
>>> # using the include_original argument
... table3 = capture(table1, 'variable', '(\w)(\d+)', ['treat', 'time'], include_original=True)
>>> look(table3)
+------+------------+---------+---------+--------+
| 'id' | 'variable' | 'value' | 'treat' | 'time' |
+======+============+=========+=========+========+
| '1' | 'A1' | '12' | 'A' | '1' |
+------+------------+---------+---------+--------+
| '2' | 'A2' | '15' | 'A' | '2' |
+------+------------+---------+---------+--------+
| '3' | 'B1' | '18' | 'B' | '1' |
+------+------------+---------+---------+--------+
| '4' | 'C12' | '19' | 'C' | '12' |
+------+------------+---------+---------+--------+
See also re.search().
Unpack data values that are lists or tuples. E.g.:
>>> from petl import unpack, look
>>> look(table1)
+-------+------------+
| 'foo' | 'bar' |
+=======+============+
| 1 | ['a', 'b'] |
+-------+------------+
| 2 | ['c', 'd'] |
+-------+------------+
| 3 | ['e', 'f'] |
+-------+------------+
>>> table2 = unpack(table1, 'bar', ['baz', 'quux'])
>>> look(table2)
+-------+-------+--------+
| 'foo' | 'baz' | 'quux' |
+=======+=======+========+
| 1 | 'a' | 'b' |
+-------+-------+--------+
| 2 | 'c' | 'd' |
+-------+-------+--------+
| 3 | 'e' | 'f' |
+-------+-------+--------+
Unpack dictionary values into separate fields. E.g.:
>>> from petl import unpackdict, look
>>> look(table1)
+-------+---------------------------+
| 'foo' | 'bar' |
+=======+===========================+
| 1 | {'quux': 'b', 'baz': 'a'} |
+-------+---------------------------+
| 2 | {'quux': 'd', 'baz': 'c'} |
+-------+---------------------------+
| 3 | {'quux': 'f', 'baz': 'e'} |
+-------+---------------------------+
>>> table2 = unpackdict(table1, 'bar')
>>> look(table2)
+-------+-------+--------+
| 'foo' | 'baz' | 'quux' |
+=======+=======+========+
| 1 | 'a' | 'b' |
+-------+-------+--------+
| 2 | 'c' | 'd' |
+-------+-------+--------+
| 3 | 'e' | 'f' |
+-------+-------+--------+
New in version 0.10.
Transform a table, mapping fields arbitrarily between input and output. E.g.:
>>> from petl import fieldmap, look
>>> look(table1)
+------+----------+-------+----------+----------+
| 'id' | 'sex' | 'age' | 'height' | 'weight' |
+======+==========+=======+==========+==========+
| 1 | 'male' | 16 | 1.45 | 62.0 |
+------+----------+-------+----------+----------+
| 2 | 'female' | 19 | 1.34 | 55.4 |
+------+----------+-------+----------+----------+
| 3 | 'female' | 17 | 1.78 | 74.4 |
+------+----------+-------+----------+----------+
| 4 | 'male' | 21 | 1.33 | 45.2 |
+------+----------+-------+----------+----------+
| 5 | '-' | 25 | 1.65 | 51.9 |
+------+----------+-------+----------+----------+
>>> from collections import OrderedDict
>>> mappings = OrderedDict()
>>> # rename a field
... mappings['subject_id'] = 'id'
>>> # translate a field
... mappings['gender'] = 'sex', {'male': 'M', 'female': 'F'}
>>> # apply a calculation to a field
... mappings['age_months'] = 'age', lambda v: v * 12
>>> # apply a calculation to a combination of fields
... mappings['bmi'] = lambda rec: rec['weight'] / rec['height']**2
>>> # transform and inspect the output
... table2 = fieldmap(table1, mappings)
>>> look(table2)
+--------------+----------+--------------+--------------------+
| 'subject_id' | 'gender' | 'age_months' | 'bmi' |
+==============+==========+==============+====================+
| 1 | 'M' | 192 | 29.48870392390012 |
+--------------+----------+--------------+--------------------+
| 2 | 'F' | 228 | 30.8531967030519 |
+--------------+----------+--------------+--------------------+
| 3 | 'F' | 204 | 23.481883600555488 |
+--------------+----------+--------------+--------------------+
| 4 | 'M' | 252 | 25.55260331279326 |
+--------------+----------+--------------+--------------------+
| 5 | '-' | 300 | 19.0633608815427 |
+--------------+----------+--------------+--------------------+
>>> # field mappings can also be added and/or updated after the table is created
... # via the suffix notation
... table3 = fieldmap(table1)
>>> table3['subject_id'] = 'id'
>>> table3['gender'] = 'sex', {'male': 'M', 'female': 'F'}
>>> table3['age_months'] = 'age', lambda v: v * 12
>>> # use an expression string this time
... table3['bmi'] = '{weight} / {height}**2'
>>> look(table3)
+--------------+----------+--------------+--------------------+
| 'subject_id' | 'gender' | 'age_months' | 'bmi' |
+==============+==========+==============+====================+
| 1 | 'M' | 192 | 29.48870392390012 |
+--------------+----------+--------------+--------------------+
| 2 | 'F' | 228 | 30.8531967030519 |
+--------------+----------+--------------+--------------------+
| 3 | 'F' | 204 | 23.481883600555488 |
+--------------+----------+--------------+--------------------+
| 4 | 'M' | 252 | 25.55260331279326 |
+--------------+----------+--------------+--------------------+
| 5 | '-' | 300 | 19.0633608815427 |
+--------------+----------+--------------+--------------------+
Note also that the mapping value can be an expression string, which will be converted to a lambda function via expr().
Transform rows via an arbitrary function. E.g.:
>>> from petl import rowmap, look
>>> look(table1)
+------+----------+-------+----------+----------+
| 'id' | 'sex' | 'age' | 'height' | 'weight' |
+======+==========+=======+==========+==========+
| 1 | 'male' | 16 | 1.45 | 62.0 |
+------+----------+-------+----------+----------+
| 2 | 'female' | 19 | 1.34 | 55.4 |
+------+----------+-------+----------+----------+
| 3 | 'female' | 17 | 1.78 | 74.4 |
+------+----------+-------+----------+----------+
| 4 | 'male' | 21 | 1.33 | 45.2 |
+------+----------+-------+----------+----------+
| 5 | '-' | 25 | 1.65 | 51.9 |
+------+----------+-------+----------+----------+
>>> def rowmapper(row):
... transmf = {'male': 'M', 'female': 'F'}
... return [row[0],
... transmf[row[1]] if row[1] in transmf else row[1],
... row[2] * 12,
... row[4] / row[3] ** 2]
...
>>> table2 = rowmap(table1, rowmapper, fields=['subject_id', 'gender', 'age_months', 'bmi'])
>>> look(table2)
+--------------+----------+--------------+--------------------+
| 'subject_id' | 'gender' | 'age_months' | 'bmi' |
+==============+==========+==============+====================+
| 1 | 'M' | 192 | 29.48870392390012 |
+--------------+----------+--------------+--------------------+
| 2 | 'F' | 228 | 30.8531967030519 |
+--------------+----------+--------------+--------------------+
| 3 | 'F' | 204 | 23.481883600555488 |
+--------------+----------+--------------+--------------------+
| 4 | 'M' | 252 | 25.55260331279326 |
+--------------+----------+--------------+--------------------+
| 5 | '-' | 300 | 19.0633608815427 |
+--------------+----------+--------------+--------------------+
The rowmapper function should return a single row (list or tuple).
Changed in version 0.9.
Hybrid row objects supporting data value access by either position or by field name are now passed to the rowmapper function.
Transform records via an arbitrary function.
Deprecated since version 0.9.
Use rowmap() insteand.
Map each input row to any number of output rows via an arbitrary function. E.g.:
>>> from petl import rowmapmany, look
>>> look(table1)
+------+----------+-------+----------+----------+
| 'id' | 'sex' | 'age' | 'height' | 'weight' |
+======+==========+=======+==========+==========+
| 1 | 'male' | 16 | 1.45 | 62.0 |
+------+----------+-------+----------+----------+
| 2 | 'female' | 19 | 1.34 | 55.4 |
+------+----------+-------+----------+----------+
| 3 | '-' | 17 | 1.78 | 74.4 |
+------+----------+-------+----------+----------+
| 4 | 'male' | 21 | 1.33 | |
+------+----------+-------+----------+----------+
>>> def rowgenerator(row):
... transmf = {'male': 'M', 'female': 'F'}
... yield [row[0], 'gender', transmf[row[1]] if row[1] in transmf else row[1]]
... yield [row[0], 'age_months', row[2] * 12]
... yield [row[0], 'bmi', row[4] / row[3] ** 2]
...
>>> table2 = rowmapmany(table1, rowgenerator, fields=['subject_id', 'variable', 'value'])
>>> look(table2)
+--------------+--------------+--------------------+
| 'subject_id' | 'variable' | 'value' |
+==============+==============+====================+
| 1 | 'gender' | 'M' |
+--------------+--------------+--------------------+
| 1 | 'age_months' | 192 |
+--------------+--------------+--------------------+
| 1 | 'bmi' | 29.48870392390012 |
+--------------+--------------+--------------------+
| 2 | 'gender' | 'F' |
+--------------+--------------+--------------------+
| 2 | 'age_months' | 228 |
+--------------+--------------+--------------------+
| 2 | 'bmi' | 30.8531967030519 |
+--------------+--------------+--------------------+
| 3 | 'gender' | '-' |
+--------------+--------------+--------------------+
| 3 | 'age_months' | 204 |
+--------------+--------------+--------------------+
| 3 | 'bmi' | 23.481883600555488 |
+--------------+--------------+--------------------+
| 4 | 'gender' | 'M' |
+--------------+--------------+--------------------+
The rowgenerator function should yield zero or more rows (lists or tuples).
See also the melt() function.
Changed in version 0.9.
Hybrid row objects supporting data value access by either position or by field name are now passed to the rowgenerator function.
Map each input row (as a record) to any number of output rows via an arbitrary function.
Deprecated since version 0.9.
Use rowmapmany() instead.
Concatenate data from two or more tables. E.g.:
>>> from petl import look, cat
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 1 | 'A' |
+-------+-------+
| 2 | 'B' |
+-------+-------+
>>> look(table2)
+-------+-------+
| 'bar' | 'baz' |
+=======+=======+
| 'C' | True |
+-------+-------+
| 'D' | False |
+-------+-------+
>>> table3 = cat(table1, table2)
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1 | 'A' | None |
+-------+-------+-------+
| 2 | 'B' | None |
+-------+-------+-------+
| None | 'C' | True |
+-------+-------+-------+
| None | 'D' | False |
+-------+-------+-------+
>>> # can also be used to square up a single table with uneven rows
... look(table4)
+-------+-------+--------+------+
| 'foo' | 'bar' | 'baz' | |
+=======+=======+========+======+
| 'A' | 1 | 2 | |
+-------+-------+--------+------+
| 'B' | '2' | '3.4' | |
+-------+-------+--------+------+
| u'B' | u'3' | u'7.8' | True |
+-------+-------+--------+------+
| 'D' | 'xyz' | 9.0 | |
+-------+-------+--------+------+
| 'E' | None | | |
+-------+-------+--------+------+
>>> look(cat(table4))
+-------+-------+--------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+========+
| 'A' | 1 | 2 |
+-------+-------+--------+
| 'B' | '2' | '3.4' |
+-------+-------+--------+
| u'B' | u'3' | u'7.8' |
+-------+-------+--------+
| 'D' | 'xyz' | 9.0 |
+-------+-------+--------+
| 'E' | None | None |
+-------+-------+--------+
>>> # use the header keyword argument to specify a fixed set of fields
... look(table5)
+-------+-------+
| 'bar' | 'foo' |
+=======+=======+
| 'A' | 1 |
+-------+-------+
| 'B' | 2 |
+-------+-------+
>>> table6 = cat(table5, header=['A', 'foo', 'B', 'bar', 'C'])
>>> look(table6)
+------+-------+------+-------+------+
| 'A' | 'foo' | 'B' | 'bar' | 'C' |
+======+=======+======+=======+======+
| None | 1 | None | 'A' | None |
+------+-------+------+-------+------+
| None | 2 | None | 'B' | None |
+------+-------+------+-------+------+
>>> # using the header keyword argument with two input tables
... look(table7)
+-------+-------+
| 'bar' | 'foo' |
+=======+=======+
| 'A' | 1 |
+-------+-------+
| 'B' | 2 |
+-------+-------+
>>> look(table8)
+-------+-------+
| 'bar' | 'baz' |
+=======+=======+
| 'C' | True |
+-------+-------+
| 'D' | False |
+-------+-------+
>>> table9 = cat(table7, table8, header=['A', 'foo', 'B', 'bar', 'C'])
>>> look(table9)
+------+-------+------+-------+------+
| 'A' | 'foo' | 'B' | 'bar' | 'C' |
+======+=======+======+=======+======+
| None | 1 | None | 'A' | None |
+------+-------+------+-------+------+
| None | 2 | None | 'B' | None |
+------+-------+------+-------+------+
| None | None | None | 'C' | None |
+------+-------+------+-------+------+
| None | None | None | 'D' | None |
+------+-------+------+-------+------+
Note that the tables do not need to share exactly the same fields, any missing fields will be padded with None or whatever is provided via the missing keyword argument.
Changed in version 0.5.
By default, the fields for the output table will be determined as the union of all fields found in the input tables. Use the header keyword argument to override this behaviour and specify a fixed set of fields for the output table.
Select rows with duplicate values under a given key. E.g.:
>>> from petl import duplicates, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2.0 |
+-------+-------+-------+
| 'B' | 2 | 3.4 |
+-------+-------+-------+
| 'D' | 6 | 9.3 |
+-------+-------+-------+
| 'B' | 3 | 7.8 |
+-------+-------+-------+
| 'B' | 2 | 12.3 |
+-------+-------+-------+
| 'E' | None | 1.3 |
+-------+-------+-------+
| 'D' | 4 | 14.5 |
+-------+-------+-------+
>>> table2 = duplicates(table1, 'foo')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'B' | 2 | 3.4 |
+-------+-------+-------+
| 'B' | 3 | 7.8 |
+-------+-------+-------+
| 'B' | 2 | 12.3 |
+-------+-------+-------+
| 'D' | 6 | 9.3 |
+-------+-------+-------+
| 'D' | 4 | 14.5 |
+-------+-------+-------+
>>> # compound keys are supported
... table3 = duplicates(table1, key=['foo', 'bar'])
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'B' | 2 | 3.4 |
+-------+-------+-------+
| 'B' | 2 | 12.3 |
+-------+-------+-------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Select rows with unique values under a given key. E.g.:
>>> from petl import unique, look
>>> look(table1)
+-------+-------+--------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+========+
| 'A' | 1 | 2 |
+-------+-------+--------+
| 'B' | '2' | '3.4' |
+-------+-------+--------+
| 'D' | 'xyz' | 9.0 |
+-------+-------+--------+
| 'B' | u'3' | u'7.8' |
+-------+-------+--------+
| 'B' | '2' | 42 |
+-------+-------+--------+
| 'E' | None | None |
+-------+-------+--------+
| 'D' | 4 | 12.3 |
+-------+-------+--------+
| 'F' | 7 | 2.3 |
+-------+-------+--------+
>>> table2 = unique(table1, 'foo')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2 |
+-------+-------+-------+
| 'E' | None | None |
+-------+-------+-------+
| 'F' | 7 | 2.3 |
+-------+-------+-------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
New in version 0.10.
Select rows with the same key value but differing in some other field. E.g.:
>>> from petl import conflicts, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2.7 |
+-------+-------+-------+
| 'B' | 2 | None |
+-------+-------+-------+
| 'D' | 3 | 9.4 |
+-------+-------+-------+
| 'B' | None | 7.8 |
+-------+-------+-------+
| 'E' | None | |
+-------+-------+-------+
| 'D' | 3 | 12.3 |
+-------+-------+-------+
| 'A' | 2 | None |
+-------+-------+-------+
>>> table2 = conflicts(table1, 'foo')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2.7 |
+-------+-------+-------+
| 'A' | 2 | None |
+-------+-------+-------+
| 'D' | 3 | 9.4 |
+-------+-------+-------+
| 'D' | 3 | 12.3 |
+-------+-------+-------+
Missing values are not considered conflicts. By default, None is treated as the missing value, this can be changed via the missing keyword argument.
One or more fields can be ignored when determining conflicts by providing the exclude keyword argument. Alternatively, fields to use when determining conflicts can be specified explicitly with the include keyword argument.
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Changed in version 0.8.
Added the include and exclude keyword arguments. The exclude keyword argument replaces the ignore keyword argument in previous versions.
Sort the table. Field names or indices (from zero) can be used to specify the key. E.g.:
>>> from petl import sort, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'C' | 2 |
+-------+-------+
| 'A' | 9 |
+-------+-------+
| 'A' | 6 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
>>> table2 = sort(table1, 'foo')
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 9 |
+-------+-------+
| 'A' | 6 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
>>> # sorting by compound key is supported
... table3 = sort(table1, key=['foo', 'bar'])
>>> look(table3)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 6 |
+-------+-------+
| 'A' | 9 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
>>> # if no key is specified, the default is a lexical sort
... table4 = sort(table1)
>>> look(table4)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 6 |
+-------+-------+
| 'A' | 9 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
The buffersize argument should be an int or None.
If the number of rows in the table is less than buffersize, the table will be sorted in memory. Otherwise, the table is sorted in chunks of no more than buffersize rows, each chunk is written to a temporary file, and then a merge sort is performed on the temporary files.
If buffersize is None, the value of petl.transform.defaultbuffersize will be used. By default this is set to 100000 rows, but can be changed, e.g.:
>>> import petl.transform
>>> petl.transform.defaultbuffersize = 500000
If petl.transform.defaultbuffersize is set to None, this forces all sorting to be done entirely in memory.
Perform an equi-join on the given tables. E.g.:
>>> from petl import join, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 3 | 'purple' |
+------+----------+
>>> look(table2)
+------+-----------+
| 'id' | 'shape' |
+======+===========+
| 1 | 'circle' |
+------+-----------+
| 3 | 'square' |
+------+-----------+
| 4 | 'ellipse' |
+------+-----------+
>>> table3 = join(table1, table2, key='id')
>>> look(table3)
+------+----------+----------+
| 'id' | 'colour' | 'shape' |
+======+==========+==========+
| 1 | 'blue' | 'circle' |
+------+----------+----------+
| 3 | 'purple' | 'square' |
+------+----------+----------+
>>> # if no key is given, a natural join is tried
... table4 = join(table1, table2)
>>> look(table4)
+------+----------+----------+
| 'id' | 'colour' | 'shape' |
+======+==========+==========+
| 1 | 'blue' | 'circle' |
+------+----------+----------+
| 3 | 'purple' | 'square' |
+------+----------+----------+
>>> # note behaviour if the key is not unique in either or both tables
... look(table5)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 1 | 'red' |
+------+----------+
| 2 | 'purple' |
+------+----------+
>>> look(table6)
+------+-----------+
| 'id' | 'shape' |
+======+===========+
| 1 | 'circle' |
+------+-----------+
| 1 | 'square' |
+------+-----------+
| 2 | 'ellipse' |
+------+-----------+
>>> table7 = join(table5, table6, key='id')
>>> look(table7)
+------+----------+-----------+
| 'id' | 'colour' | 'shape' |
+======+==========+===========+
| 1 | 'blue' | 'circle' |
+------+----------+-----------+
| 1 | 'blue' | 'square' |
+------+----------+-----------+
| 1 | 'red' | 'circle' |
+------+----------+-----------+
| 1 | 'red' | 'square' |
+------+----------+-----------+
| 2 | 'purple' | 'ellipse' |
+------+----------+-----------+
>>> # compound keys are supported
... look(table8)
+------+--------+----------+
| 'id' | 'time' | 'height' |
+======+========+==========+
| 1 | 1 | 12.3 |
+------+--------+----------+
| 1 | 2 | 34.5 |
+------+--------+----------+
| 2 | 1 | 56.7 |
+------+--------+----------+
>>> look(table9)
+------+--------+----------+
| 'id' | 'time' | 'weight' |
+======+========+==========+
| 1 | 2 | 4.5 |
+------+--------+----------+
| 2 | 1 | 6.7 |
+------+--------+----------+
| 2 | 2 | 8.9 |
+------+--------+----------+
>>> table10 = join(table8, table9, key=['id', 'time'])
>>> look(table10)
+------+--------+----------+----------+
| 'id' | 'time' | 'height' | 'weight' |
+======+========+==========+==========+
| 1 | 2 | 34.5 | 4.5 |
+------+--------+----------+----------+
| 2 | 1 | 56.7 | 6.7 |
+------+--------+----------+----------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Perform a left outer join on the given tables. E.g.:
>>> from petl import leftjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 3 | 'purple' |
+------+----------+
>>> look(table2)
+------+-----------+
| 'id' | 'shape' |
+======+===========+
| 1 | 'circle' |
+------+-----------+
| 3 | 'square' |
+------+-----------+
| 4 | 'ellipse' |
+------+-----------+
>>> table3 = leftjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+----------+
| 'id' | 'colour' | 'shape' |
+======+==========+==========+
| 1 | 'blue' | 'circle' |
+------+----------+----------+
| 2 | 'red' | None |
+------+----------+----------+
| 3 | 'purple' | 'square' |
+------+----------+----------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Perform a right outer join on the given tables. E.g.:
>>> from petl import rightjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 3 | 'purple' |
+------+----------+
>>> look(table2)
+------+-----------+
| 'id' | 'shape' |
+======+===========+
| 1 | 'circle' |
+------+-----------+
| 3 | 'square' |
+------+-----------+
| 4 | 'ellipse' |
+------+-----------+
>>> table3 = rightjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+-----------+
| 'id' | 'colour' | 'shape' |
+======+==========+===========+
| 1 | 'blue' | 'circle' |
+------+----------+-----------+
| 3 | 'purple' | 'square' |
+------+----------+-----------+
| 4 | None | 'ellipse' |
+------+----------+-----------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Perform a full outer join on the given tables. E.g.:
>>> from petl import outerjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 3 | 'purple' |
+------+----------+
>>> look(table2)
+------+-----------+
| 'id' | 'shape' |
+======+===========+
| 1 | 'circle' |
+------+-----------+
| 3 | 'square' |
+------+-----------+
| 4 | 'ellipse' |
+------+-----------+
>>> table3 = outerjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+-----------+
| 'id' | 'colour' | 'shape' |
+======+==========+===========+
| 1 | 'blue' | 'circle' |
+------+----------+-----------+
| 2 | 'red' | None |
+------+----------+-----------+
| 3 | 'purple' | 'square' |
+------+----------+-----------+
| 4 | None | 'ellipse' |
+------+----------+-----------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Form the cartesian product of the given tables. E.g.:
>>> from petl import crossjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
>>> look(table2)
+------+----------+
| 'id' | 'shape' |
+======+==========+
| 1 | 'circle' |
+------+----------+
| 3 | 'square' |
+------+----------+
>>> table3 = crossjoin(table1, table2)
>>> look(table3)
+------+----------+------+----------+
| 'id' | 'colour' | 'id' | 'shape' |
+======+==========+======+==========+
| 1 | 'blue' | 1 | 'circle' |
+------+----------+------+----------+
| 1 | 'blue' | 3 | 'square' |
+------+----------+------+----------+
| 2 | 'red' | 1 | 'circle' |
+------+----------+------+----------+
| 2 | 'red' | 3 | 'square' |
+------+----------+------+----------+
Return rows from the left table where the key value does not occur in the right table. E.g.:
>>> from petl import antijoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 0 | 'black' |
+------+----------+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 4 | 'yellow' |
+------+----------+
| 5 | 'white' |
+------+----------+
>>> look(table2)
+------+----------+
| 'id' | 'shape' |
+======+==========+
| 1 | 'circle' |
+------+----------+
| 3 | 'square' |
+------+----------+
>>> table3 = antijoin(table1, table2, key='id')
>>> look(table3)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 0 | 'black' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 4 | 'yellow' |
+------+----------+
| 5 | 'white' |
+------+----------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Return rows in a that are not in b. E.g.:
>>> from petl import complement, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
>>> look(b)
+-----+-----+-------+
| 'x' | 'y' | 'z' |
+=====+=====+=======+
| 'B' | 2 | False |
+-----+-----+-------+
| 'A' | 9 | False |
+-----+-----+-------+
| 'B' | 3 | True |
+-----+-----+-------+
| 'C' | 9 | True |
+-----+-----+-------+
>>> aminusb = complement(a, b)
>>> look(aminusb)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
>>> bminusa = complement(b, a)
>>> look(bminusa)
+-----+-----+-------+
| 'x' | 'y' | 'z' |
+=====+=====+=======+
| 'A' | 9 | False |
+-----+-----+-------+
| 'B' | 3 | True |
+-----+-----+-------+
Note that the field names of each table are ignored - rows are simply compared following a lexical sort. See also the recordcomplement() function.
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Find the difference between rows in two tables. Returns a pair of tables. E.g.:
>>> from petl import diff, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
>>> look(b)
+-----+-----+-------+
| 'x' | 'y' | 'z' |
+=====+=====+=======+
| 'B' | 2 | False |
+-----+-----+-------+
| 'A' | 9 | False |
+-----+-----+-------+
| 'B' | 3 | True |
+-----+-----+-------+
| 'C' | 9 | True |
+-----+-----+-------+
>>> added, subtracted = diff(a, b)
>>> # rows in b not in a
... look(added)
+-----+-----+-------+
| 'x' | 'y' | 'z' |
+=====+=====+=======+
| 'A' | 9 | False |
+-----+-----+-------+
| 'B' | 3 | True |
+-----+-----+-------+
>>> # rows in a not in b
... look(subtracted)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
Convenient shorthand for (complement(b, a), complement(a, b)). See also complement().
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Find records in a that are not in b. E.g.:
>>> from petl import recordcomplement, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
>>> look(b)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 2 | 'B' | False |
+-------+-------+-------+
| 9 | 'A' | False |
+-------+-------+-------+
| 3 | 'B' | True |
+-------+-------+-------+
| 9 | 'C' | True |
+-------+-------+-------+
>>> aminusb = recordcomplement(a, b)
>>> look(aminusb)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
>>> bminusa = recordcomplement(b, a)
>>> look(bminusa)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 3 | 'B' | True |
+-------+-------+-------+
| 9 | 'A' | False |
+-------+-------+-------+
Note that both tables must have the same set of fields, but that the order of the fields does not matter. See also the complement() function.
See also the discussion of the buffersize argument under the sort() function.
New in version 0.3.
Find the difference between records in two tables. E.g.:
>>> from petl import recorddiff, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
>>> look(b)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 2 | 'B' | False |
+-------+-------+-------+
| 9 | 'A' | False |
+-------+-------+-------+
| 3 | 'B' | True |
+-------+-------+-------+
| 9 | 'C' | True |
+-------+-------+-------+
>>> added, subtracted = recorddiff(a, b)
>>> look(added)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 3 | 'B' | True |
+-------+-------+-------+
| 9 | 'A' | False |
+-------+-------+-------+
>>> look(subtracted)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
Convenient shorthand for (recordcomplement(b, a), recordcomplement(a, b)). See also recordcomplement().
See also the discussion of the buffersize argument under the sort() function.
New in version 0.3.
Return rows in a that are also in b. E.g.:
>>> from petl import intersection, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
>>> look(table2)
+-----+-----+-------+
| 'x' | 'y' | 'z' |
+=====+=====+=======+
| 'B' | 2 | False |
+-----+-----+-------+
| 'A' | 9 | False |
+-----+-----+-------+
| 'B' | 3 | True |
+-----+-----+-------+
| 'C' | 9 | True |
+-----+-----+-------+
>>> table3 = intersection(table1, table2)
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Group rows under the given key then apply aggregation functions.
E.g., applying an aggregation function to whole rows:
>>> from petl import aggregate, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 3 | True |
+-------+-------+-------+
| 'a' | 7 | False |
+-------+-------+-------+
| 'b' | 2 | True |
+-------+-------+-------+
| 'b' | 2 | False |
+-------+-------+-------+
| 'b' | 9 | False |
+-------+-------+-------+
| 'c' | 4 | True |
+-------+-------+-------+
>>> # aggregate whole rows
... table2 = aggregate(table1, 'foo', len)
>>> look(table2)
+-------+---------+
| 'key' | 'value' |
+=======+=========+
| 'a' | 2 |
+-------+---------+
| 'b' | 3 |
+-------+---------+
| 'c' | 1 |
+-------+---------+
E.g., aggregating single fields:
>>> # aggregate single field
... table3 = aggregate(table1, 'foo', sum, 'bar')
>>> look(table3)
+-------+---------+
| 'key' | 'value' |
+=======+=========+
| 'a' | 10 |
+-------+---------+
| 'b' | 13 |
+-------+---------+
| 'c' | 4 |
+-------+---------+
>>> # alternative signature for single field aggregation using keyword args
... table4 = aggregate(table1, key=('foo', 'bar'), aggregation=list, value=('bar', 'baz'))
>>> look(table4)
+----------+-------------------------+
| 'key' | 'value' |
+==========+=========================+
| ('a', 3) | [(3, True)] |
+----------+-------------------------+
| ('a', 7) | [(7, False)] |
+----------+-------------------------+
| ('b', 2) | [(2, True), (2, False)] |
+----------+-------------------------+
| ('b', 9) | [(9, False)] |
+----------+-------------------------+
| ('c', 4) | [(4, True)] |
+----------+-------------------------+
E.g., multiple aggregations:
>>> # aggregate multiple fields
... from collections import OrderedDict
>>> from petl import strjoin
>>> aggregation = OrderedDict()
>>> aggregation['count'] = len
>>> aggregation['minbar'] = 'bar', min
>>> aggregation['maxbar'] = 'bar', max
>>> aggregation['sumbar'] = 'bar', sum
>>> aggregation['listbar'] = 'bar' # default aggregation function is list
>>> aggregation['bars'] = 'bar', strjoin(', ')
>>> table5 = aggregate(table1, 'foo', aggregation)
>>> look(table5)
+-------+---------+----------+----------+----------+-----------+-----------+
| 'key' | 'count' | 'minbar' | 'maxbar' | 'sumbar' | 'listbar' | 'bars' |
+=======+=========+==========+==========+==========+===========+===========+
| 'a' | 2 | 3 | 7 | 10 | [3, 7] | '3, 7' |
+-------+---------+----------+----------+----------+-----------+-----------+
| 'b' | 3 | 2 | 9 | 13 | [2, 2, 9] | '2, 2, 9' |
+-------+---------+----------+----------+----------+-----------+-----------+
| 'c' | 1 | 4 | 4 | 4 | [4] | '4' |
+-------+---------+----------+----------+----------+-----------+-----------+
>>> # can also use list or tuple to specify multiple field aggregation
... aggregation = [('count', len),
... ('minbar', 'bar', min),
... ('maxbar', 'bar', max),
... ('sumbar', 'bar', sum),
... ('listbar', 'bar'), # default aggregation function is list
... ('bars', 'bar', strjoin(', '))]
>>> table6 = aggregate(table1, 'foo', aggregation)
>>> look(table6)
+-------+---------+----------+----------+----------+-----------+-----------+
| 'key' | 'count' | 'minbar' | 'maxbar' | 'sumbar' | 'listbar' | 'bars' |
+=======+=========+==========+==========+==========+===========+===========+
| 'a' | 2 | 3 | 7 | 10 | [3, 7] | '3, 7' |
+-------+---------+----------+----------+----------+-----------+-----------+
| 'b' | 3 | 2 | 9 | 13 | [2, 2, 9] | '2, 2, 9' |
+-------+---------+----------+----------+----------+-----------+-----------+
| 'c' | 1 | 4 | 4 | 4 | [4] | '4' |
+-------+---------+----------+----------+----------+-----------+-----------+
>>> # can also use suffix notation
... table7 = aggregate(table1, 'foo')
>>> table7['count'] = len
>>> table7['minbar'] = 'bar', min
>>> table7['maxbar'] = 'bar', max
>>> table7['sumbar'] = 'bar', sum
>>> table7['listbar'] = 'bar' # default aggregation function is list
>>> table7['bars'] = 'bar', strjoin(', ')
>>> look(table7)
+-------+---------+----------+----------+----------+-----------+-----------+
| 'key' | 'count' | 'minbar' | 'maxbar' | 'sumbar' | 'listbar' | 'bars' |
+=======+=========+==========+==========+==========+===========+===========+
| 'a' | 2 | 3 | 7 | 10 | [3, 7] | '3, 7' |
+-------+---------+----------+----------+----------+-----------+-----------+
| 'b' | 3 | 2 | 9 | 13 | [2, 2, 9] | '2, 2, 9' |
+-------+---------+----------+----------+----------+-----------+-----------+
| 'c' | 1 | 4 | 4 | 4 | [4] | '4' |
+-------+---------+----------+----------+----------+-----------+-----------+
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Group rows into bins then apply aggregation functions. E.g.:
>>> from petl import rangeaggregate, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 3 |
+-------+-------+
| 'a' | 7 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'b' | 1 |
+-------+-------+
| 'b' | 9 |
+-------+-------+
| 'c' | 4 |
+-------+-------+
| 'd' | 3 |
+-------+-------+
>>> table2 = rangeaggregate(table1, 'bar', width=2)
>>> table2['foocount'] = 'foo', len
>>> table2['foolist'] = 'foo' # default is list
>>> look(table2)
+---------+------------+-----------------+
| 'bar' | 'foocount' | 'foolist' |
+=========+============+=================+
| (1, 3) | 2 | ['b', 'b'] |
+---------+------------+-----------------+
| (3, 5) | 3 | ['a', 'd', 'c'] |
+---------+------------+-----------------+
| (5, 7) | 0 | [] |
+---------+------------+-----------------+
| (7, 9) | 1 | ['a'] |
+---------+------------+-----------------+
| (9, 11) | 1 | ['b'] |
+---------+------------+-----------------+
Group rows into bins then count the number of rows in each bin. E.g.:
>>> from petl import rangecounts, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 3 |
+-------+-------+
| 'a' | 7 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'b' | 1 |
+-------+-------+
| 'b' | 9 |
+-------+-------+
| 'c' | 4 |
+-------+-------+
| 'd' | 3 |
+-------+-------+
>>> table2 = rangecounts(table1, 'bar', width=2)
>>> look(table2)
+---------+---------+
| 'range' | 'count' |
+=========+=========+
| (1, 3) | 2 |
+---------+---------+
| (3, 5) | 3 |
+---------+---------+
| (5, 7) | 0 |
+---------+---------+
| (7, 9) | 1 |
+---------+---------+
| (9, 11) | 1 |
+---------+---------+
Reduce rows grouped under the given key via an arbitrary function.
Deprecated since version 0.10.
Use aggregate() instead.
Reduce records grouped under the given key via an arbitrary function.
Deprecated since version 0.9.
Use aggregate() instead.
Reduce rows grouped into bins under the given key via an arbitrary function.
Deprecated since version 0.10.
Use rangeaggregate() instead.
Reduce records grouped into bins under the given key via an arbitrary function.
Deprecated since version 0.9.
Use rangeaggregate() instead.
Merge duplicate rows under the given key. E.g.:
>>> from petl import mergeduplicates, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | 2.7 |
+-------+-------+-------+
| 'B' | 2 | None |
+-------+-------+-------+
| 'D' | 3 | 9.4 |
+-------+-------+-------+
| 'B' | None | 7.8 |
+-------+-------+-------+
| 'E' | None | 42.0 |
+-------+-------+-------+
| 'D' | 3 | 12.3 |
+-------+-------+-------+
| 'A' | 2 | None |
+-------+-------+-------+
>>> table2 = mergeduplicates(table1, 'foo')
>>> look(table2)
+-------+------------------+-----------------------+
| 'foo' | 'bar' | 'baz' |
+=======+==================+=======================+
| 'A' | Conflict([1, 2]) | 2.7 |
+-------+------------------+-----------------------+
| 'B' | 2 | 7.8 |
+-------+------------------+-----------------------+
| 'D' | 3 | Conflict([9.4, 12.3]) |
+-------+------------------+-----------------------+
| 'E' | None | 42.0 |
+-------+------------------+-----------------------+
Missing values are overridden by non-missing values. Conflicting values are reported as an instance of the Conflict class (sub-class of frozenset).
If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize argument is ignored. Otherwise, the data are sorted, see also the discussion of the buffersize argument under the sort() function.
Changed in version 0.3.
Previously conflicts were reported as a list, this is changed to a tuple in version 0.3.
Changed in version 0.10.
Renamed from ‘mergereduce’ to ‘mergeduplicates’. Conflicts now reported as instance of Conflict.
Combine multiple input tables into one sorted output table. E.g.:
>>> from petl import mergesort, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 9 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'A' | 6 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'B' | 3 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'A' | 10 |
+-------+-------+
| 'F' | 4 |
+-------+-------+
>>> table3 = mergesort(table1, table2, key='foo')
>>> look(table3)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 9 |
+-------+-------+
| 'A' | 6 |
+-------+-------+
| 'A' | 10 |
+-------+-------+
| 'B' | 3 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
| 'F' | 4 |
+-------+-------+
If the input tables are already sorted by the given key, give presorted=True as a keyword argument.
This function is equivalent to concatenating the input tables using cat() then sorting, however this function will typically be more efficient, especially if the input tables are presorted.
Keyword arguments:
- key - field name or tuple of fields to sort by (defaults to None - lexical sort)
- reverse - True if sort in reverse (descending) order (defaults to False)
- presorted - True if inputs are already sorted by the given key (defaults to False)
- missing - value to fill with when input tables have different fields (defaults to None)
- header - specify a fixed header for the output table
- buffersize - limit the number of rows in memory per input table when inputs are not presorted
New in version 0.9.
Convenience function to combine multiple tables (via mergesort()) then combine duplicate rows by merging under the given key (via mergeduplicates()). E.g.:
>>> from petl import look, merge
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1 | 'A' | True |
+-------+-------+-------+
| 2 | 'B' | None |
+-------+-------+-------+
| 4 | 'C' | True |
+-------+-------+-------+
>>> look(table2)
+-------+-------+--------+
| 'bar' | 'baz' | 'quux' |
+=======+=======+========+
| 'A' | True | 42.0 |
+-------+-------+--------+
| 'B' | False | 79.3 |
+-------+-------+--------+
| 'C' | False | 12.4 |
+-------+-------+--------+
>>> table3 = merge(table1, table2, key='bar')
>>> look(table3)
+-------+-------+-------------------------+--------+
| 'bar' | 'foo' | 'baz' | 'quux' |
+=======+=======+=========================+========+
| 'A' | 1 | True | 42.0 |
+-------+-------+-------------------------+--------+
| 'B' | 2 | False | 79.3 |
+-------+-------+-------------------------+--------+
| 'C' | 4 | Conflict([False, True]) | 12.4 |
+-------+-------+-------------------------+--------+
Keyword arguments are the same as for mergesort(), except key is required.
Changed in version 0.9.
Now uses mergesort(), should be more efficient for presorted inputs.
Reshape a table, melting fields into data. E.g.:
>>> from petl import melt, look
>>> look(table1)
+------+----------+-------+
| 'id' | 'gender' | 'age' |
+======+==========+=======+
| 1 | 'F' | 12 |
+------+----------+-------+
| 2 | 'M' | 17 |
+------+----------+-------+
| 3 | 'M' | 16 |
+------+----------+-------+
>>> table2 = melt(table1, 'id')
>>> look(table2)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| 1 | 'gender' | 'F' |
+------+------------+---------+
| 1 | 'age' | 12 |
+------+------------+---------+
| 2 | 'gender' | 'M' |
+------+------------+---------+
| 2 | 'age' | 17 |
+------+------------+---------+
| 3 | 'gender' | 'M' |
+------+------------+---------+
| 3 | 'age' | 16 |
+------+------------+---------+
>>> # compound keys are supported
... look(table3)
+------+--------+----------+----------+
| 'id' | 'time' | 'height' | 'weight' |
+======+========+==========+==========+
| 1 | 11 | 66.4 | 12.2 |
+------+--------+----------+----------+
| 2 | 16 | 53.2 | 17.3 |
+------+--------+----------+----------+
| 3 | 12 | 34.5 | 9.4 |
+------+--------+----------+----------+
>>> table4 = melt(table3, key=['id', 'time'])
>>> look(table4)
+------+--------+------------+---------+
| 'id' | 'time' | 'variable' | 'value' |
+======+========+============+=========+
| 1 | 11 | 'height' | 66.4 |
+------+--------+------------+---------+
| 1 | 11 | 'weight' | 12.2 |
+------+--------+------------+---------+
| 2 | 16 | 'height' | 53.2 |
+------+--------+------------+---------+
| 2 | 16 | 'weight' | 17.3 |
+------+--------+------------+---------+
| 3 | 12 | 'height' | 34.5 |
+------+--------+------------+---------+
| 3 | 12 | 'weight' | 9.4 |
+------+--------+------------+---------+
>>> # a subset of variable fields can be selected
... table5 = melt(table3, key=['id', 'time'], variables=['height'])
>>> look(table5)
+------+--------+------------+---------+
| 'id' | 'time' | 'variable' | 'value' |
+======+========+============+=========+
| 1 | 11 | 'height' | 66.4 |
+------+--------+------------+---------+
| 2 | 16 | 'height' | 53.2 |
+------+--------+------------+---------+
| 3 | 12 | 'height' | 34.5 |
+------+--------+------------+---------+
Recast molten data. E.g.:
>>> from petl import recast, look
>>> look(table1)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| 3 | 'age' | 16 |
+------+------------+---------+
| 1 | 'gender' | 'F' |
+------+------------+---------+
| 2 | 'gender' | 'M' |
+------+------------+---------+
| 2 | 'age' | 17 |
+------+------------+---------+
| 1 | 'age' | 12 |
+------+------------+---------+
| 3 | 'gender' | 'M' |
+------+------------+---------+
>>> table2 = recast(table1)
>>> look(table2)
+------+-------+----------+
| 'id' | 'age' | 'gender' |
+======+=======+==========+
| 1 | 12 | 'F' |
+------+-------+----------+
| 2 | 17 | 'M' |
+------+-------+----------+
| 3 | 16 | 'M' |
+------+-------+----------+
>>> # specifying variable and value fields
... look(table3)
+------+----------+--------+
| 'id' | 'vars' | 'vals' |
+======+==========+========+
| 3 | 'age' | 16 |
+------+----------+--------+
| 1 | 'gender' | 'F' |
+------+----------+--------+
| 2 | 'gender' | 'M' |
+------+----------+--------+
| 2 | 'age' | 17 |
+------+----------+--------+
| 1 | 'age' | 12 |
+------+----------+--------+
| 3 | 'gender' | 'M' |
+------+----------+--------+
>>> table4 = recast(table3, variablefield='vars', valuefield='vals')
>>> look(table4)
+------+-------+----------+
| 'id' | 'age' | 'gender' |
+======+=======+==========+
| 1 | 12 | 'F' |
+------+-------+----------+
| 2 | 17 | 'M' |
+------+-------+----------+
| 3 | 16 | 'M' |
+------+-------+----------+
>>> # if there are multiple values for each key/variable pair, and no reducers
... # function is provided, then all values will be listed
... look(table6)
+------+--------+------------+---------+
| 'id' | 'time' | 'variable' | 'value' |
+======+========+============+=========+
| 1 | 11 | 'weight' | 66.4 |
+------+--------+------------+---------+
| 1 | 14 | 'weight' | 55.2 |
+------+--------+------------+---------+
| 2 | 12 | 'weight' | 53.2 |
+------+--------+------------+---------+
| 2 | 16 | 'weight' | 43.3 |
+------+--------+------------+---------+
| 3 | 12 | 'weight' | 34.5 |
+------+--------+------------+---------+
| 3 | 17 | 'weight' | 49.4 |
+------+--------+------------+---------+
>>> table7 = recast(table6, key='id')
>>> look(table7)
+------+--------------+
| 'id' | 'weight' |
+======+==============+
| 1 | [66.4, 55.2] |
+------+--------------+
| 2 | [53.2, 43.3] |
+------+--------------+
| 3 | [34.5, 49.4] |
+------+--------------+
>>> # multiple values can be reduced via an aggregation function
... def mean(values):
... return float(sum(values)) / len(values)
...
>>> table8 = recast(table6, key='id', reducers={'weight': mean})
>>> look(table8)
+------+--------------------+
| 'id' | 'weight' |
+======+====================+
| 1 | 60.800000000000004 |
+------+--------------------+
| 2 | 48.25 |
+------+--------------------+
| 3 | 41.95 |
+------+--------------------+
>>> # missing values are padded with whatever is provided via the missing
... # keyword argument (None by default)
... look(table9)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| 1 | 'gender' | 'F' |
+------+------------+---------+
| 2 | 'age' | 17 |
+------+------------+---------+
| 1 | 'age' | 12 |
+------+------------+---------+
| 3 | 'gender' | 'M' |
+------+------------+---------+
>>> table10 = recast(table9, key='id')
>>> look(table10)
+------+-------+----------+
| 'id' | 'age' | 'gender' |
+======+=======+==========+
| 1 | 12 | 'F' |
+------+-------+----------+
| 2 | 17 | None |
+------+-------+----------+
| 3 | None | 'M' |
+------+-------+----------+
Note that the table is scanned once to discover variables, then a second time to reshape the data and recast variables as fields. How many rows are scanned in the first pass is determined by the samplesize argument.
Transpose rows into columns. E.g.:
>>> from petl import transpose, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1 | 'blue' |
+------+----------+
| 2 | 'red' |
+------+----------+
| 3 | 'purple' |
+------+----------+
| 5 | 'yellow' |
+------+----------+
| 7 | 'orange' |
+------+----------+
>>> table2 = transpose(table1)
>>> look(table2)
+----------+--------+-------+----------+----------+----------+
| 'id' | 1 | 2 | 3 | 5 | 7 |
+==========+========+=======+==========+==========+==========+
| 'colour' | 'blue' | 'red' | 'purple' | 'yellow' | 'orange' |
+----------+--------+-------+----------+----------+----------+
Construct a pivot table. E.g.:
>>> from petl import pivot, look
>>> look(table1)
+----------+----------+---------+---------+
| 'region' | 'gender' | 'style' | 'units' |
+==========+==========+=========+=========+
| 'east' | 'boy' | 'tee' | 12 |
+----------+----------+---------+---------+
| 'east' | 'boy' | 'golf' | 14 |
+----------+----------+---------+---------+
| 'east' | 'boy' | 'fancy' | 7 |
+----------+----------+---------+---------+
| 'east' | 'girl' | 'tee' | 3 |
+----------+----------+---------+---------+
| 'east' | 'girl' | 'golf' | 8 |
+----------+----------+---------+---------+
| 'east' | 'girl' | 'fancy' | 18 |
+----------+----------+---------+---------+
| 'west' | 'boy' | 'tee' | 12 |
+----------+----------+---------+---------+
| 'west' | 'boy' | 'golf' | 15 |
+----------+----------+---------+---------+
| 'west' | 'boy' | 'fancy' | 8 |
+----------+----------+---------+---------+
| 'west' | 'girl' | 'tee' | 6 |
+----------+----------+---------+---------+
>>> table2 = pivot(table1, 'region', 'gender', 'units', sum)
>>> look(table2)
+----------+-------+--------+
| 'region' | 'boy' | 'girl' |
+==========+=======+========+
| 'east' | 33 | 29 |
+----------+-------+--------+
| 'west' | 35 | 23 |
+----------+-------+--------+
>>> table3 = pivot(table1, 'region', 'style', 'units', sum)
>>> look(table3)
+----------+---------+--------+-------+
| 'region' | 'fancy' | 'golf' | 'tee' |
+==========+=========+========+=======+
| 'east' | 25 | 22 | 15 |
+----------+---------+--------+-------+
| 'west' | 9 | 31 | 18 |
+----------+---------+--------+-------+
>>> table4 = pivot(table1, 'gender', 'style', 'units', sum)
>>> look(table4)
+----------+---------+--------+-------+
| 'gender' | 'fancy' | 'golf' | 'tee' |
+==========+=========+========+=======+
| 'boy' | 15 | 29 | 24 |
+----------+---------+--------+-------+
| 'girl' | 19 | 24 | 9 |
+----------+---------+--------+-------+
See also recast().
Alternative implementation of join(), where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table.
May be faster and/or more resource efficient where the right table is small and the left table is large.
New in version 0.5.
Alternative implementation of leftjoin(), where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table.
May be faster and/or more resource efficient where the right table is small and the left table is large.
New in version 0.5.
Alternative implementation of rightjoin(), where the join is executed by constructing an in-memory lookup for the left hand table, then iterating over rows from the right hand table.
May be faster and/or more resource efficient where the left table is small and the right table is large.
New in version 0.5.
Alternative implementation of antijoin(), where the join is executed by constructing an in-memory set for all keys found in the right hand table, then iterating over rows from the left hand table.
May be faster and/or more resource efficient where the right table is small and the left table is large.
New in version 0.5.
Alternative implementation of complement(), where the complement is executed by constructing an in-memory set for all rows found in the right hand table, then iterating over rows from the left hand table.
May be faster and/or more resource efficient where the right table is small and the left table is large.
New in version 0.5.
Alternative implementation of intersection(), where the intersection is executed by constructing an in-memory set for all rows found in the right hand table, then iterating over rows from the left hand table.
May be faster and/or more resource efficient where the right table is small and the left table is large.
New in version 0.5.
Convert a table to a sequence of values in row-major order. E.g.:
>>> from petl import flatten, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'C' | 7 | False |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
| 'C' | 9 | True |
+-------+-------+-------+
>>> list(flatten(table1))
['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9, True]
See also unflatten().
New in version 0.7.
Convert a sequence of values in row-major order into a table. E.g.:
>>> from petl import unflatten, look
>>> input = ['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9]
>>> table = unflatten(input, 3)
>>> look(table)
+------+------+-------+
| 'f0' | 'f1' | 'f2' |
+======+======+=======+
| 'A' | 1 | True |
+------+------+-------+
| 'C' | 7 | False |
+------+------+-------+
| 'B' | 2 | False |
+------+------+-------+
| 'C' | 9 | None |
+------+------+-------+
>>> # a table and field name can also be provided as arguments
... look(table1)
+---------+
| 'lines' |
+=========+
| 'A' |
+---------+
| 1 |
+---------+
| True |
+---------+
| 'C' |
+---------+
| 7 |
+---------+
| False |
+---------+
| 'B' |
+---------+
| 2 |
+---------+
| False |
+---------+
| 'C' |
+---------+
>>> table2 = unflatten(table1, 'lines', 3)
>>> look(table2)
+------+------+-------+
| 'f0' | 'f1' | 'f2' |
+======+======+=======+
| 'A' | 1 | True |
+------+------+-------+
| 'C' | 7 | False |
+------+------+-------+
| 'B' | 2 | False |
+------+------+-------+
| 'C' | 9 | None |
+------+------+-------+
See also flatten().
New in version 0.7.
Join two or more tables by row order. E.g.:
>>> from petl import annex, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 9 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
>>> look(table2)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'B' | 3 |
+-------+-------+
| 'D' | 10 |
+-------+-------+
>>> table3 = annex(table1, table2)
>>> look(table3)
+-------+-------+-------+-------+
| 'foo' | 'bar' | 'foo' | 'baz' |
+=======+=======+=======+=======+
| 'A' | 9 | 'B' | 3 |
+-------+-------+-------+-------+
| 'C' | 2 | 'D' | 10 |
+-------+-------+-------+-------+
| 'F' | 1 | None | None |
+-------+-------+-------+-------+
New in version 0.10.
Reduce rows. E.g.:
>>> from petl import fold, look
>>> look(table1)
+------+---------+
| 'id' | 'count' |
+======+=========+
| 1 | 3 |
+------+---------+
| 1 | 5 |
+------+---------+
| 2 | 4 |
+------+---------+
| 2 | 8 |
+------+---------+
>>> import operator
>>> table2 = fold(table1, 'id', operator.add, 'count', presorted=True)
>>> look(table2)
+-------+---------+
| 'key' | 'value' |
+=======+=========+
| 1 | 8 |
+-------+---------+
| 2 | 12 |
+-------+---------+
See also the Python standard reduce() function.
New in version 0.10.
Add a field of row numbers. E.g.:
>>> from petl import addrownumbers, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 9 |
+-------+-------+
| 'C' | 2 |
+-------+-------+
| 'F' | 1 |
+-------+-------+
>>> table2 = addrownumbers(table1)
>>> look(table2)
+-------+-------+-------+
| 'row' | 'foo' | 'bar' |
+=======+=======+=======+
| 1 | 'A' | 9 |
+-------+-------+-------+
| 2 | 'C' | 2 |
+-------+-------+-------+
| 3 | 'F' | 1 |
+-------+-------+-------+
New in version 0.10.
Add a column of data to the table. E.g.:
>>> from petl import addcolumn, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A' | 1 |
+-------+-------+
| 'B' | 2 |
+-------+-------+
>>> col = [True, False]
>>> table2 = addcolumn(table1, 'baz', col)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A' | 1 | True |
+-------+-------+-------+
| 'B' | 2 | False |
+-------+-------+-------+
New in version 0.10.
Add a field with a fixed or calculated value. E.g.:
>>> from petl import addfield, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'M' | 12 |
+-------+-------+
| 'F' | 34 |
+-------+-------+
| '-' | 56 |
+-------+-------+
>>> # using a fixed value
... table2 = addfield(table1, 'baz', 42)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'M' | 12 | 42 |
+-------+-------+-------+
| 'F' | 34 | 42 |
+-------+-------+-------+
| '-' | 56 | 42 |
+-------+-------+-------+
>>> # calculating the value
... table2 = addfield(table1, 'baz', lambda rec: rec['bar'] * 2)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'M' | 12 | 24 |
+-------+-------+-------+
| 'F' | 34 | 68 |
+-------+-------+-------+
| '-' | 56 | 112 |
+-------+-------+-------+
>>> # an expression string can also be used via expr
... from petl import expr
>>> table3 = addfield(table1, 'baz', expr('{bar} * 2'))
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'M' | 12 | 24 |
+-------+-------+-------+
| 'F' | 34 | 68 |
+-------+-------+-------+
| '-' | 56 | 112 |
+-------+-------+-------+
Changed in version 0.10.
Renamed ‘extend’ to ‘addfield’.
Write the table to a CSV file. E.g.:
>>> from petl import tocsv, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
>>> tocsv(table, 'test.csv')
>>> # look what it did
... from petl import fromcsv
>>> look(fromcsv('test.csv'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '2' |
+-------+-------+
The filename argument is the path of the delimited file, all other keyword arguments are passed to csv.writer(). So, e.g., to override the delimiter from the default CSV dialect, provide the delimiter keyword argument.
Note that if a file already exists at the given location, it will be overwritten.
Append data rows to an existing CSV file. E.g.:
>>> # look at an existing CSV file
... from petl import look, fromcsv
>>> testcsv = fromcsv('test.csv')
>>> look(testcsv)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '2' |
+-------+-------+
>>> # append some data
... look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'd' | 7 |
+-------+-------+
| 'e' | 42 |
+-------+-------+
| 'f' | 12 |
+-------+-------+
>>> from petl import appendcsv
>>> appendcsv(table, 'test.csv')
>>> # look what it did
... look(testcsv)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | '1' |
+-------+-------+
| 'b' | '2' |
+-------+-------+
| 'c' | '2' |
+-------+-------+
| 'd' | '7' |
+-------+-------+
| 'e' | '42' |
+-------+-------+
| 'f' | '12' |
+-------+-------+
The filename argument is the path of the delimited file, all other keyword arguments are passed to csv.writer(). So, e.g., to override the delimiter from the default CSV dialect, provide the delimiter keyword argument.
Note that no attempt is made to check that the fields or row lengths are consistent with the existing data, the data rows from the table are simply appended to the file. See also the cat() function.
Convenience function, as tocsv() but with different default dialect (tab delimited).
New in version 0.9.
Convenience function, as appendcsv() but with different default dialect (tab delimited).
New in version 0.9.
Write the table to a pickle file. E.g.:
>>> from petl import topickle, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
>>> topickle(table, 'test.dat')
>>> # look what it did
... from petl import frompickle
>>> look(frompickle('test.dat'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
Note that if a file already exists at the given location, it will be overwritten.
The pickle file format preserves type information, i.e., reading and writing is round-trippable.
Append data to an existing pickle file. E.g.:
>>> from petl import look, frompickle
>>> # inspect an existing pickle file
... testdat = frompickle('test.dat')
>>> look(testdat)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
>>> # append some data
... from petl import appendpickle
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'd' | 7 |
+-------+-------+
| 'e' | 42 |
+-------+-------+
| 'f' | 12 |
+-------+-------+
>>> appendpickle(table, 'test.dat')
>>> # look what it did
... look(testdat)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
| 'd' | 7 |
+-------+-------+
| 'e' | 42 |
+-------+-------+
| 'f' | 12 |
+-------+-------+
Note that no attempt is made to check that the fields or row lengths are consistent with the existing data, the data rows from the table are simply appended to the file. See also the cat() function.
Load data into a table in an sqlite3 database. Note that if the database table exists, it will be truncated, i.e., all existing rows will be deleted prior to inserting the new data. E.g.:
>>> from petl import tosqlite3, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
>>> # by default, if the table does not already exist, it will be created
... tosqlite3(table, 'test.db', 'foobar')
>>> # look what it did
... from petl import fromsqlite3
>>> look(fromsqlite3('test.db', 'select * from foobar'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| u'a' | 1 |
+-------+-------+
| u'b' | 2 |
+-------+-------+
| u'c' | 2 |
+-------+-------+
Changed in version 0.10.2.
Either a database file name or a connection object can be given as the second argument. (Note that cachetag() is only implemented if a file name is given.)
Load data into an existing table in an sqlite3 database. Note that the database table will be appended, i.e., the new data will be inserted into the table, and any existing rows will remain. E.g.:
>>> from petl import appendsqlite3, look
>>> look(moredata)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'd' | 7 |
+-------+-------+
| 'e' | 9 |
+-------+-------+
| 'f' | 1 |
+-------+-------+
>>> appendsqlite3(moredata, 'test.db', 'foobar')
>>> # look what it did
... from petl import look, fromsqlite3
>>> look(fromsqlite3('test.db', 'select * from foobar'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| u'a' | 1 |
+-------+-------+
| u'b' | 2 |
+-------+-------+
| u'c' | 2 |
+-------+-------+
| u'd' | 7 |
+-------+-------+
| u'e' | 9 |
+-------+-------+
| u'f' | 1 |
+-------+-------+
Changed in version 0.10.2.
Either a database file name or a connection object can be given as the second argument. (Note that cachetag() is only implemented if a file name is given.)
Load data into an existing database table via a DB-API 2.0 connection or cursor. Note that the database table will be truncated, i.e., all existing rows will be deleted prior to inserting the new data.
E.g.:
>>> from petl import look, todb
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
... using sqlite3:
>>> import sqlite3
>>> connection = sqlite3.connect('test.db')
>>> # assuming table "foobar" already exists in the database
... todb(table, connection, 'foobar')
... using psycopg2:
>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> # assuming table "foobar" already exists in the database
... todb(table, connection, 'foobar')
... using MySQLdb:
>>> import MySQLdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> # assuming table "foobar" already exists in the database
... todb(table, connection, 'foobar')
Changed in version 0.10.2.
A cursor can also be provided instead of a connection, e.g.:
>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> cursor = connection.cursor()
>>> todb(table, cursor, 'foobar')
Load data into an existing database table via a DB-API 2.0 connection or cursor. Note that the database table will be appended, i.e., the new data will be inserted into the table, and any existing rows will remain.
E.g.:
>>> from petl import look, appenddb
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
... using sqlite3:
>>> import sqlite3
>>> connection = sqlite3.connect('test.db')
>>> # assuming table "foobar" already exists in the database
... appenddb(table, connection, 'foobar')
... using psycopg2:
>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> # assuming table "foobar" already exists in the database
... appenddb(table, connection, 'foobar')
... using MySQLdb:
>>> import MySQLdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> # assuming table "foobar" already exists in the database
... appenddb(table, connection, 'foobar')
Changed in version 0.10.2.
A cursor can also be provided instead of a connection, e.g.:
>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> cursor = connection.cursor()
>>> appenddb(table, cursor, 'foobar')
Write the table to a text file. E.g.:
>>> from petl import totext, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
>>> prologue = """{| class="wikitable"
... |-
... ! foo
... ! bar
... """
>>> template = """|-
... | {foo}
... | {bar}
... """
>>> epilogue = "|}"
>>> totext(table, 'test.txt', template, prologue, epilogue)
>>>
>>> # see what we did
... with open('test.txt') as f:
... print f.read()
...
{| class="wikitable"
|-
! foo
! bar
|-
| a
| 1
|-
| b
| 2
|-
| c
| 2
|}
The template will be used to format each row via str.format.
As totext() but the file is opened in append mode.
Write a table in JSON format. E.g.:
>>> from petl import tojson, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
| 'c' | 2 |
+-------+-------+
>>> tojson(table, 'example.json')
>>> # check what it did
... import json
>>> with open('example.json') as f:
... json.load(f)
...
[{u'foo': u'a', u'bar': 1}, {u'foo': u'b', u'bar': 2}, {u'foo': u'c', u'bar': 2}]
Note that this is currently not streaming, all data is loaded into memory before being written to the file.
New in version 0.5.
Return the header row for the given table. E.g.:
>>> from petl import header
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> header(table)
['foo', 'bar']
Return a container supporting iteration over data rows in a given table. I.e., like iterdata() only a container is returned so you can iterate over it multiple times.
Changed in version 0.10.
Now returns a container, previously returned an iterator. See also iterdata().
Return an iterator over the data rows for the given table. E.g.:
>>> from petl import data
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> it = data(table)
>>> it.next()
['a', 1]
>>> it.next()
['b', 2]
Changed in version 0.3.
Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().
Changed in version 0.10.
Renamed from “data”.
Deprecated since version 0.3.
Use data() instead, it supports slice arguments.
Return an iterator over the data in the table, yielding each row as a dictionary of values indexed by field name. E.g.:
>>> from petl import records
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> it = records(table)
>>> it.next()
{'foo': 'a', 'bar': 1}
>>> it.next()
{'foo': 'b', 'bar': 2}
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Short rows are padded, e.g.:
>>> table = [['foo', 'bar'], ['a', 1], ['b']]
>>> it = records(table)
>>> it.next()
{'foo': 'a', 'bar': 1}
>>> it.next()
{'foo': 'b', 'bar': None}
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Return the string values of all fields for the given table. If the fields are strings, then this function is equivalent to header(), i.e.:
>>> from petl import header, fieldnames
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> header(table)
['foo', 'bar']
>>> fieldnames(table)
['foo', 'bar']
>>> header(table) == fieldnames(table)
True
Allows for custom field objects, e.g.:
>>> class CustomField(object):
... def __init__(self, id, description):
... self.id = id
... self.description = description
... def __str__(self):
... return self.id
... def __repr__(self):
... return 'CustomField(%r, %r)' % (self.id, self.description)
...
>>> table = [[CustomField('foo', 'Get some foo.'), CustomField('bar', 'A lot of bar.')],
... ['a', 1],
... ['b', 2]]
>>> header(table)
[CustomField('foo', 'Get some foo.'), CustomField('bar', 'A lot of bar.')]
>>> fieldnames(table)
['foo', 'bar']
Count the number of data rows in a table. E.g.:
>>> from petl import nrows
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> nrows(table)
2
Changed in version 0.10.
Renamed from ‘rowcount’ to ‘nrows’.
Format a portion of the table as text for inspection in an interactive session. E.g.:
>>> from petl import look
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a' | 1 |
+-------+-------+
| 'b' | 2 |
+-------+-------+
Any irregularities in the length of header and/or data rows will appear as blank cells, e.g.:
>>> table = [['foo', 'bar'], ['a'], ['b', 2, True]]
>>> look(table)
+-------+-------+------+
| 'foo' | 'bar' | |
+=======+=======+======+
| 'a' | | |
+-------+-------+------+
| 'b' | 2 | True |
+-------+-------+------+
Changed in version 0.3.
Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().
Changed in version 0.8.
The properties n and p can be used to look at the next and previous rows respectively. I.e., try >>> look(table) then >>> _.n` then >>> _.p.
Format the entire table as text for inspection in an interactive session.
N.B., this will load the entire table into memory.
Format a portion of a table as text in a column-oriented layout for inspection in an interactive session. E.g.:
>>> from petl import see
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> see(table)
'foo': 'a', 'b'
'bar': 1, 2
Useful for tables with a larger number of fields.
Changed in version 0.3.
Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().
Return a container supporting iteration over values in a given field or fields. I.e., like itervalues() only a container is returned so you can iterate over it multiple times.
Changed in version 0.7.
Now returns a container, previously returned an iterator. See also itervalues().
Return an iterator over values in a given field or fields. E.g.:
>>> from petl import itervalues
>>> table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]]
>>> foo = itervalues(table, 'foo')
>>> foo.next()
'a'
>>> foo.next()
'b'
>>> foo.next()
'b'
>>> foo.next()
'c'
>>> foo.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.
If rows are uneven, the value of the keyword argument missing is returned.
More than one field can be selected, e.g.:
>>> table = [['foo', 'bar', 'baz'],
... [1, 'a', True],
... [2, 'bb', True],
... [3, 'd', False]]
>>> foobaz = itervalues(table, ('foo', 'baz'))
>>> foobaz.next()
(1, True)
>>> foobaz.next()
(2, True)
>>> foobaz.next()
(3, False)
>>> foobaz.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Changed in version 0.3.
Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().
Changed in version 0.7.
In previous releases this function was known as ‘values’. Also in this release the behaviour with short rows is changed. Now for any value missing due to a short row, None is returned by default, or whatever is given by the missing keyword argument.
Deprecated since version 0.3.
Use set(values(table, *fields)) instead, see also values().
Count the number of occurrences of value under the given field. Returns the absolute count and relative frequency as a pair. E.g.:
>>> from petl import valuecount
>>> table = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 7))
>>> n, f = valuecount(table, 'foo', 'b')
>>> n
2
>>> f
0.6666666666666666
The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.
Find distinct values for the given field and count the number and relative frequency of occurrences. Returns a table mapping values to counts, with most common values first. E.g.:
>>> from petl import valuecounts, look
>>> table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]]
>>> look(valuecounts(table, 'foo'))
+---------+---------+-------------+
| 'value' | 'count' | 'frequency' |
+=========+=========+=============+
| 'b' | 2 | 0.5 |
+---------+---------+-------------+
| 'a' | 1 | 0.25 |
+---------+---------+-------------+
| 'c' | 1 | 0.25 |
+---------+---------+-------------+
>>> look(valuecounts(table, 'bar'))
+---------+---------+--------------------+
| 'value' | 'count' | 'frequency' |
+=========+=========+====================+
| True | 2 | 0.6666666666666666 |
+---------+---------+--------------------+
| False | 1 | 0.3333333333333333 |
+---------+---------+--------------------+
The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.
Find distinct values for the given field and count the number of occurrences. Returns a dict mapping values to counts. E.g.:
>>> from petl import valuecounter
>>> table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]]
>>> c = valuecounter(table, 'foo')
>>> c['a']
1
>>> c['b']
2
>>> c['c']
1
>>> c
Counter({'b': 2, 'a': 1, 'c': 1})
The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.
Construct a dict mapping field names to lists of values. E.g.:
>>> from petl import columns
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> cols = columns(table)
>>> cols['foo']
['a', 'b', 'b']
>>> cols['bar']
[1, 2, 3]
Like columns() but stratified by values of the given key field. E.g.:
>>> from petl import facetcolumns
>>> table = [['foo', 'bar', 'baz'],
... ['a', 1, True],
... ['b', 2, True],
... ['b', 3]]
>>> fc = facetcolumns(table, 'foo')
>>> fc['a']
{'baz': [True], 'foo': ['a'], 'bar': [1]}
>>> fc['b']
{'baz': [True, None], 'foo': ['b', 'b'], 'bar': [2, 3]}
>>> fc['c']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'c'
New in version 0.8.
Return True if there are no duplicate values for the given field(s), otherwise False. E.g.:
>>> from petl import isunique
>>> table = [['foo', 'bar'], ['a', 1], ['b'], ['b', 2], ['c', 3, True]]
>>> isunique(table, 'foo')
False
>>> isunique(table, 'bar')
True
The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.
Changed in version 0.10.
Renamed from “unique”. See also petl.unique().
Return True if the table is ordered (i.e., sorted) by the given key. E.g.:
>>> from petl import isordered, look
>>> look(table)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 1 | True |
+-------+-------+-------+
| 'b' | 3 | True |
+-------+-------+-------+
| 'b' | 2 | |
+-------+-------+-------+
>>> isordered(table, key='foo')
True
>>> isordered(table, key='foo', strict=True)
False
>>> isordered(table, key='foo', reverse=True)
False
New in version 0.10.
Find minimum and maximum values under the given field. E.g.:
>>> from petl import limits
>>> t1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> minv, maxv = limits(t1, 'bar')
>>> minv
1
>>> maxv
3
The field argument can be a field name or index (starting from zero).
Calculate basic descriptive statistics on a given field. E.g.:
>>> from petl import stats
>>> table = [['foo', 'bar', 'baz'],
... ['A', 1, 2],
... ['B', '2', '3.4'],
... [u'B', u'3', u'7.8', True],
... ['D', 'xyz', 9.0],
... ['E', None]]
>>> stats(table, 'bar')
{'count': 3, 'errors': 2, 'min': 1.0, 'max': 3.0, 'sum': 6.0, 'mean': 2.0}
The field argument can be a field name or index (starting from zero).
Convenience function to report statistics on value lengths under the given field. E.g.:
>>> from petl import lenstats
>>> table1 = [['foo', 'bar'],
... [1, 'a'],
... [2, 'aaa'],
... [3, 'aa'],
... [4, 'aaa'],
... [5, 'aaaaaaaaaaa']]
>>> lenstats(table1, 'bar')
{'count': 5, 'errors': 0, 'min': 1.0, 'max': 11.0, 'sum': 20.0, 'mean': 4.0}
Profile string patterns in the given field, returning a table of patterns, counts and frequencies. E.g.:
>>> from petl import stringpatterns, look
>>> table = [['foo', 'bar'],
... ['Mr. Foo', '123-1254'],
... ['Mrs. Bar', '234-1123'],
... ['Mr. Spo', '123-1254'],
... [u'Mr. Baz', u'321 1434'],
... [u'Mrs. Baz', u'321 1434'],
... ['Mr. Quux', '123-1254-XX']]
>>> foopats = stringpatterns(table, 'foo')
>>> look(foopats)
+------------+---------+---------------------+
| 'pattern' | 'count' | 'frequency' |
+============+=========+=====================+
| 'Aa. Aaa' | 3 | 0.5 |
+------------+---------+---------------------+
| 'Aaa. Aaa' | 2 | 0.3333333333333333 |
+------------+---------+---------------------+
| 'Aa. Aaaa' | 1 | 0.16666666666666666 |
+------------+---------+---------------------+
>>> barpats = stringpatterns(table, 'bar')
>>> look(barpats)
+---------------+---------+---------------------+
| 'pattern' | 'count' | 'frequency' |
+===============+=========+=====================+
| '999-9999' | 3 | 0.5 |
+---------------+---------+---------------------+
| '999 9999' | 2 | 0.3333333333333333 |
+---------------+---------+---------------------+
| '999-9999-AA' | 1 | 0.16666666666666666 |
+---------------+---------+---------------------+
New in version 0.5.
Profile string patterns in the given field, returning a dict mapping patterns to counts.
New in version 0.5.
Report on row lengths found in the table. E.g.:
>>> from petl import look, rowlengths
>>> table = [['foo', 'bar', 'baz'],
... ['A', 1, 2],
... ['B', '2', '3.4'],
... [u'B', u'3', u'7.8', True],
... ['D', 'xyz', 9.0],
... ['E', None],
... ['F', 9]]
>>> look(rowlengths(table))
+----------+---------+
| 'length' | 'count' |
+==========+=========+
| 3 | 3 |
+----------+---------+
| 2 | 2 |
+----------+---------+
| 4 | 1 |
+----------+---------+
Count the number of values found for each Python type and return a table mapping class names to counts and frequencies. E.g.:
>>> from petl import look, typecounts
>>> table = [['foo', 'bar', 'baz'],
... ['A', 1, 2],
... ['B', u'2', '3.4'],
... [u'B', u'3', u'7.8', True],
... ['D', u'xyz', 9.0],
... ['E', 42]]
>>> look(typecounts(table, 'foo'))
+-----------+---------+-------------+
| 'type' | 'count' | 'frequency' |
+===========+=========+=============+
| 'str' | 4 | 0.8 |
+-----------+---------+-------------+
| 'unicode' | 1 | 0.2 |
+-----------+---------+-------------+
>>> look(typecounts(table, 'bar'))
+-----------+---------+-------------+
| 'type' | 'count' | 'frequency' |
+===========+=========+=============+
| 'unicode' | 3 | 0.6 |
+-----------+---------+-------------+
| 'int' | 2 | 0.4 |
+-----------+---------+-------------+
>>> look(typecounts(table, 'baz'))
+-----------+---------+-------------+
| 'type' | 'count' | 'frequency' |
+===========+=========+=============+
| 'int' | 1 | 0.25 |
+-----------+---------+-------------+
| 'float' | 1 | 0.25 |
+-----------+---------+-------------+
| 'unicode' | 1 | 0.25 |
+-----------+---------+-------------+
| 'str' | 1 | 0.25 |
+-----------+---------+-------------+
The field argument can be a field name or index (starting from zero).
Changed in version 0.6.
Added frequency.
Count the number of values found for each Python type. E.g.:
>>> from petl import typecounter
>>> table = [['foo', 'bar', 'baz'],
... ['A', 1, 2],
... ['B', u'2', '3.4'],
... [u'B', u'3', u'7.8', True],
... ['D', u'xyz', 9.0],
... ['E', 42]]
>>> typecounter(table, 'foo')
Counter({'str': 4, 'unicode': 1})
>>> typecounter(table, 'bar')
Counter({'unicode': 3, 'int': 2})
>>> typecounter(table, 'baz')
Counter({'int': 1, 'float': 1, 'unicode': 1, 'str': 1})
The field argument can be a field name or index (starting from zero).
Return a set containing all Python types found for values in the given field. E.g.:
>>> from petl import typeset
>>> table = [['foo', 'bar', 'baz'],
... ['A', 1, '2'],
... ['B', u'2', '3.4'],
... [u'B', u'3', '7.8', True],
... ['D', u'xyz', 9.0],
... ['E', 42]]
>>> typeset(table, 'foo')
set([<type 'str'>, <type 'unicode'>])
>>> typeset(table, 'bar')
set([<type 'int'>, <type 'unicode'>])
>>> typeset(table, 'baz')
set([<type 'float'>, <type 'str'>])
The field argument can be a field name or index (starting from zero).
Count the number of str or unicode values that can be parsed as ints, floats or via custom parser functions. Return a table mapping parser names to the number of values successfully parsed and the number of errors. E.g.:
>>> from petl import look, parsecounts
>>> table = [['foo', 'bar', 'baz'],
... ['A', 'aaa', 2],
... ['B', u'2', '3.4'],
... [u'B', u'3', u'7.8', True],
... ['D', '3.7', 9.0],
... ['E', 42]]
>>> look(parsecounts(table, 'bar'))
+---------+---------+----------+
| 'type' | 'count' | 'errors' |
+=========+=========+==========+
| 'float' | 3 | 1 |
+---------+---------+----------+
| 'int' | 2 | 2 |
+---------+---------+----------+
The field argument can be a field name or index (starting from zero).
Count the number of str or unicode values under the given fields that can be parsed as ints, floats or via custom parser functions. Return a pair of Counter objects, the first mapping parser names to the number of strings successfully parsed, the second mapping parser names to the number of errors. E.g.:
>>> from petl import parsecounter
>>> table = [['foo', 'bar', 'baz'],
... ['A', 'aaa', 2],
... ['B', u'2', '3.4'],
... [u'B', u'3', u'7.8', True],
... ['D', '3.7', 9.0],
... ['E', 42]]
>>> counter, errors = parsecounter(table, 'bar')
>>> counter
Counter({'float': 3, 'int': 2})
>>> errors
Counter({'int': 2, 'float': 1})
The field argument can be a field name or index (starting from zero).
Return a function to parse strings as datetime.date objects using a given format. E.g.:
>>> from petl import dateparser
>>> isodate = dateparser('%Y-%m-%d')
>>> isodate('2002-12-25')
datetime.date(2002, 12, 25)
>>> isodate('2002-02-30')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 1032, in parser
return parser
File "/usr/lib/python2.7/_strptime.py", line 440, in _strptime
datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month
Can be used with parsecounts(), e.g.:
>>> from petl import look, parsecounts, dateparser
>>> table = [['when', 'who'],
... ['2002-12-25', 'Alex'],
... ['2004-09-12', 'Gloria'],
... ['2002-13-25', 'Marty'],
... ['2002-02-30', 'Melman']]
>>> parsers={'date': dateparser('%Y-%m-%d')}
>>> look(parsecounts(table, 'when', parsers))
+--------+---------+----------+
| 'type' | 'count' | 'errors' |
+========+=========+==========+
| 'date' | 2 | 2 |
+--------+---------+----------+
Changed in version 0.6.
Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.
Return a function to parse strings as datetime.time objects using a given format. E.g.:
>>> from petl import timeparser
>>> isotime = timeparser('%H:%M:%S')
>>> isotime('00:00:00')
datetime.time(0, 0)
>>> isotime('13:00:00')
datetime.time(13, 0)
>>> isotime('12:00:99')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 1046, in parser
File "/usr/lib/python2.7/_strptime.py", line 328, in _strptime
data_string[found.end():])
ValueError: unconverted data remains: 9
>>> isotime('25:00:00')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 1046, in parser
File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data '25:00:00' does not match format '%H:%M:%S'
Can be used with parsecounts(), e.g.:
>>> from petl import look, parsecounts, timeparser
>>> table = [['when', 'who'],
... ['00:00:00', 'Alex'],
... ['12:02:45', 'Gloria'],
... ['25:01:01', 'Marty'],
... ['09:70:00', 'Melman']]
>>> parsers={'time': timeparser('%H:%M:%S')}
>>> look(parsecounts(table, 'when', parsers))
+--------+---------+----------+
| 'type' | 'count' | 'errors' |
+========+=========+==========+
| 'time' | 2 | 2 |
+--------+---------+----------+
Changed in version 0.6.
Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.
Return a function to parse strings as datetime.datetime objects using a given format. E.g.:
>>> from petl import datetimeparser
>>> isodatetime = datetimeparser('%Y-%m-%dT%H:%M:%S')
>>> isodatetime('2002-12-25T00:00:00')
datetime.datetime(2002, 12, 25, 0, 0)
>>> isodatetime('2002-12-25T00:00:99')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 1018, in parser
return datetime.strptime(value.strip(), format)
File "/usr/lib/python2.7/_strptime.py", line 328, in _strptime
data_string[found.end():])
ValueError: unconverted data remains: 9
Can be used with parsecounts(), e.g.:
>>> from petl import look, parsecounts, datetimeparser
>>> table = [['when', 'who'],
... ['2002-12-25T00:00:00', 'Alex'],
... ['2004-09-12T01:10:11', 'Gloria'],
... ['2002-13-25T00:00:00', 'Marty'],
... ['2002-02-30T07:09:00', 'Melman']]
>>> parsers={'datetime': datetimeparser('%Y-%m-%dT%H:%M:%S')}
>>> look(parsecounts(table, 'when', parsers))
+------------+---------+----------+
| 'type' | 'count' | 'errors' |
+============+=========+==========+
| 'datetime' | 2 | 2 |
+------------+---------+----------+
Changed in version 0.6.
Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.
Return a function to parse strings as bool objects using a given set of string representations for True and False. E.g.:
>>> from petl import boolparser
>>> mybool = boolparser(true_strings=['yes', 'y'], false_strings=['no', 'n'])
>>> mybool('y')
True
>>> mybool('Y')
True
>>> mybool('yes')
True
>>> mybool('No')
False
>>> mybool('nO')
False
>>> mybool('true')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 1175, in parser
raise ValueError('value is not one of recognised boolean strings: %r' % value)
ValueError: value is not one of recognised boolean strings: 'true'
>>> mybool('foo')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 1175, in parser
raise ValueError('value is not one of recognised boolean strings: %r' % value)
ValueError: value is not one of recognised boolean strings: 'foo'
Can be used with parsecounts(), e.g.:
>>> from petl import look, parsecounts, boolparser
>>> table = [['who', 'vote'],
... ['Alex', 'yes'],
... ['Gloria', 'N'],
... ['Marty', 'hmmm'],
... ['Melman', 'nope']]
>>> mybool = boolparser(true_strings=['yes', 'y'], false_strings=['no', 'n'])
>>> parsers = {'bool': mybool}
>>> look(parsecounts(table, 'vote', parsers))
+--------+---------+----------+
| 'type' | 'count' | 'errors' |
+========+=========+==========+
| 'bool' | 2 | 2 |
+--------+---------+----------+
Changed in version 0.6.
Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.
Attempt to parse the value as a number, trying int(), long(), float() and complex() in that order. If all fail, return the value as-is.
New in version 0.4.
Changed in version 0.7: Set strict=True to get an exception if parsing fails.
Load a dictionary with data from the given table. E.g.:
>>> from petl import lookup
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookup(table, 'foo', 'bar')
>>> lkp['a']
[1]
>>> lkp['b']
[2, 3]
If no valuespec argument is given, defaults to the whole row (as a tuple), e.g.:
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookup(table, 'foo')
>>> lkp['a']
[('a', 1)]
>>> lkp['b']
[('b', 2), ('b', 3)]
Compound keys are supported, e.g.:
>>> t2 = [['foo', 'bar', 'baz'],
... ['a', 1, True],
... ['b', 2, False],
... ['b', 3, True],
... ['b', 3, False]]
>>> lkp = lookup(t2, ('foo', 'bar'), 'baz')
>>> lkp[('a', 1)]
[True]
>>> lkp[('b', 2)]
[False]
>>> lkp[('b', 3)]
[True, False]
Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:
>>> import shelve
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = shelve.open('mylookup.dat')
>>> lkp = lookup(table, 'foo', 'bar', lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('mylookup.dat')
>>> lkp['a']
[1]
>>> lkp['b']
[2, 3]
Load a dictionary with data from the given table, assuming there is at most one value for each key. E.g.:
>>> from petl import lookupone
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = lookupone(table, 'foo', 'bar')
>>> lkp['a']
1
>>> lkp['b']
2
>>> lkp['c']
2
If the specified key is not unique, will raise DuplicateKeyError, e.g.:
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookupone(table, 'foo')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 451, in lookupone
petl.util.DuplicateKeyError
Unique checks can be overridden by providing strict=False, in which case the last value wins, e.g.:
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookupone(table, 'foo', 'bar', strict=False)
>>> lkp['a']
1
>>> lkp['b']
3
Compound keys are supported, e.g.:
>>> t2 = [['foo', 'bar', 'baz'],
... ['a', 1, True],
... ['b', 2, False],
... ['b', 3, True]]
>>> lkp = lookupone(t2, ('foo', 'bar'), 'baz')
>>> lkp[('a', 1)]
True
>>> lkp[('b', 2)]
False
>>> lkp[('b', 3)]
True
Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:
>>> from petl import lookupone
>>> import shelve
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = shelve.open('mylookupone.dat')
>>> lkp = lookupone(table, 'foo', 'bar', dictionary=lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('mylookupone.dat')
>>> lkp['a']
1
>>> lkp['b']
2
>>> lkp['c']
2
Load a dictionary with data from the given table, mapping to records. E.g.:
>>> from petl import recordlookup
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = recordlookup(table, 'foo')
>>> lkp['a']
[{'foo': 'a', 'bar': 1}]
>>> lkp['b']
[{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}]
Compound keys are supported, e.g.:
>>> t2 = [['foo', 'bar', 'baz'],
... ['a', 1, True],
... ['b', 2, False],
... ['b', 3, True],
... ['b', 3, False]]
>>> lkp = recordlookup(t2, ('foo', 'bar'))
>>> lkp[('a', 1)]
[{'baz': True, 'foo': 'a', 'bar': 1}]
>>> lkp[('b', 2)]
[{'baz': False, 'foo': 'b', 'bar': 2}]
>>> lkp[('b', 3)]
[{'baz': True, 'foo': 'b', 'bar': 3}, {'baz': False, 'foo': 'b', 'bar': 3}]
Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:
>>> import shelve
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = shelve.open('myrecordlookup.dat')
>>> lkp = recordlookup(table, 'foo', dictionary=lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('myrecordlookup.dat')
>>> lkp['a']
[{'foo': 'a', 'bar': 1}]
>>> lkp['b']
[{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}]
Load a dictionary with data from the given table, mapping to records, assuming there is at most one record for each key. E.g.:
>>> from petl import recordlookupone
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = recordlookupone(table, 'foo')
>>> lkp['a']
{'foo': 'a', 'bar': 1}
>>> lkp['b']
{'foo': 'b', 'bar': 2}
>>> lkp['c']
{'foo': 'c', 'bar': 2}
If the specified key is not unique, will raise DuplicateKeyError, e.g.:
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = recordlookupone(table, 'foo')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "petl/util.py", line 451, in lookupone
petl.util.DuplicateKeyError
Unique checks can be overridden by providing strict=False, in which case the last record wins, e.g.:
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = recordlookupone(table, 'foo', strict=False)
>>> lkp['a']
{'foo': 'a', 'bar': 1}
>>> lkp['b']
{'foo': 'b', 'bar': 3}
Compound keys are supported, e.g.:
>>> t2 = [['foo', 'bar', 'baz'],
... ['a', 1, True],
... ['b', 2, False],
... ['b', 3, True]]
>>> lkp = recordlookupone(t2, ('foo', 'bar'), strict=False)
>>> lkp[('a', 1)]
{'baz': True, 'foo': 'a', 'bar': 1}
>>> lkp[('b', 2)]
{'baz': False, 'foo': 'b', 'bar': 2}
>>> lkp[('b', 3)]
{'baz': True, 'foo': 'b', 'bar': 3}
Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:
>>> import shelve
>>> lkp = shelve.open('myrecordlookupone.dat')
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = recordlookupone(table, 'foo', dictionary=lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('myrecordlookupone.dat')
>>> lkp['a']
{'foo': 'a', 'bar': 1}
>>> lkp['b']
{'foo': 'b', 'bar': 2}
>>> lkp['c']
{'foo': 'c', 'bar': 2}
Compute a crude checksum of the file by hashing the file’s absolute path name, the file size, and the file’s time of last modification. N.B., on some systems this will give a 1s resolution, i.e., any changes to a file within the same second that preserve the file size will not change the result.
Compute the Adler 32 checksum of the file at the given location. Returns the checksum as an integer, use hex(result) to view as hexadecimal.
Compute the CRC32 checksum of the file at the given location. Returns the checksum as an integer, use hex(result) to view as hexadecimal.
Construct a function operating on a record (i.e., a dictionary representation of a data row, indexed by field name).
The expression string is converted into a lambda function by prepending the string with 'lambda rec: ', then replacing anything enclosed in curly braces (e.g., "{foo}") with a lookup on the record (e.g., "rec['foo']"), then finally calling eval().
So, e.g., the expression string "{foo} * {bar}" is converted to the function lambda rec: rec['foo'] * rec['bar']
Return a function to join sequences using s as the separator.
Construct a table with random numerical data. Use numflds and numrows to specify the number of fields and rows respectively. Set wait to a float greater than zero to simulate a delay on each row generation (number of seconds per row). E.g.:
>>> from petl import randomtable, look
>>> t = randomtable(5, 10000)
>>> look(t)
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 'f0' | 'f1' | 'f2' | 'f3' | 'f4' |
+=====================+=====================+=====================+======================+======================+
| 0.37981479583619415 | 0.5651754962690851 | 0.5219839418441516 | 0.400507081757018 | 0.18772722969580335 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.8523718373108918 | 0.9728988775985702 | 0.539819811070272 | 0.5253127991162814 | 0.032332586052070345 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.15767415808765595 | 0.8723372406647985 | 0.8116271113050197 | 0.19606663402788693 | 0.02917384287810021 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.29027126477145737 | 0.9458013821235983 | 0.0558711583090582 | 0.8388382491420909 | 0.533855533396786 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.7299727877963395 | 0.7293822340944851 | 0.953624640847381 | 0.7161554959575555 | 0.8681001821667421 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.7057077618876934 | 0.5222733323906424 | 0.26527912571554013 | 0.41069309093677264 | 0.7062831671289698 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.9447075997744453 | 0.3980291877822444 | 0.5748113148854611 | 0.037655670603881974 | 0.30826709590498524 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.21559911346698513 | 0.8353039675591192 | 0.5558847892537019 | 0.8561403358605812 | 0.01109608253313421 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.27334411287843097 | 0.10064946027523636 | 0.7476185996637322 | 0.26201984851765325 | 0.6303996377010502 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.8348722928576766 | 0.40319578510057763 | 0.3658094978577834 | 0.9829576880714145 | 0.6170025401631835 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
Note that the data are generated on the fly and are not stored in memory, so this function can be used to simulate very large tables.
New in version 0.6.
See also dummytable().
Construct a table with dummy data. Use numrows to specify the number of rows. Set wait to a float greater than zero to simulate a delay on each row generation (number of seconds per row). E.g.:
>>> from petl import dummytable, look
>>> t1 = dummytable(10000)
>>> look(t1)
+-------+-----------+----------------------+
| 'foo' | 'bar' | 'baz' |
+=======+===========+======================+
| 98 | 'oranges' | 0.017443519200384117 |
+-------+-----------+----------------------+
| 85 | 'pears' | 0.6126183086894914 |
+-------+-----------+----------------------+
| 43 | 'apples' | 0.8354915052285888 |
+-------+-----------+----------------------+
| 32 | 'pears' | 0.9612740566307508 |
+-------+-----------+----------------------+
| 35 | 'bananas' | 0.4845179128370132 |
+-------+-----------+----------------------+
| 16 | 'pears' | 0.150174888085586 |
+-------+-----------+----------------------+
| 98 | 'bananas' | 0.22592589109877748 |
+-------+-----------+----------------------+
| 82 | 'bananas' | 0.4887849296756226 |
+-------+-----------+----------------------+
| 75 | 'apples' | 0.8414305202212253 |
+-------+-----------+----------------------+
| 78 | 'bananas' | 0.025845900016858714 |
+-------+-----------+----------------------+
Note that the data are generated on the fly and are not stored in memory, so this function can be used to simulate very large tables.
Data generation functions can be specified via the fields keyword argument, or set on the table via the suffix notation, e.g.:
>>> import random
>>> from functools import partial
>>> t2 = dummytable(10000, fields=[('foo', random.random), ('bar', partial(random.randint, 0, 500))])
>>> t2['baz'] = partial(random.choice, ['chocolate', 'strawberry', 'vanilla'])
>>> look(t2)
+---------------------+-------+--------------+
| 'foo' | 'bar' | 'baz' |
+=====================+=======+==============+
| 0.04595169186388326 | 370 | 'strawberry' |
+---------------------+-------+--------------+
| 0.29252999472988905 | 90 | 'chocolate' |
+---------------------+-------+--------------+
| 0.7939324498894116 | 146 | 'chocolate' |
+---------------------+-------+--------------+
| 0.4964898678468417 | 123 | 'chocolate' |
+---------------------+-------+--------------+
| 0.26250784199548494 | 327 | 'strawberry' |
+---------------------+-------+--------------+
| 0.748470693146964 | 275 | 'strawberry' |
+---------------------+-------+--------------+
| 0.8995553034254133 | 151 | 'strawberry' |
+---------------------+-------+--------------+
| 0.26331484411715367 | 211 | 'chocolate' |
+---------------------+-------+--------------+
| 0.4740252948218193 | 364 | 'vanilla' |
+---------------------+-------+--------------+
| 0.166428545780258 | 59 | 'vanilla' |
+---------------------+-------+--------------+
Changed in version 0.6.
Now supports different field types, e.g., non-numeric. Previous functionality is available as randomtable().
Return the difference between the headers of the two tables as a pair of sets. E.g.:
>>> from petl import diffheaders
>>> table1 = [['foo', 'bar', 'baz'],
... ['a', 1, .3]]
>>> table2 = [['baz', 'bar', 'quux'],
... ['a', 1, .3]]
>>> add, sub = diffheaders(table1, table2)
>>> add
set(['quux'])
>>> sub
set(['foo'])
New in version 0.6.
Return the difference between the values under the given field in the two tables, e.g.:
>>> from petl import diffvalues
>>> table1 = [['foo', 'bar'],
... ['a', 1],
... ['b', 3]]
>>> table2 = [['bar', 'foo'],
... [1, 'a'],
... [3, 'c']]
>>> add, sub = diffvalues(table1, table2, 'foo')
>>> add
set(['c'])
>>> sub
set(['b'])
New in version 0.6.
Return a single iterator over the given iterables, sorted by the given key function, assuming the input iterables are already sorted by the same function. (I.e., the merge part of a general merge sort.) Uses heapq.merge() for the underlying implementation. See also shortlistmergesorted().
New in version 0.9.
Return a single iterator over the given iterables, sorted by the given key function, assuming the input iterables are already sorted by the same function. (I.e., the merge part of a general merge sort.) Uses min() (or max() if reverse=True) for the underlying implementation. See also heapqmergesorted().
New in version 0.9.
Report progress on rows passing through. E.g.:
>>> from petl import dummytable, progress, tocsv
>>> d = dummytable(100500)
>>> p = progress(d, 10000)
>>> tocsv(p, 'output.csv')
10000 rows in 0.57s (17574 rows/second); batch in 0.57s (17574 rows/second)
20000 rows in 1.13s (17723 rows/second); batch in 0.56s (17876 rows/second)
30000 rows in 1.69s (17732 rows/second); batch in 0.56s (17749 rows/second)
40000 rows in 2.27s (17652 rows/second); batch in 0.57s (17418 rows/second)
50000 rows in 2.83s (17679 rows/second); batch in 0.56s (17784 rows/second)
60000 rows in 3.39s (17694 rows/second); batch in 0.56s (17769 rows/second)
70000 rows in 3.96s (17671 rows/second); batch in 0.57s (17534 rows/second)
80000 rows in 4.53s (17677 rows/second); batch in 0.56s (17720 rows/second)
90000 rows in 5.09s (17681 rows/second); batch in 0.56s (17715 rows/second)
100000 rows in 5.66s (17675 rows/second); batch in 0.57s (17625 rows/second)
100500 rows in 5.69s (17674 rows/second)
See also clock().
New in version 0.10.
Time how long is spent retrieving rows from the wrapped container. Enables diagnosis of which steps in a pipeline are taking the most time. E.g.:
>>> from petl import dummytable, clock, convert, progress, tocsv
>>> t1 = dummytable(100000)
>>> c1 = clock(t1)
>>> t2 = convert(c1, 'foo', lambda v: v**2)
>>> c2 = clock(t2)
>>> p = progress(c2, 10000)
>>> tocsv(p, 'dummy.csv')
10000 rows in 1.17s (8559 rows/second); batch in 1.17s (8559 rows/second)
20000 rows in 2.34s (8548 rows/second); batch in 1.17s (8537 rows/second)
30000 rows in 3.51s (8547 rows/second); batch in 1.17s (8546 rows/second)
40000 rows in 4.68s (8541 rows/second); batch in 1.17s (8522 rows/second)
50000 rows in 5.89s (8483 rows/second); batch in 1.21s (8261 rows/second)
60000 rows in 7.30s (8221 rows/second); batch in 1.40s (7121 rows/second)
70000 rows in 8.59s (8144 rows/second); batch in 1.30s (7711 rows/second)
80000 rows in 9.78s (8182 rows/second); batch in 1.18s (8459 rows/second)
90000 rows in 10.98s (8193 rows/second); batch in 1.21s (8279 rows/second)
100000 rows in 12.30s (8132 rows/second); batch in 1.31s (7619 rows/second)
100000 rows in 12.30s (8132 rows/second)
>>> # time consumed retrieving rows from t1
... c1.time
5.4099999999999895
>>> # time consumed retrieving rows from t2
... c2.time
8.740000000000006
>>> # actual time consumed by the convert step
... c2.time - c1.time
3.330000000000016
See also progress().
New in version 0.10.
Convenient adapter for itertools.groupby(). E.g.:
>>> from petl import rowgroupby, look
>>> look(table)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a' | 1 | True |
+-------+-------+-------+
| 'b' | 3 | True |
+-------+-------+-------+
| 'b' | 2 | |
+-------+-------+-------+
>>> # group entire rows
... for key, group in rowgroupby(table, 'foo'):
... print key, list(group)
...
a [('a', 1, True)]
b [('b', 3, True), ('b', 2)]
>>> # group specific values
... for key, group in rowgroupby(table, 'foo', 'bar'):
... print key, list(group)
...
a [1]
b [3, 2]
N.B., assumes the input table is already sorted by the given key.
New in version 0.10.
Construct a function to return the nth word in a string. E.g.:
>>> from petl import nthword
>>> s = 'foo bar'
>>> f = nthword(0)
>>> f(s)
'foo'
>>> g = nthword(1)
>>> g(s)
'bar'
New in version 0.10.