Crosstabulate Data in a Text File
xtab.py

xtab.py is a Python module and command-line program that rearranges data from a normalized format to a crosstabulated format. It takes data in this form:

StationDateValue
WQ-012006-05-234.5
WQ-022006-05-233.7
WQ-032006-05-236.8
WQ-012006-06-159.7
WQ-022006-05-155.1
WQ-032006-06-157.2
WQ-012006-07-1910
WQ-022006-07-196.1
WQ-032006-07-198.8

and rearranges it into this form:

Station2006-05-232006-06-152006-07-19
WQ-014.53.76.8
WQ-029.75.17.2
WQ-03106.18.8

Input and output are both text (CSV) files.

Capabilities

You can use the xtab program to:

Required and Optional Arguments

Required Arguments

-i <filename> The name of the input file from which to read data. This must be a text file, with data in a normalized format. The first line of the file must contain column names. -o <filename> The name of the output file to create. The output file will be created as a .csv file. Required argument. -r <column_name1> [column_name2 [...]] One or more column names to use as row headers (space delimited). Unique values of these columns will appear at the beginning of every output line. -c <column_name1> [column_name2 [...]] One or more column names to use as column headers in the output (space delimited). A crosstab column (or columns) will be created for every unique combination of values of these fields in the input. -v <column_name1> [column_name2 [...]] One or more column names with values to be used to fill the cells of the cross-table. If n columns names are specified, then there will be n columns in the output table for each of the column headers corresponding to values of the -c argument. The column names specified with the -v argument will be appended to the output column headers created from values of the -c argument. There should be only one value of the -v column(s) for each combination of the -r and -c columns; if there is more than one, a warning will be printed and only the first value will appear in the output. (That is, values are not combined in any way when there are multiple values for each output cell.)

Optional Arguments

-d[1|2|3|4] Controls the format of column headers. The four alternatives are: -d1 or no option specified One row of column headers, with elements joined by underscores to facilitate parsing by other programs. -d or -d2 Two rows of column headers. The first row contains values of the columns specified by the -c argument, and the second row contains the column names specified by the -v argument. -d3 One header row for each of the values of the columns specified by the -c argument, plus one row with the column names specified by the -v argument. -d4 Like -d3, but the values of the columns specified by the -c argument are labeled with (preceded by) the column names. -f Use a temporary (sqlite) file instead of memory for intermediate storage. -n Use the specified default string in the output wherever an empty or null value would otherwise appear. -k Keep (i.e., do not delete) the sqlite file. Only useful with the "-f" option. Unless the "-t" option is also used, the table name will be "src". -t <tablename> Name to use for the table in the intermediate sqlite database. Only useful with the "-f" and "-k" options. -e [filename] Log all error messages, to a file if the filename is specified or to the console if the filename is not specified. -q <filename> Log the sequence of SQL commands used to extract data from the input file to write the output file, including the result of each command. -h Print this help and exit.

Notes

Copyright and License

Copyright (c) 2008, 2011 R.Dreas Nielsen

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.