Drop columns from a delimited text file
dropcols.py

dropcols.py is a Python module and program that removes (or, conversely, extracts) selected columns from a delimited text file, such as a CSV file. It is analogous to the *nix "cut" program, except that it works on CSV files and allows columns to be selected by name (using regular expressions) as well as by number. The columns to keep, the columns to remove, or both can be specified.
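
The following is a minimal sketch of how selecting columns by regular expression might work, written with the standard csv and re modules. It is not the dropcols.py source: the function names select_columns and filter_csv are hypothetical, and two behaviors are assumptions made for illustration only, namely that patterns are applied with re.search (so they can match anywhere in a column name) and that, when both keep and drop patterns are given, a column is retained only if it matches a keep pattern and no drop pattern.

```python
import csv
import io
import re


def select_columns(header, keep=None, drop=None):
    """Return the indexes of the header columns to retain.

    Assumption (not confirmed by dropcols.py): a column is retained if it
    matches any 'keep' pattern (or no 'keep' patterns were given) and it
    matches no 'drop' pattern. Patterns are applied with re.search.
    """
    keep_rx = [re.compile(p) for p in (keep or [])]
    drop_rx = [re.compile(p) for p in (drop or [])]
    selected = []
    for i, name in enumerate(header):
        kept = not keep_rx or any(rx.search(name) for rx in keep_rx)
        dropped = any(rx.search(name) for rx in drop_rx)
        if kept and not dropped:
            selected.append(i)
    return selected


def filter_csv(infile, outfile, keep=None, drop=None):
    """Copy a CSV stream, writing only the selected columns."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader)  # the first line must contain column names
    idx = select_columns(header, keep, drop)
    writer.writerow([header[i] for i in idx])
    for row in reader:
        writer.writerow([row[i] for i in idx])


# Illustrative data only: drop the "id" column and any column starting with "no".
sample = "id,name,status,notes\n1,alpha,ok,first\n2,beta,bad,second\n"
out = io.StringIO()
filter_csv(io.StringIO(sample), out, drop=["^no", "^id$"])
result = out.getvalue()
print(result, end="")
```

Working from the header row alone keeps the selection logic separate from the row-copying loop, so the same index list can be reused to show which columns would be kept without reading the rest of the file.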

Syntax and Options

dropcols.py [options] <input file>

Arguments:
   input file
      The name of the input file from which to read data. This must be a comma-separated-value (CSV) text file. The first line of the file must contain column names.

Options:
   -d <column_name_regex1> [column_name_regex2 [...]]
      Regular expression(s) to match column names to drop.
   -k <column_name_regex1> [column_name_regex2 [...]]
      Regular expression(s) to match column names to keep.
   -s
      Show the names of the columns that will be kept.
   -h, --help
      Print this help and exit.
   -v, --version
      Print the version number and exit.

Usage Notes

Examples

To keep columns 2, 4, 5, and 6, plus a column headed "Status", any of the following expressions can be used with the 'keep' option.

-k 2 4 5 6 Status
-k 2 -k 4 -k 5 -k 6 -k Status
-k 2,4-6 Status
-k Status 4-6,2

The range specification can even be reversed.

-k 6-4,2 Status
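
The equivalence of the forms above can be illustrated with a short sketch of how such arguments might be expanded. This is not taken from dropcols.py: the function name parse_column_args is hypothetical, column numbers are assumed to be 1-based as in the examples, and an argument is assumed to be a number list only when every comma-separated part is a number or a number range. Anything else is passed through unchanged as a column-name pattern.

```python
import re

# A single column number ("2") or a range of numbers ("4-6" or "6-4").
RANGE_RX = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_column_args(args):
    """Split option values into column numbers and column-name patterns.

    Returns (sorted list of column numbers, list of name patterns).
    Assumption: reversed ranges such as "6-4" mean the same as "4-6".
    """
    numbers = set()
    patterns = []
    for arg in args:
        matches = [RANGE_RX.match(part) for part in arg.split(",")]
        if all(matches):
            for m in matches:
                start = int(m.group(1))
                end = int(m.group(2)) if m.group(2) else start
                low, high = sorted((start, end))
                numbers.update(range(low, high + 1))
        else:
            # Not purely numeric, so treat the whole argument as a pattern.
            patterns.append(arg)
    return sorted(numbers), patterns


forms = [
    ["2", "4", "5", "6", "Status"],
    ["2,4-6", "Status"],
    ["Status", "4-6,2"],
    ["6-4,2", "Status"],
]
parsed = [parse_column_args(form) for form in forms]
for form, value in zip(forms, parsed):
    print(form, "->", value)
```

Collecting the numbers in a set makes the order and direction of the ranges irrelevant, which is why every form yields the same selection in this sketch.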

Copyright and License

Copyright (c) 2007-2011, R.Dreas Nielsen

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.