querycsv.py is a Python module and program that allows you to execute SQL code against data contained in one or more comma-separated-value (CSV) files. The output of the SQL query will be displayed on the console by default, but may be saved in a new CSV file.
Actions that can be performed with the querycsv program include:
- Execute a single SQL statement, specified on the command line, against the specified data.
- Execute multiple SQL statements, read from a script file, against the specified data.
- Read data either from one or more CSV files or from a Sqlite data file.
- Create a new CSV file containing selected, summarized, data.
- Create a Sqlite data file from one or more CSV files. The Sqlite file can then be used either by querycsv or by other programs, such as the Python Sqliteplus shell.
Syntax and Options
- The first line of each input (CSV) file must contain column names
- Table names used in the SQL should match the input CSV file names, without either the leading path or the trailing filename extension.
- When multiple input files (or a sqlite file with multiple tables) are used, SQL JOIN clauses can be used to combine the data.
- When a SQL script file is used instead of a single SQL commmand on the command line, only the output of the last SQL command will be displayed.
- Output to the console (the default) is formatted for readability. Output to a disk file is formatted as CSV, with commas delimiting columns and double quotes around strings.
- The primary intended purpose of a command-line SQL statement is to execute a SELECT or UNION statement against the data, and UPDATE and INSERT statements do not have any effect on input CSV file(s). An effect equivalent to an UPDATE statement can be achieved with SELECT statements, however, with output directed to a new CSV file. To perform an INSERT opertation it is necessary to either save the data in a Sqlite file or to use a script file with separate INSERT and SELECT statements.
- The SQL language features that can be used with querycsv are those supported by the Sqlite language.
Given an input data file named testdata.csv that looks like this:
following are several examples of how querycsv could be used.
Run a simple SELECT statement to extract and display data
will produce output like the following.
Run a simple SELECT statement to extract data into a new CSV file
will create a new CSV file named "maxconc.csv" with the following contents.
Execute multiple SQL statements using a script file
To view the number of samples for which each analyte has been measured in the given data file, the following SQL commands could be used.
If these commands were saved in a text file named "sampcount.sql", the the following code
would produce output like the following.
Note that this is a contrived example, as the same result could be obtained with a single command as follows.
Script files can be useful for solving more complex querying problems, however. Note that only the output of the last SQL command in a script file is displayed, so that if input is taken from a Sqlite file, and a view is created (as in the example script above), then a command to drop the view should not be the last command in the script file.
The end of each SQL command in the script file is identified by a line that ends with a semicolon. Comment lines are distinguished by two dashes ("--") at the beginning of a line. No more sophisticated parsing or interpretation of the SQL commands is done, so comments and SQL commands should not be placed on the same line, nor should two different SQL commands be put on the same line.
Create and use a Sqlite file
The "-f" and "-k" options can be used to create a Sqlite file, as follows.
The SQL command used in the previous command line may or may not produce useful output, but it must be present.
The Sqlite file that was created can then be used as follows.
Note that the name(s) of the table(s) in the Sqlite file must be known; they will not necessarily match the name of the Sqlite file.
Copyright and License
Copyright (c) 2008, R.Dreas Nielsen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.