execsql.py is a Python program that applies a SQL script stored in a text file to a PostgreSQL, MS-Access, SQLite, MS-SQL-Server, MySQL, MariaDB, or Firebird database, or an ODBC DSN. execsql.py also supports a set of special commands (metacommands) that can import and export data, copy data between databases, and conditionally execute SQL statements and metacommands. These metacommands make up a control language that works the same across all supported database management systems (DBMSs). The metacommands are embedded in SQL comments, so they will be ignored by other script processors (e.g., psql for Postgres and sqlcmd for SQL Server). The metacommands make up a toolbox that can be used to create both automated and interactive data processing applications; some of these uses are illustrated in the examples.
Capabilities
You can use execsql to:
- Import data from text files or spreadsheets into a database.
- Copy data between different databases, even databases using different DBMSs.
- View data on the console or in a GUI dialog window.
- Export tables and views to text files, ISO-standard OpenDocument spreadsheets, HTML, JSON, LaTeX documents, or to eight other tabular formats (see Example 8).
- Export data using template processors to produce non-tabular output with customized format and contents.
- Conditionally execute different SQL commands and metacommands based on the DBMS in use, the database in use, data values, user input, and other conditions. Conditional execution can be used with the INCLUDE and EXECUTE SCRIPT metacommands to implement loops (see Example 6).
- Use simple dynamically-created data entry forms to get user input.
- Write messages to the console or to a file during the processing of a SQL script. These messages can be used to display the progress of the script or create a custom log of the operations that have been carried out or results obtained. Status messages and data exported in text format can be combined in a single text file. Data tables can be exported in a text format that is compatible with Markdown pipe tables, so that script output can be converted into a variety of document formats (see Example 8 and Example 11).
- Write more modular and maintainable SQL code by factoring repeated code out into separate scripts, parameterizing the code using substitution variables, and using the INCLUDE metacommand to merge the modules into a single script (see Example 8).
- Standardize the SQL scripting language used for different types of database management systems.
- Merge multiple elements of a workflow—e.g., data loading, summarization, and reporting—into a single script for better coupling of related steps and more secure maintenance.
- Use "CREATE QUERY..." and "CREATE TEMPORARY QUERY..." statements in Access, which are not natively supported (see Example 1).
execsql is inherently a command-line program that can operate in a completely non-interactive mode, so it is suitable for incorporation into a toolchain controlled by a shell script (on Linux), batch file (on Windows), or other system-level scripting application. When used in this mode, the only interactive elements will be password prompts; passwords are not accepted on the command line or as arguments to the CONNECT metacommand. However, several metacommands can be used to generate interactive prompts and data displays, so execsql scripts can be written to provide some user interactivity.
In addition, execsql automatically maintains a log that documents key information about each run of the program, including the databases that are used, the scripts that are run, and the user's choices in response to interactive prompts. Together, the script and the log provide documentation of all actions carried out that may have altered data.
Syntax and Options
If the database type and connection information are specified in a configuration file, then the database type option and the server and database names can be omitted from the command line. The absolute minimum that must be specified on the command line is the name of the script file to run.
Following are additional details on some of the command-line options:
- -a
- This option should be followed by text that is to be assigned to a substitution variable. Substitution variables can be defined on the command line to provide data or control parameters to a script. The "-a" option can be used repeatedly to define multiple substitution variables. The value provided with each instance of the "-a" option should be a replacement string; execsql automatically assigns the substitution variable names "$ARG_1", "$ARG_2", and so on, for as many variables as are defined on the command line. Use of the "-a" option is illustrated in Example 9. Command-line substitution variable assignments are logged.
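For instance, a script run with at least one "-a" option could echo the first value back with the WRITE metacommand (a minimal sketch; the message text is illustrative):
-- !x! WRITE "First command-line argument: !!$ARG_1!!"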
- -e, -f, -g, -i
- These options should each be followed by the name of a character encoding. Valid names for character encodings can be displayed using the "-y" option.
- -p
- A port number should be provided if the DBMS is using a port different from the default. The default port numbers are:
- Postgres: 5432
- SQL Server: 1433
- MySQL: 3306
- Firebird: 3050
- -u
- The name of the database user should be provided with this option for password-protected databases; execsql will prompt for a password if a user name is provided, unless the "-w" option is also specified.
- -v
- This option should be followed by an integer indicating the level of GUI interaction that execsql should use. The values allowed are:
- 0: Use the terminal for all prompts (the default).
- 1: Use a GUI dialog for password prompts and the PAUSE metacommand.
- 2: Additionally, use a GUI dialog for any message to be displayed with the HALT metacommand, and use a GUI dialog to prompt for the initial database to use if no other specifications are provided.
- 3: Additionally, open a GUI console when execsql starts.
- -w
- Ordinarily if a user name is specified (with the "-u" option), execsql will prompt for a password for that user. When this option is used, execsql will not prompt for entry of a password.
Requirements
The execsql program uses third-party Python libraries to communicate with different database and spreadsheet software; these libraries must be installed to use the corresponding software with execsql. Only the libraries that are actually needed, based on the command-line arguments and the metacommands used, have to be installed. The libraries required for each database or spreadsheet application are:
- PostgreSQL: psycopg2.
- Firebird: fdb.
- MySQL or MariaDB: pymysql.
- SQL Server: pyodbc.
- MS-Access: pyodbc and pywin32.
- DSN data source: pyodbc.
- OpenDocument spreadsheets: odfpy.
- Excel spreadsheets (read only): xlrd.
Connections to SQLite databases are made using Python's standard library, so no additional software is needed.
To use the Jinja or Airspeed template processors with the EXPORT metacommand, those software libraries must be installed also.
Configuration Files
In addition to, or as an alternative to, command-line options and arguments, configuration files can be used to specify most of the same information, plus some additional settings. The only item that cannot be specified in a configuration file is the script name, which must always be given on the command line.
execsql will read information from up to three configuration files in different locations, if they are present. The three locations are:
- The system-wide application data directory. This is /etc on Linux, and %APPDATA% on Windows.
- The user-specific configuration directory. This is a directory named .config under the user's home directory on both Linux and Windows.
- The script directory.
Configuration data is read from these files in the order listed above. Information in later files may augment or replace information in earlier files. Options and arguments specified on the command line will further augment or override information specified in the configuration files.
Configuration files use the INI file format. Section names are case sensitive and must be all in lowercase. Property names are not case sensitive. Property values are read as-is and may or may not be case sensitive, depending on their use. Comments can be included in configuration files; each comment line must start with the "#" character.
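As an illustration, a configuration file might look like the following. All of the values shown are hypothetical; the available sections and properties are described below.
# Illustrative execsql configuration file
[connect]
# db_type uses the same single-character codes as the "-t" option
# (the value "p" is assumed here to select PostgreSQL)
db_type = p
server = dbhost.example.com
db = sandbox
username = analyst

[variables]
# These act as if assigned by SUB metacommands at the start of the script
data_dir = /home/analyst/data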
The section and property names that may be used in a configuration file are listed below.
Section connect
- db_type
- The type of database. This is equivalent to the "-t" command-line option, and the same list of single-character codes are the only valid property values.
- server
- The database server name. This is equivalent to the second command-line argument for client-server databases.
- db
- The database name. This is equivalent to the third command-line argument for client-server databases.
- db_file
- The name of the database file. This is equivalent to the second command-line argument for file-based databases.
- port
- The port number for the client-server database. This is equivalent to the "-p" command-line option.
- username
- The name of the database user, for client-server databases. This is equivalent to the "-u" command-line option.
- access_username
- The name of the database user, for MS-Access databases only. When using MS-Access, a password will be prompted for only if this configuration option is set or the "-u" command-line option is used, regardless of the setting of the username configuration parameter.
- password_prompt
- Indicates whether or not execsql should prompt for the user's password. The property value should be either "Yes" or "No". This is equivalent to the "-w" command-line option.
- new_db
- Indicates whether or not execsql should create a new PostgreSQL or SQLite database to connect to.
Section encoding
- database
- The database encoding to use. This is equivalent to the "-e" command-line option.
- script
- The script encoding to use. This is equivalent to the "-f" command-line option.
- import
- Character encoding for data imported with the IMPORT metacommand. This is equivalent to the "-i" command-line option.
- output
- Character encoding for data exported with the EXPORT metacommand. This is equivalent to the "-h" command-line option.
Section input
- boolean_int
- Whether or not to consider integer values of 0 and 1 as Booleans when scanning data during import or copying. The property value should be either "Yes" or "No". The default value is "Yes". By default, if a data column contains only values of 0 and 1, it will be considered to have a Boolean data type. By setting this value to "No", such a column will be considered to have an integer data type. This is equivalent to the "-b" command-line option.
- boolean_words
- Whether or not to recognize only full words as Booleans. If this value is "No" (the default), then values of "Y", "N", "T", and "F" will be recognized as Booleans. If this value is "Yes", then only "Yes", "No", "True", and "False" will be recognized as Booleans. This setting is independent of the boolean_int setting.
- max_int
- Establishes the maximum value that will be assigned an integer data type when the IMPORT or COPY metacommands create a new data table. Any column with integer values less than or equal to this value (max_int) and greater than or equal to -1 × max_int - 1 will be considered to have an 'integer' type. Any column with values outside this range will be considered to have a 'bigint' type. The default value for max_int is 2147483647. The max_int value can also be altered within a script using the MAX_INT metacommand.
- empty_strings
- Determines whether empty strings in the input are preserved or, alternatively, will be replaced by NULL. The property value should be either "Yes" or "No". The default, "Yes", indicates that empty strings are allowed. A value of "No" will cause all empty strings to be replaced by NULL. There is no command-line option corresponding to this configuration parameter, but the metacommand EMPTY_STRINGS can also be used to change this configuration item.
- scan_lines
- The number of lines of a data file to scan to determine the quoting character and delimiter character used. This is equivalent to the "-s" command-line option.
- import_buffer
- The size of the import buffer, in kilobytes, to use with the IMPORT metacommand. This is equivalent to the "-z" command-line option.
Section output
- log_write_messages
- Specifies whether output of the WRITE metacommand will also be written to execsql's log file. The property value should be either "Yes" or "No". This configuration property can also be controlled within a script with the LOG_WRITE_MESSAGES metacommand.
- make_export_dirs
- The output directories used in an EXPORT metacommand will be automatically created if they do not exist (and the user has permission). The property value should be either "Yes" or "No". This is equivalent to the "-d" command-line option.
- css_file
- The URI of a CSS file to be included in the header of an HTML file created with the EXPORT metacommand. If this is specified, it will replace the CSS styles that execsql would otherwise use.
- css_style
- A set of CSS style specifications to be included in the header of an HTML file created with the EXPORT metacommand. If this is specified, it will replace the CSS styles that execsql would otherwise use. Both css_file and css_style may be specified; if they are, they will be included in the header of the HTML file in that order.
- template_processor
- The name of the template processor that will be used with the EXPORT and EXPORT QUERY metacommands. The only valid values for this property are "jinja" and "airspeed". If this property is not specified, the default template processor will be used.
Section interface
- console_wait_when_done
- Controls the persistence of any console window at the completion of the script. If the property value is set to "Yes" (the default value is "No"), the console window will remain open until explicitly closed by the user. The message "Script complete; close the console window to exit execsql." will be displayed in the status bar. This setting has the same effect as a CONSOLE WAIT metacommand.
- gui_level
- The level of interaction with the user that should be carried out using GUI dialogs. The property value must be 0, 1, 2, or 3. The meanings of these values are:
- 0: Do not use any optional GUI dialogs.
- 1: Use GUI dialogs for password prompts and for the PAUSE metacommand.
- 2: Also use a GUI dialog if a message is included with the HALT metacommand, and prompt for the initial database to use if no database connection parameters are specified in a configuration file or on the command line.
- 3: Additionally, open a GUI console when execsql starts.
Section email
- host
- The SMTP host name to be used to transmit email messages sent using the EMAIL metacommand. A host name must be specified to use the EMAIL metacommand.
- port
- The port number of the SMTP host to use. If this is omitted, port 25 will be used unless either the "use_ssl" or "use_tls" configuration property is also specified, in which case port 465 or 587 may be used.
- username
- The name of the user if the SMTP server requires login authentication.
- password
- An unencrypted password to be used if the SMTP server requires login authentication.
- enc_password
- An encrypted password to be used if the SMTP server requires login authentication. The encrypted version of a password should be as produced by the SUB_ENCRYPT metacommand. A suitably encrypted version of a password can be produced by running the script:
-- !x! PROMPT ENTER_SUB pw "Enter a password to encrypt"
-- !x! SUB_ENCRYPT enc_pw !!pw!!
-- !x! WRITE "The encrypted password is: !!enc_pw!!"
If both the "password" and "enc_password" configuration properties are used, the "enc_password" property will take precedence and will be used for SMTP authentication.
- use_ssl
- SSL/TLS encryption will be used from the initiation of the connection.
- use_tls
- SSL/TLS encryption will be used after the initial connection is made using unencrypted text.
- email_format
- Specifies whether the message will be sent as plain text or as HTML email. The only valid values for this property are "plain" and "html". If not specified, emails will be sent in plain text.
- message_css
- A set of CSS rules to be applied to HTML email.
Section config
- config_file
- The full name or path to an additional configuration file to be read. If only a path is specified, the name of the configuration file should be execsql.conf. The configuration file specified will be read immediately following the configuration file in which it is named. No configuration file will be read more than once.
Section variables
There are no fixed properties for this section. All property names and their values that are specified in this section will be used to define substitution variables, just as if a series of SUB metacommands had been used at the beginning of the script.
Usage Notes
- If the program is run without any arguments it will print a help message on the terminal, similar to the usage description above.
- Script files can contain single-line comments, which are identified by two dashes ("--") at the start of a line. Script files can also contain multi-line comments, which begin on a line where the first characters are "/*" and end on a line where the last characters are "*/".
- execsql recognizes a SQL statement as consisting of a sequence of non-comment lines that ends with a line ending with a semicolon. A backslash ("\") at the end of a line is treated as a line continuation character. Backslashes do not need to be used for simple SQL statements, but must be used for procedure and function definitions, where there are semicolons within the body of the definition, and a semicolon appears at the end of lines for readability purposes. Backslashes may not be used as continuation characters for metacommands.
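For example, a PostgreSQL function definition, in which semicolons appear at the ends of lines within the body, might be written with explicit continuation characters like this (a sketch only; the function itself is illustrative):
create or replace function add_one(n integer) returns integer as $$ \
begin \
    return n + 1; \
end; \
$$ language plpgsql;
Because every line except the last ends with a backslash, execsql treats the whole definition as a single statement and sends it to the database only when the final, unescaped semicolon is reached.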
- Comments and SQL statements should not be mixed on a single line.
A mixture of comments and SQL statements like this:
select scramble(eggs)  -- Use custom aggregate function
from refrigerator
natural join stove;
will result in an unpalatable hash.
- With the exception of the "CREATE TEMPORARY QUERY..." statement when used with MS-Access, the execsql program does not parse or interpret SQL syntax in any way.
- SQL syntax used in the script must conform to that recognized by the DBMS engine in use (e.g., scripts for Access must use SQL compatible with the Jet database engine). Because execsql can connect to several different DBMSs simultaneously, a single script can contain a mixture of different SQL syntaxes. To minimize this variation (and possible mistakes that could result), execsql metacommands provide some common features of DBMS-specific scripting languages (e.g., pgScript and T-SQL), and execsql turns on ANSI-compatible mode for SQL Server and MySQL when it connects to those databases.
- Metacommands can be embedded in SQL comments to export data and carry out other actions during the course of the script. These metacommands are identified by the token "!x!" immediately following the SQL comment characters at the beginning of a line, i.e.:
-- !x! <metacommand>
The special commands that are available are described in the Metacommands section.
- SQL statements are ordinarily automatically committed by execsql. Consequently, database transactions will not work as expected under default conditions. The AUTOCOMMIT and BATCH metacommands provide two different ways to alter execsql's default autocommit behavior. Transactions will work as expected either within a batch or after autocommit has been turned off. One difference between these two approaches is that within transactions inside a batch, changes to data tables are not visible to metacommands such as PROMPT DISPLAY, whereas these data are visible within transactions that follow an AUTOCOMMIT OFF metacommand. Both methods can be used to implement cross-database transactions, but the difference in data visibility affects what tests can be done to decide whether to commit or roll back a transaction.
- If execsql finishes normally, without errors and without being halted either by script conditions or the user, the system exit status will be set to 0 (zero). If an error occurs that causes the script to halt, the exit status will be set to 1. If the user cancels script processing in response to any prompt, the exit status will be set to 2. If the script is halted with either the HALT or HALT DISPLAY metacommands, the system exit status will be set to 2 unless an alternate value is specified as part of the metacommand.
- Scripts for Microsoft Access that use temporary queries will result in those queries being created in the Access database, and then removed, every time the scripts are run. This will lead to a gradual increase in the size of the Access database file. If the script halts unexpectedly because of an error, the temporary queries will remain in the Access database. This may assist in debugging the error, but if the temporary queries are not created conditional on their non-existence, you may have to remove them manually before re-running the script.
- The user name for password-protected Access databases is "Admin" by default (i.e., if no other user name was explicitly specified when the password was applied). To ensure that execsql prompts for a password for password-protected Access databases, a user name must be specified either on the command line with the "-u" option or in a configuration file with the access_username configuration item. When the user name in Access is "Admin", any user name can be provided to execsql.
- With Access databases, an ODBC connection is used for SELECT queries, to allow errors to be caught, and a DAO connection to the Jet engine is used when saved action queries (UPDATE, INSERT, DELETE) are created or modified. Because the Jet engine only flushes its buffers every five seconds, execsql will ensure that at least five seconds have passed between the last use of DAO and the execution of a SELECT statement via ODBC.
- The syntax of the "CREATE TEMPORARY QUERY" DDL supported by execsql when used with an MS-Access database is:
CREATE [TEMP[ORARY]] QUERY|VIEW <query_name> AS <sql_command>
The "TEMPORARY" specification is optional: if it is included, the query will be deleted after the entire script has been executed, and if it is not, the query will remain defined in the database after the script completes. If a query of the same name is already defined in the Access database when the script runs, the existing query will be deleted before the new one is created—no check is performed to determine whether the new and old queries have the same definition, and no warning is issued by execsql that a query definition has been replaced. The keyword "VIEW" can be used in place of the keyword "QUERY". This alternative provides compatibility with the "CREATE TEMPORARY VIEW" command in PostgreSQL, and minimizes the need to edit any scripts that are intended to be run against both Access and PostgreSQL databases.
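For example, the following statement (with illustrative table and column names) defines a query that will be deleted automatically when the script finishes:
CREATE TEMPORARY QUERY qry_complete_samples AS
SELECT sample_id, sample_date
FROM samples
WHERE lab_id IS NOT NULL;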
- Boolean (Yes/No) columns in Access databases cannot contain NULL values. If you IMPORT boolean data into a column having Access' boolean data type, any NULL values in the input data will be converted to False boolean values. This is a potentially serious data integrity issue. To help avoid this, when the NEW or REPLACEMENT keywords are used with the IMPORT or COPY metacommands, and execsql determines that the input file contains boolean data, execsql will create that column in Access with an integer data type rather than a boolean data type, and when adding data will convert non-integer True values to 1, and False values to 0.
- When a DSN is used as a data source, execsql has no information about the features or SQL syntax used by the underlying DBMS. In the expectation that a DSN connection will most commonly be used for Access databases under 64-bit Windows, a DSN connection will use Access' syntax when issuing a CREATE TABLE statement in response to a COPY or IMPORT metacommand. However, a DSN connection does not (and cannot) use DAO to manage queries in a target Access database, so all data manipulations must be carried out using SQL statements. The EXECUTE metacommand uses the same approach for DSN connections as is used for SQL Server.
SQL Syntax Notes
MS-Access Quirks
The version of SQL that is used by the Jet engine when accessed via DAO or ODBC, and thus that must be used in the script files executed with execsql, is generally equivalent to that used within Access itself, but is not identical, and is also not the same in all respects as standard SQL. There are also differences in the SQL syntax accepted by the DAO and ODBC interfaces. To help avoid inconsistencies and errors, here are a few points to keep in mind when creating SQL scripts for use with Access:
- The Jet engine can fail to correctly parse multi-table JOIN expressions. In these cases you will need to give it some help by parenthesizing parts of the JOIN expression. This means that you have some responsibility for constructing optimized (or at least acceptably good) SQL.
- Not all functions that you can use in Access are available via DAO or ODBC. Sometimes these can be worked around with slightly lengthier code. For example, the 'Nz()' function is not available in an ODBC connection, but it can be replaced with an expression such as 'Iif([Column] is null, 0, [Column])'. The list of ODBC functions that can be used is listed here: https://msdn.microsoft.com/en-us/library/office/ff835353.aspx. When creating (temporary) queries—i.e., when using DAO—the functions available are equivalent to those available in Access' GUI interface. A partial list of the differences between Access and ANSI SQL is here: https://msdn.microsoft.com/en-us/library/bb208890%28v=office.12%29.aspx
- Literal string values in SQL statements should be enclosed in single quotes, not double quotes. Although Access allows double quotes to be used, the ANSI SQL standard and the connector libraries used for execsql require that single quotes be used.
- Square brackets must be used around column names that contain embedded spaces when a temporary query is being used (i.e., DAO is used). At all other times, double quotes will work.
- Expressions that should produce a floating-point result ('Double') sometimes do not, with the output being truncated or rounded to an integer. A workaround is to multiply and then divide the expression by the same floating-point number; for example: '1.00000001 * <expression> / 1.00000001'.
- The wildcard character to use with the LIKE expression, in a 'CREATE [TEMPORARY] QUERY' statement, differs under different circumstances:
- "*" must be used in action queries (UPDATE, INSERT, DELETE).
- "*" must be used in simple SELECT queries.
- "%" must be used in subqueries of SELECT queries.
- The BEGIN TRANSACTION, COMMIT, END, and ROLLBACK statements are not supported as individual SQL statements.
Implicit DROP TABLE Statements
The "REPLACEMENT" keyword for the IMPORT and COPY metacommands allows a previously existing table to be replaced. To accomplish this, execsql issues a "DROP TABLE" statement to the database in use. PostgreSQL, SQLite, MySQL, and MariaDB support a form of the "DROP TABLE" statement that automatically removes all foreign keys to the named table. execsql uses these forms of the "DROP TABLE" statement for these DBMSs, and therefore use of the "REPLACEMENT" keyword always succeeds at removing the named table before trying to create a new table with the same name. SQL Server, MS-Access, and Firebird do not have a form of the "DROP TABLE" statement that automatically removes foreign keys. Therefore, if the "REPLACEMENT" keyword is used with any of these three DBMSs, for a table that has foreign keys into it, that table wil not be dropped, and an error will subsequently occur when execsql issues a "CREATE TABLE" statement to create a new table of the same name. To avoid this, when using any of these three DBMSs, you should include in the script the appropriate SQL commands to remove foreign keys (and possibly even to remove the table) before using the IMPORT or COPY metacommands.
Implicit Commits
By default, execsql immediately commits all SQL statements. The AUTOCOMMIT metacommand can be used to turn off automatic commits, and the BATCH metacommand can be used to delay commits until the end of a batch. IMPORT and COPY are the only metacommands that change data, and they also automatically commit their changes when complete (unless AUTOCOMMIT has been turned off). If a new table is created with either of these metacommands (through the use of the NEW or REPLACEMENT keywords), the CREATE TABLE statement will not be committed separately from the data addition, except when using Firebird. Thus, if an error occurs during addition of the data, the new target table will not exist—except when using Firebird.
When adding a very large amount of data with the IMPORT or COPY metacommands, internal transaction limits may be exceeded for some DBMSs. For example, MS-Access may produce a 'file sharing lock count exceeded' error when large data sets are loaded.
Boolean Data Types
Not all DBMSs have explicit support for a boolean data type. When execsql creates a new table as a result of the NEW or REPLACEMENT keyword in IMPORT and COPY metacommands, it uses the following data type for boolean values in each DBMS:
- Postgres: boolean.
- SQLite: integer. True values are converted to 1, and False values are converted to 0.
- Access: integer. Although Access supports a "bit" data type, bit values are non-nullable, and so to preserve null boolean values, execsql uses the integer type instead. True values are converted to 1, and False values are converted to 0.
- SQL Server: bit.
- MySQL and MariaDB: boolean.
- Firebird: integer. True values are converted to 1, and False values are converted to 0.
If boolean values are imported to some other data type in an existing table, the conversion to that data type may or may not be successful.
When scanning input data to determine data types, execsql will consider a column to contain boolean values if it contains only values of 0, 1, '0', '1', 'true', 'false', 't', 'f', 'yes', 'no', 'y', or 'n'. Character matching is case-insensitive.
Schemas, the IMPORT and COPY Metacommands, and Schema-less DBMSs
If a schema name is used with the table specifications for the IMPORT or COPY metacommands, when the command is run against either MS-Access or SQLite, the schema name will be ignored. No error or warning message will be issued. Such irrelevant schema specifications are ignored to reduce the need to customize metacommands for use with different DBMSs.
ANSI Compatibility
When execsql connects to a SQL Server or MySQL database, it automatically configures the DBMS to expect ANSI-compatible SQL, to allow the use of more standards-compliant, and thus consistent, SQL. In particular, for MySQL, note that the double-quote character, rather than the backtick character, must be used to quote table, schema, and column names, and only the apostrophe can be used to quote character data.
Substitution Variables
Substitution variables are words that have been defined to be equivalent to some other text, so that when they are used, those words will be replaced (substituted) by the other text in a SQL statement or metacommand before that statement or metacommand is executed. Substitution variables are defined with the SUB metacommand, which takes a match string and a replacement string.
The <match_string> is the word (substitution variable) that will be matched, and the <replacement_string> is the text that will be substituted for the matching word. Substitution variables are only recognized in SQL statements and metacommands when the match string is preceded and followed by two exclamation points ("!!").
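For example, a definition and a subsequent use of a substitution variable might look like this (the variable, schema, and table names are illustrative):
-- !x! SUB schema_name staging
SELECT COUNT(*) FROM !!schema_name!!.sites;
Before the SELECT statement is sent to the database, "!!schema_name!!" is replaced with "staging".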
Substitution variable names may contain only letters, digits, and the underscore character. Substitutions are processed in the order in which they are defined. Substitution variable definitions can themselves include substitution variables. SQL statements and metacommands may contain nested references to substitution variables, as illustrated in Example 7. Complex expressions using substitution variables can be evaluated using SQL, as illustrated in Example 16.
In addition to user-defined substitution variables, there are three additional kinds of substitution variables that are defined automatically by execsql or by specific metacommands. These are system variables, data variables, and environment variables. System, data, and environment variable names are prefixed with "$", "@", and "&" respectively. Because these prefixes cannot be used when defining substitution variables with the SUB metacommand, system variable, data variable, and environment variable names will not conflict with user-created variable names.
System Variables
Several special substitutions (pairs of matching strings and replacement strings) are automatically defined and maintained by execsql. The names and definitions of these substitution variables are:
- $ARG_x
- The value of a substitution variable that has been assigned on the command line using the "-a" command-line option. The value of <x> must be an integer greater than or equal to 1. See Example 9 for an illustration of the use of "$ARG_x" variables.
- $AUTOCOMMIT_STATE
- A value indicating whether or not execsql will automatically commit each SQL statement as it is executed. This will be either "ON" or "OFF". The autocommit state is database specific, and the value applies only to the database currently in use.
- $CANCEL_HALT_STATE
- The value of the status flag that is set by the CANCEL_HALT metacommand. The value of this variable is always either "ON" or "OFF". A modularized sub-script can use this variable to access and save (in another substitution variable) the CANCEL_HALT state before changing it, so that the previous state can be restored.
- $COUNTER_x
- An integer value that is automatically incremented every time the counter variable is referenced. As many counter variables as desired can be used. The value of x must be an integer that identifies the counter variable. Counter variable names do not have to be used sequentially. The first time that a counter variable is referenced, it returns the value 1. The RESET COUNTER and RESET COUNTERS metacommands can be used to reset counter variables. See examples 6, 7, 11, and 19 for illustrations of the use of counter variables.
- $CURRENT_ALIAS
- The alias of the database currently in use, as defined by the CONNECT metacommand, or "initial" if no CONNECT metacommand has been used. This value will change if a different database is USEd.
- $CURRENT_DATABASE
- The DBMS type and the name of the current database. This value will change if a different database is USEd.
- $CURRENT_DBMS
- The DBMS type of the database in use. This value may change if a different database is USEd.
- $CURRENT_DIR
- The full path to the current directory. The value will not have a directory separator character (i.e., "/" or "\") at the end.
- $CURRENT_SCRIPT
- The file name of the script from which the current command originated. This value will change if a different script is INCLUDEd. This file name may or may not include a path, depending on how the script file was identified on the command line or in an INCLUDE metacommand.
- $CURRENT_TIME
- The date and time at which the current script line is run. See Example 3 for an illustration of its use.
- $DATE_TAG
- The date on which execsql started processing the current script, in the format YYYYMMDD. This is intended to be a convenient short form of the date that can be used to apply sequential version indicators to directory names or file names (e.g., of exported data). See Example 2 for an illustration of its use.
- $DATETIME_TAG
- The date and time at which execsql started processing the current script, in the format YYYYMMDD_hhmm. This is intended to be a convenient short form of the date and time that can be used to apply sequential versions to directory names or file names. See Example 8 for an illustration of its use.
- $DB_NAME
- The name of the database currently in use, as specified on the command line or in a CONNECT metacommand. This will be the database name for server-based databases, and the file name for file-based databases.
- $DB_NEED_PWD
- A string equal to "TRUE" or "FALSE" indicating whether or not a password was required for the database currently in use.
- $DB_SERVER
- The name of the database server for the database currently in use, as specified on the command line or in a CONNECT metacommand. If the database in use is not server-based, the result will be an empty string.
- $DB_USER
- The name of the database user for the database currently in use, as specified on the command line or in a CONNECT metacommand. If the database connection does not require a user name, the result will be an empty string.
- $ERROR_HALT_STATE
- The value of the status flag that is set by the ERROR_HALT metacommand. The value of this variable is always either "ON" or "OFF". A modularized sub-script can use this variable to access and save (in another substitution variable) the ERROR_HALT state before changing it, so that the previous state can be restored.
- $LAST_ERROR
- The text of the last SQL statement that encountered an error. This value will only be available if the ERROR_HALT OFF metacommand has been used.
- $LAST_ROWCOUNT
- The number of rows that were affected by the last INSERT, UPDATE, or SELECT statement. Note that support for $LAST_ROWCOUNT varies among DBMSs. For example, for SELECT statements, Postgres provides an accurate count, SQLite always returns -1, and Firebird always returns 0.
- $LAST_SQL
- The text of the last SQL statement that ran without error.
- $METACOMMAND_ERROR_HALT_STATE
- The value of the status flag that is set by the METACOMMAND_ERROR_HALT metacommand. The value of this variable is always either "ON" or "OFF".
- $OS
- The name of the operating system. This will be "linux", "windows", "cygwin", "darwin", "os2", "os2emx", "riscos", or "atheos".
- $RANDOM
- A random real number in the semi-open interval [0.0, 1.0).
- $RUN_ID
- The run identifier that is used in execsql's log file.
- $SCRIPT_LINE
- The line number of the current script for the current command.
- $SCRIPT_START_TIME
- The date and time at which execsql started processing the current script. This value never changes within a single run of execsql.
- $STARTING_SCRIPT
- The file name of the script specified on the command line when execsql is run. This value never changes within a single run of execsql. This file name may or may not include a path, depending on how it was specified on the command line.
- $TIMER
- The elapsed time of the script timer. If the TIMER ON command has never been used, this value will be zero. If the timer has been started but not stopped, this value will be the elapsed time since the timer was started. If the timer has been started and stopped, this value will be the elapsed time when the timer was stopped.
- $USER
- The name of the person logged in when the script is started. This is not necessarily the same as the user name used with any database.
- $UUID
- A random 128-bit Universally Unique Identifier in the canonical form of 32 hexadecimal digits.
The system variables can be used for conditional execution of different SQL commands or metacommands, and for custom logging of a script's actions using the WRITE metacommand.
Data Variables
Two metacommands, SELECT_SUB and PROMPT SELECT_SUB, will each create a set of substitution variables that correspond to the data values in a single row of a data table. The column names of the data table, prefixed with "@", will be automatically assigned as the names of these data variables. The prefix of "@" cannot be assigned using SUB or similar metacommands, and so will prevent data variables from overwriting any user-defined substitution variables that may have the same name as a data table column. See Example 8 for an illustration of the use of a data variable. All assignments to data variables are automatically logged.
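As a sketch (the view and column names are illustrative, and the view is assumed to already exist), the values in the first row of a view can be captured and then echoed with the WRITE metacommand:
-- !x! SELECT_SUB v_latest_sample
-- !x! WRITE "Most recent sample: !!@sample_id!!, collected on !!@sample_date!!"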
Environment Variables
The operating system environment variables that are defined when execsql starts will be available as substitution variables prefixed with "&". New environment variables cannot be added by any metacommand.
Metacommands to Assign Substitution Variables
In addition to the SUB metacommand, several other metacommands can be used to define substitution variables based on values in a data table, user input, or a combination of the two. All of the metacommands that can be used to define substitution variables are:
- PROMPT DIRECTORY
- Opens a dialog box and prompts the user to identify an existing directory on the file system. The name of the substitution variable is specified in the metacommand, and the full path to the selected directory will be used as the replacement string.
- PROMPT ENTER_SUB
- Opens a dialog box and prompts the user to interactively enter the text that will be used as a replacement string. The name of the substitution variable is specified in the metacommand.
- PROMPT ENTRY_FORM
- Displays a custom data entry form and assigns each of the values entered to a specified substitution variable.
- PROMPT OPENFILE
- Opens a dialog box and prompts the user to select an existing file. The name of the substitution variable is specified in the metacommand, and the full path to the selected file will be used as a replacement string.
- PROMPT SAVEFILE
- Opens a dialog box and prompts the user to enter the name of a new or existing file; the full path to this file will be used as a replacement string.
- PROMPT SELECT_SUB
- Opens a dialog box, displays a data table or view, and prompts the user to select a row. The data values on the selected row will be assigned to a set of data variables.
- SELECT_SUB
- The data values on the first row of a specified table or view will be assigned to a set of data variables. No prompt is displayed.
- SUB
- Directly assigns a replacement string to a substitution variable.
- SUB_TEMPFILE
- Assigns a temporary file name to the specified substitution variable.
- SUBDATA
- The data value in the first column of the first row of a specified table or view will be assigned to a user-specified substitution variable.
Substitution variables can also be defined in configuration files.
Metacommands
- AUTOCOMMIT
- BEGIN BATCH, END BATCH
- BEGIN SCRIPT, END SCRIPT
- BOOLEAN_INT
- BOOLEAN_WORDS
- CANCEL_HALT
- CONNECT
- CONSOLE
- COPY
- COPY QUERY
- EMPTY_STRINGS
- ERROR_HALT
- EXECUTE
- EXECUTE SCRIPT
- EXPORT
- EXPORT QUERY
- HALT
- HALT DISPLAY
- IF
- alias_defined
- column_exists
- console
- database_name
- dbms
- directory_exists
- equal
- file_exists
- hasrows
- identical
- is_gt
- is_gte
- is_null
- is_zero
- metacommand_error
- newer_date
- newer_file
- sql_error
- sub_defined
- table_exists
- view_exists
- IMPORT
- INCLUDE
- LOG
- LOG_WRITE_MESSAGES
- MAX_INT
- METACOMMAND_ERROR_HALT
- PAUSE
- PROMPT ASK
- PROMPT CONNECT
- PROMPT DIRECTORY
- PROMPT DISPLAY
- PROMPT ENTER_SUB
- PROMPT ENTRY_FORM
- PROMPT OPENFILE
- PROMPT SAVEFILE
- PROMPT SELECT_SUB
- RESET COUNTER
- RESET COUNTERS
- RM_FILE
- RM_SUB
- SELECT_SUB
- SET COUNTER
- SUB
- SUB_DECRYPT
- SUB_ENCRYPT
- SUB_TEMPFILE
- SUBDATA
- SYSTEM_CMD
- TIMER
- USE
- WAIT_UNTIL
- WRITE
- WRITE CREATE_TABLE
The execsql program supports several special commands that allow the following actions to be taken at certain points within the script file:
- Include the contents of another SQL script file.
- Import data from a text file or spreadsheet to a new or existing table.
- Export data to the console or a file in a variety of formats.
- Connect to multiple databases and copy data between them.
- Write text out to the console or to a file.
- Execute an operating system command.
- Stop or pause script processing.
- Display a data table and allow the user to choose to stop script processing or to continue.
- Prompt for the names of files or directories to be used.
- Group SQL statements together to be executed in a single batch.
- Execute a function/procedure/query.
- Execute these actions only if certain conditions are met.
- Control the handling of SQL errors.
Metacommands recognized by execsql are embedded in SQL comments, and are identified by the token "!x!" immediately following the comment characters at the beginning of the line. Each metacommand must be completely on a single line. Metacommand usage is illustrated in several of the examples.
Metacommands can appear anywhere in a SQL script except embedded inside a SQL statement. This restriction prohibits constructions such as:
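The sketch below, with illustrative table and column names, shows such a construction:
SELECT site_id, sample_date
FROM samples
-- !x! WRITE "Selecting samples"
WHERE sample_date IS NOT NULL;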
This will not work because metacommands are not executed at the time that SQL statements are read from the script file, but are run after the script has been parsed into separate SQL statements and metacommands. Instead, SQL statements can be dynamically constructed using substitution variables to modify them at runtime, like this:
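A sketch of that approach, again with illustrative names, in which part of the statement text is supplied by a substitution variable at runtime:
-- !x! SUB date_filter sample_date IS NOT NULL
SELECT site_id, sample_date
FROM samples
WHERE !!date_filter!!;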
The metacommands are described in the following sections. Metacommand names are shown here in all uppercase, but execsql is not case-sensitive when evaluating the metacommands.
AUTOCOMMIT
By default, execsql automatically commits each SQL statement individually. Setting AUTOCOMMIT off changes this behavior: the user is then responsible for explicitly issuing a "COMMIT;" statement to ensure that the changes made by preceding SQL statements are committed to the database.
Unlike BATCH metacommands, the SQL statements issued while AUTOCOMMIT is off will not be queued up and automatically run when AUTOCOMMIT is turned back on again. However, any SQL statements that are run after AUTOCOMMIT is turned back on will be automatically committed, and that commit operation will also commit any SQL statements that were issued while AUTOCOMMIT was off, unless a rollback statement was used as the last SQL statement while AUTOCOMMIT was off.
The AUTOCOMMIT metacommand is database-specific, and affects only the database in use when the metacommand is used. This contrasts with the BATCH metacommand, which affects all databases.
The IMPORT and COPY metacommands do not commit data changes while AUTOCOMMIT is off. The SQL statements generated by the IMPORT and COPY metacommands are sent to the database, however. Therefore the AUTOCOMMIT metacommand is recommended when explicit transaction control is to be applied to the IMPORT and COPY metacommands.
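A minimal sketch of this usage, assuming the metacommand takes ON and OFF arguments (the SQL statements are illustrative):
-- !x! AUTOCOMMIT OFF
UPDATE samples SET qa_flag = 'R' WHERE sample_date IS NULL;
DELETE FROM samples WHERE site_id IS NULL;
COMMIT;
-- !x! AUTOCOMMIT ON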
BEGIN BATCH and END BATCH
The BATCH commands provide a sort of transaction control at the script level, as an alternative to using the DBMS's own transaction commands. execsql ordinarily executes and commits SQL statements immediately (i.e., as if the database connection is set to autocommit, although execsql actually manages commit and rollback statements directly). The BATCH commands allow you to alter this behavior so that SQL statements are not executed and committed until a batch is completed. This allows execsql to emulate environments that operate in batch mode by default (specifically, sqlcmd). In addition, batches operate across databases: SQL statements directed to several different databases will all be held until the end of the batch, and will then be executed at the end of the batch—or discarded if the batch is rolled back.
BEGIN BATCH marks the beginning of a set of SQL statements to be executed in a single operation. END BATCH marks the end of that set of statements. ROLLBACK BATCH discards any SQL statements that have already been included in the batch, but does not terminate the batch.
The SQL statements within a batch are queued up and sent to the database only at the end of the batch. If SQL statements for several different databases are included within the batch, they will be executed at the end of the batch in the order in which they were specified.
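A minimal sketch of a batch (the table names are illustrative):
-- !x! BEGIN BATCH
UPDATE sites SET status = 'active' WHERE status IS NULL;
UPDATE samples SET qa_flag = 'A' WHERE qa_flag IS NULL;
-- !x! END BATCH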
Metacommands are executed regardless of whether or not they appear within a batch. Specifically, the IMPORT and COPY metacommands will send their data to the database immediately. Therefore, if these metacommands are run within a batch, then regardless of the metacommands' position within the batch, the resulting data changes will be made before any SQL statements within the batch are run. Because of the potential for unexpected effects when IMPORT or COPY metacommands are embedded within a batch, this construction is probably best avoided.
Alternatives to using batches to control the execution time of SQL statements are:
- The AUTOCOMMIT metacommand, which provides a different method of integrating IMPORT and COPY metacommands with a sequence of SQL statements
- The IF metacommand, which provides a way of conditionally executing SQL statements and metacommands such as IMPORT and COPY
- The BEGIN/END SCRIPT and EXECUTE SCRIPT metacommands, which allow both SQL statement and metacommands to be grouped together and executed as a group, with AUTOCOMMIT either on or off.
The END BATCH metacommand is equivalent to the "GO" command of SQL Server utilities such as sqlcmd. There is no explicit equivalent to BEGIN BATCH in sqlcmd or other SQL Server utilities. In sqlcmd a new batch is automatically begun at the beginning of the script or immediately after a GO statement. execsql only starts a new batch when a BEGIN BATCH statement is encountered.
If the end of the script file is encountered while a batch of statements is being compiled, but there is no END BATCH metacommand, the SQL statements in that incomplete batch will not be executed.
BEGIN SCRIPT and END SCRIPT
The BEGIN SCRIPT and END SCRIPT metacommands define a block of statements (SQL statements and metacommands) that can be subsequently executed (repeatedly, if desired) using the EXECUTE SCRIPT metacommand.
The statements within the BEGIN/END SCRIPT block are not executed within the normal flow of the script in which they appear, and, unlike the BEGIN/END BATCH commands, neither are they executed when the END SCRIPT metacommand is encountered. These statements are executed only when the corresponding script is named in an EXECUTE SCRIPT metacommand.
A BEGIN/END SCRIPT block can be used in ways similar to a separate script file that is included with the INCLUDE metacommand. Both allow the same code to be executed repeatedly, either at different locations in the main script or recursively to perform looping.
The BEGIN SCRIPT and END SCRIPT metacommands are executed when a script file is read, not while the script is being executed. As a consequence:
- Substitution variables should ordinarily not be used as script names because they will not have been defined yet, unless they were defined in the variables section of a configuration file; and
- The BEGIN/END SCRIPT commands are not ordinarily subject to conditional execution.
However, the BEGIN SCRIPT and END SCRIPT metacommands can be used in a separate script file that is INCLUDEd in the main script. In this case, both of the previous restrictions are eliminated. In addition the EXECUTE SCRIPT metacommand can be included in a conditional statement.
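A sketch of the intended usage, assuming that the script name follows the BEGIN SCRIPT and EXECUTE SCRIPT keywords (the name and the SQL are illustrative):
-- !x! BEGIN SCRIPT refresh_summary
CREATE OR REPLACE VIEW v_site_summary AS
SELECT site_id, COUNT(*) AS sample_count
FROM samples
GROUP BY site_id;
-- !x! END SCRIPT

-- !x! EXECUTE SCRIPT refresh_summary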
BOOLEAN_INT
Controls whether integer values of 0 and 1 are considered to be Booleans when the IMPORT and COPY metacommands scan data to determine data types to create a new table (i.e., when either the NEW or REPLACEMENT keyword is used with the IMPORT and COPY metacommands). The argument should be either "Yes" or "No". execsql's default behavior is to consider a column with only integer values of 0 and 1 to have a Boolean data type. By setting this value to "No", such a column will be considered to have an integer data type. This is equivalent to the "-b" command-line option and the "boolean_int" configuration parameter.
BOOLEAN_WORDS
Controls whether execsql will recognize only full words as Booleans when the IMPORT and COPY metacommands scan data to determine data types to create a new table (i.e., when either the NEW or REPLACEMENT keyword is used with the IMPORT and COPY metacommands). The argument should be either "Yes" or "No". execsql's default behavior is to recognize values of "Y", "N", "T", and "F" as Booleans. If BOOLEAN_WORDS is set to "Yes", only "Yes", "No", "True", and "False" will be recognized as Booleans.
CANCEL_HALT
When CANCEL_HALT is set to ON, which is the default, if the user presses the "Cancel" button on a dialog (such as is presented by the PROMPT DISPLAY metacommand), execsql will halt script processing. If CANCEL_HALT is set to OFF, then execsql will not halt script processing, and it is the script author's responsibility to ensure that adverse consequences do not result from the lack of a response to the dialog. Example 10 illustrates a condition in which setting CANCEL_HALT to OFF is appropriate.
CONNECT
For PostgreSQL:
For SQLite:
For MS-Access:
For SQL Server:
For MySQL
For MariaDB
For Firebird:
For a DSN:
Establishes a connection to another database. The keyword values are equivalent to arguments and options that can be specified on the command line when execsql is run. The "NEW" keyword, used with PostgreSQL and SQLite, will cause a new database of the given name to be created. There must be no existing database of that name, and (for Postgres) you must have permissions assigned that allow you to create databases.
The CONNECT metacommands for Access and DSN connections are the only ones that allow a password to be specified. If a password is needed for any other database, execsql will display a prompt for the password. An exception has been made for Access because of an actual use case in which data had to be extracted from over 11,000 Access files, all with the same password. Rather than embedding the password directly in the SQL script, the PROMPT ENTER_SUB metacommand can be used to prompt for the password just once, and the PASSWORD clause of the CONNECT metacommand can then reference the substitution variable that is created by the PROMPT ENTER_SUB metacommand.
The alias name that is specified in this command can be used to refer to this database in the USE and COPY metacommands. Alias names can consist only of letters, digits, and underscores, and must start with a letter. The alias name "initial" is reserved for the database that is used when execsql starts script processing, and cannot be used with the CONNECT metacommand. If you re-use an alias name, the connection to the database to which that name was previously assigned will be closed, and the database will no longer be available. Using the same alias for two different databases allows for mistakes wherein script statements are run on the wrong database, and so is not recommended.
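Purely as an illustrative sketch (the keyword names inside the parentheses are assumptions, not authoritative syntax), a connection to a second PostgreSQL database, followed by a switch to it, might look like:
-- !x! CONNECT TO POSTGRESQL(SERVER=dbhost, DB=sandbox, USER=analyst, NEED_PWD=TRUE) AS pg2
-- !x! USE pg2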
CONSOLE
Creates (ON) or destroys (OFF) a GUI console to which subsequent WRITE metacommands will send their output. Data tables exported as text will also be written to this console. The console window includes a status line and progress bar indicator that can each be directly controlled by metacommands listed below.
Only one console window can be open at a time. If a "CONSOLE ON" metacommand is used while a console is already visible, the same console will remain open, and no error will be reported.
A GUI console can be automatically opened when execsql is started by using the "-v3" option.
When the GUI console is turned OFF, subsequent output will again be directed to standard output (the terminal window, if you have one open).
If an error occurs while the console is open, the error message will be written on standard error (typically the terminal) rather than in the console, and the console will be closed as execsql terminates.
Hides or shows the console window. Text will still be written to the console window while it is hidden, and will be visible if the console is shown again.
The specified message is written to the status bar at the bottom of the console window. Use an empty message ("") to clear the status message.
The progress bar at the bottom of the console window will be updated to show the specified value. Values should be numeric, between zero and 100. If the number is followed by a slash (or virgule) and then another number, the two numbers will be taken as a fraction and converted to a percentage for display. Use a value of zero to clear the progress bar.
Saves the text in the console window to the specified file. If the "APPEND TO" keyword is used, the console text will be appended to any existing file of the same name; otherwise, any existing file will be overwritten.
Script processing will be halted until the user responds to the console window with either the <Enter> key or the <Esc> key, or clicks on the window close button. If an (optional) message is included as part of the command, the message will be written into the status bar. If the user responds with the <Enter> key, the console window will remain open and script processing will resume. The user can close the console window either with the <Esc> key or by clicking on the window close button.
The console window has a single menu item, 'Save as...', that allows the entire console output to be saved as a text file.
COPY
Copies the data from a data table or view in one database to a data table in a second database. The two databases between which data are copied are identified by the alias names that are established with the CONNECT metacommand. The alias "initial" can be used to refer to the database that is used when execsql starts script processing. Neither the source nor the destination database need be the initial database, or the database currently in use.
The second (destination) table must have column names that are identical to the names of the columns in the first (source) table. The second table may have additional columns; if it does, they will not be affected and their names don't matter. The data types in the columns to be copied must be compatible, though not necessarily identical. The order of the columns in the two tables does not have to be identical.
If the NEW keyword is used, the destination table will be automatically created with column names and data types that are compatible with the first (source) table. The data types used for the columns in the newly created table will be determined by a scan of all of the data in the first table, but may not exactly match those in the first table. If the destination table already exists when the NEW keyword is used, an error will occur.
If the REPLACEMENT keyword is used, the destination table will also be created to be compatible with the source table, but any existing destination table of the same name will be dropped first. execsql uses a "drop table" statement to drop an existing destination table, and this statement may not succeed if there are dependencies on that table (see the section on implicit drop table statements). If the destination table is not dropped, then data from the source table will be added to the existing table, or an error will occur if the table formats are not compatible.
If there are constraints on the second table that are not met by the data being added, an error will occur. If an error occurs at any point during the data copying process, no new data will be added to the second table.
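For example (a sketch; the alias and table names are hypothetical), a table could be copied from the initial database into a newly created table in a database that was connected with the alias "archive":

-- !x! COPY site_visits FROM initial TO NEW site_visits IN archive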
COPY QUERY
Copies data from one database to another in the same manner as the COPY metacommand, except that a SQL query statement is used in place of the name of a source table (or view). The SQL statement must be terminated with a semicolon and enclosed in double angle brackets.
Like all metacommands, this metacommand must appear on a single line, although the SQL statement may be quite long. To facilitate readability, the SQL statement may be saved in a substitution variable and that substitution variable referenced in the COPY QUERY metacommand.
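A sketch of this usage, with the query first placed in a substitution variable to keep the metacommand line readable (the table, column, and alias names are hypothetical):

-- !x! SUB recent_query SELECT * FROM site_visits WHERE visit_date >= '2024-01-01'
-- !x! COPY QUERY <<!!recent_query!!;>> FROM initial TO NEW recent_visits IN archive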
EMAIL
Sends an email. The from_address should be a valid email address (though not necessarily a real one). The to_addresses value should also be a valid email address, or a comma- or semicolon-delimited list of email addresses. If none of the destination email addresses are valid, an exception will occur and execsql will halt. If at least one of the email addresses is valid, the command will succeed.
The subject and the message_text should both be enclosed in double quotes and should not contain a double quote.
If the MESSAGE_FILE keyword is used, the contents of that file will be inserted into the body of the email message in addition to whatever message_text is specified. The filename may be unquoted, but must be quoted if it contains any space characters.
If the ATTACH_FILE keyword is used, the specified file will be attached to the email message. The attachment_filename may be unquoted, but must be quoted if it contains any space characters.
The SMTP host and any other connection information that is necessary must be specified in the "email" section of a configuration file.
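A minimal sketch, assuming the SMTP settings have already been provided in the "email" section of a configuration file; the addresses, message text, and attachment name are hypothetical, and the clause order shown is an assumption about the full EMAIL syntax:

-- !x! EMAIL FROM qa_reports@example.com TO analyst@example.com SUBJECT "Nightly load complete" MESSAGE "The data load finished without errors." ATTACH_FILE "load summary.txt"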
EMPTY_STRINGS
Determines whether empty strings are allowed in data that is saved using either the IMPORT or COPY metacommands. The default is to allow empty strings. A metacommand of EMPTY_STRINGS NO will cause all empty strings to be replaced by NULL.
ERROR_HALT
When ERROR_HALT is set to ON, which is the default, any errors that occur as a result of executing a SQL statement will cause an error message to be displayed immediately, and execsql will exit. When ERROR_HALT is set to OFF, then SQL errors will be ignored, but can be evaluated with the IF SQL_ERROR conditional.
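For example (a sketch; the table name is hypothetical), ERROR_HALT can be turned off around a statement that is allowed to fail, and the outcome then checked with the SQL_ERROR conditional described later in this section:

-- !x! ERROR_HALT OFF
DROP TABLE staging_data;
-- !x! IF(SQL_ERROR()) { WRITE "Note: staging_data did not exist, so nothing was dropped." }
-- !x! ERROR_HALT ON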
EXECUTE
Executes the specified stored procedure (or function, or query, depending on the DBMS). Conceptually, the EXECUTE metacommand is intended to be used to execute stored procedures that do not require arguments and do not return any values. The actual operation of this command differs depending on the DBMS that is in use.
Access has only stored queries, which may be equivalent to either a view or a stored procedure in other DBMSs. When using Access, the query referenced in this command should be an INSERT, UPDATE, or DELETE statement—executing a SELECT statement in this context would have no purpose.
Postgres has stored functions. Functions with no return value are equivalent to stored procedures. When using Postgres, execsql treats the argument as the name of a stored function. It appends an empty pair of parentheses to the function name before calling it, so you should not include the parentheses yourself; the reason for this is to maintain as much compatibility as possible in the metacommand syntax across DBMSs.
SQL Server has stored procedures. When using SQL Server, execsql treats the argument as the name of a stored procedure.
SQLite does not support stored procedures or functions, and (unlike Access queries) views can only represent SELECT statements. When using SQLite, execsql cannot treat the argument as a stored procedure or function, so it treats it as a view and carries out a SELECT * FROM <procedure_name>; statement. This is unlikely to be very useful in practice, but it is the only reasonable action to take with SQLite.
MySQL and MariaDB support stored procedures and user-defined functions. User-defined functions can be invoked within SQL statements, so execsql considers the argument to the EXECUTE metacommand to be the name of a stored procedure, and calls it after appending a pair of parentheses to represent an empty argument list.
Firebird supports stored procedures, and execsql executes the procedure with the given name, providing neither input parameters nor output parameters.
EXECUTE SCRIPT
This metacommand will execute the set of SQL statements and metacommands that was previously defined and named using the BEGIN/END SCRIPT metacommands.
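A sketch of defining a named sub-script and running it later (the script name and SQL are hypothetical; see the BEGIN SCRIPT and END SCRIPT metacommands for the full syntax):

-- !x! BEGIN SCRIPT refresh_summary
DELETE FROM monthly_summary;
INSERT INTO monthly_summary SELECT site_id, count(*) FROM site_visits GROUP BY site_id;
-- !x! END SCRIPT
-- !x! EXECUTE SCRIPT refresh_summary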
EXPORT
Exports data to a file. The data set named in this command must be an existing table or view. The specified output file will be overwritten if it exists, unless the APPEND keyword is included. If the output name is given as "stdout", the data will be sent to the console instead of to a file. If the "-d" command-line option or the "make_export_dirs" configuration option is used, execsql will automatically create the output directories as needed.
If the "TEE" keyword is used, the data will be exported to the terminal in the "TXT" format (as described below) in addition to whatever other type of output is produced.
The EXPORT metacommand has two forms, as shown above. The first exports the data in any of a variety of established formats, and the second uses one of several different template processors with a template specification file. The first form is more convenient if any of the supported formats is suitable; the second form allows more flexible customization of the output.
Exporting Data to Specific Supported Formats
The format specification in the first form of the EXPORT metacommand controls how the data table is written. The allowable format specifications and their meanings are:
- CSV
- Comma-delimited with double quotes around text that contains a comma or a double quote. Column headers will not be written if the APPEND keyword is used. No description text will be included in the output even if it is provided.
- HTML
- Hypertext markup language. If the APPEND keyword is not used, a complete web page will be written, with meta tags in the header to identify the source of the data, author, and creation date; simple CSS will be defined in the header to format the table. If the APPEND keyword is used, only the table will be written to the output file. If the APPEND keyword is used and the output file contains a </body> tag, the table will be written before that tag rather than at the physical end of the file. The HTML tags used to create the table have no IDs, classes, styles, or other attributes applied. Custom CSS can be specified in configuration files. If the DESCRIPTION keyword is used, the given description will be used as the table's caption.
- JSON
- Javascript Object Notation. The data table is represented as an array of JSON objects, where each object represents a row of the table. Each row is represented as a set of key:value pairs, with column names used as the keys. No description text will be included in the output even if it is provided.
- LATEX
- Input for the LaTeX typesetting system. If the APPEND keyword is not used, a complete document (of class article) will be written. If the APPEND keyword is used, only the table definition will be written to the output file. If the APPEND keyword is used and an existing output file contains an \end{document} directive, the table will be written before that directive rather than at the physical end of the file. Wide or long tables may exceed LaTeX's default page size. If the DESCRIPTION keyword is used, the given description will be used as the table's caption.
- ODS
- OpenDocument spreadsheet. When the APPEND keyword is used, each data set that is exported will be on a separate worksheet. The name of the view or table exported will be used as the worksheet name. If this conflicts with a sheet already in the workbook, a number will be appended to make the sheet name unique. (If a workbook with sheet names longer than 31 characters is opened in Excel, the sheet names will be truncated.) A sheet named "Datasheets" will also be created, or updated if it already exists, with information to identify the author, creation date, description, and data source for each data sheet in the workbook.
- PLAIN
- Text with no header row, no quoting, and columns delimited by a single space. This format is appropriate when you want to export text—see Example 11 for an illustration of its use. No description text will be included in the output even if it is provided.
- RAW
- Data exactly as stored with no headers, quotes, or delimiters between either columns or rows. This format is most suitable for export of binary data. No description text will be included in the output even if it is provided.
- TAB or TSV
- Tab-delimited with no quoting. Column headers will not be written if the APPEND keyword is used. No description text will be included in the output even if it is provided.
- TABQ or TSVQ
- Tab-delimited with double quotes around any text that contains a tab or a double quote. Column headers will not be written if the APPEND keyword is used. No description text will be included in the output even if it is provided.
- TXT
- Text with data delimited and padded with spaces so that values are aligned in columns. Column headers are underlined with a row of dashes. Columns are separated with the pipe character (|). Column headers are always written, even when the APPEND keyword is used. This output is compatible with Markdown pipe tables—see Example 8. If the DESCRIPTION keyword is used, the given description will be written as plain text on the line before the table.
- TXT-ND
- This is the same as the TXT format, except that table cells where data are missing are filled with "ND" instead of being blank. Some tables with blank cells are not parsed correctly by pandoc, and this format ensures that no cells are blank. If the DESCRIPTION keyword is used, the given description will be written as plain text on the line before the table.
- US
- Text with the unit separator as the column delimiter, and no quoting. Column headers will not be written if the APPEND keyword is used. No description text will be included in the output even if it is provided.
- VALUES
- Data are written into the output file in the format of a SQL INSERT...VALUES statement. The name of the target table is specified in the form of a substitution variable named target_table; the format of the complete statement is:
insert into !!target_table!! (<list of column headers>)
values
    (<Row 1 data>),
    (<Row 2 data>),
    ...
    (<Row N data>)
    ;
If the DESCRIPTION keyword is used, the description text will be included as a SQL comment before the INSERT statement. The INCLUDE metacommand can be used to include a file written in this format, with the target table name filled in by an appropriately-named substitution variable. This output format can also be used to copy data between databases when it is not possible to use execsql's CONNECT and COPY metacommands.
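A sketch of that round trip (the table and file names are hypothetical): export a table in VALUES format, then load the file elsewhere by defining target_table and including it:

-- !x! EXPORT samples TO samples_values.sql AS VALUES
-- !x! SUB target_table samples_copy
-- !x! INCLUDE samples_values.sql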
Exporting Data Using a Template
Template-based exports provide a simple form of report generation or mail-merge capability. The template used for this type of export is a freely-formatted text file containing placeholders for data values, plus whatever additional text is appropriate for the purpose of the report. The exported data will therefore not necessarily be in the form of a table, but may be presented as lists, embedded in paragraphs of text, or in other forms.
execsql supports three different template processors, each with its own syntax. The template processor that will be used is controlled by the "template_processor" configuration property. These processors and the syntax they use to refer to exported data values are:
- The default (no template processor specified)
- Data values are referenced in the template by the column name prefixed with a dollar sign, or enclosed in curly braces and prefixed with a dollar sign. For example, if an exported data table contains a column named "vessel", that column could be referred to in either of these ways:
Survey operations were conducted from $vessel.
The ${vessel}'s crew ate biscuits for a week.
The default template processor does not include any features that allow for conditional tests or iteration within the template. The entire template is processed for each row in the exported data table, and all of the output is combined into the output file.
- Jinja
- Data values are referenced in the template within pairs of curly braces. The Jinja template processor allows conditional tests and iteration, as well as other features, within the template. The entire exported data set is passed to the template processor as an iterable object named "datatable". For example, if an exported data table contains a column named "hire_date", that column could be referred to, while iterating over the entire data set, as follows:
{% for row in datatable %}
Hire date: {{ row.hire_date }}
. . .
{% endfor %}
The template syntax used by Jinja is very similar to that used by Django. Jinja's Template Designer Documentation provides more details about the template syntax.
- Airspeed
- Data values are referenced in the template by the column name (or object name) prefixed with a dollar sign, or enclosed in curly braces and prefixed with a dollar sign, just as for the default template processor. The Airspeed template processor also allows conditional tests and iteration, and as with Jinja, the entire exported data set is passed to the template processor as an iterable object named "datatable". For example, if an exported data set contains bibliographic information, those columns could be referenced, while iterating over the entire data set to produce a BibTeX bibliography, as follows:
#foreach ($doc in $datatable)
@$doc.doc_type{$doc.doc_id,
    author = {$doc.author},
    title = {$doc.title},
    . . .
}
#end
The template syntax used by Airspeed duplicates that used by Apache Velocity, and the Velocity User's Guide and Reference Guide provide details about the template syntax.
The Jinja and Airspeed template processors are both more powerful than the default, but as a result are also more complex. The different alternatives may be suitable for different purposes, or for different users, based on prior experience. One potentially important difference between Jinja and Airspeed is that Airspeed requires that the entire data set be processed at once, whereas Jinja does not; for very large data sets, therefore, Airspeed could encounter memory limitations.
EXPORT QUERY
Exports data to a file in the same manner as the EXPORT metacommand, except that the data source is a SQL query statement in the metacommand rather than a database table or view. The SQL query statement must be terminated with a semicolon and enclosed in double angle brackets (i.e., literally "<<" and ">>").
Like all metacommands, this metacommand must appear on a single line, although the SQL statement may be quite long. To facilitate readability, the SQL statement may be saved in a substitution variable and that substitution variable referenced in the EXPORT QUERY metacommand.
HALT
Script processing is halted, and the execsql.py program terminates. If an error message is provided, it is written to the console, unless the "-v2" or "-v3" option is used, in which case the message is displayed in a dialog. If an EXIT_STATUS value is specified, the system exit status is set to that value; otherwise, the system exit status is set to 2.
HALT DISPLAY
Script processing is halted, and the error message is displayed in a GUI window. If a table or view name is provided, the data from that table or view is also displayed. If an EXIT_STATUS value is specified, the system exit status is set to that value; otherwise, the system exit status is set to 2.
IF
The IF metacommand allows you to test for certain conditions and control which script statements are subsequently executed. There are two forms of the IF metacommand:
- A single-line IF statement that will conditionally run a single metacommand.
- A multi-line IF statement that must be terminated with an ENDIF metacommand. The multi-line form supports ELSE, ELSEIF, ANDIF, and ORIF clauses.
The syntax for the single-line IF metacommand is:
The conditional tests that can be used are listed below. For the single-line form of the IF metacommand, the metacommand to be executed must be enclosed in curly braces following the conditional test.
The syntax for the multi-line IF metacommand can take several forms, depending on whether the additional ELSE, ELSEIF, ANDIF, and ORIF clauses are used. The simplest form of the multi-line IF metacommand is:
Multi-line IF metacommands can be nested within one another, and single-line IF metacommands can appear within a multi-line IF metacommand.
The ELSE clause allows you to conditionally execute either of two sets of script commands. The form of this set of statements is:
The ELSEIF clause combines the actions of the ELSE clause with another IF metacommand—effectively, nesting another IF metacommand within the ELSE clause, but not requiring a second ENDIF statement to terminate the nested conditional test. The form of this set of statements is:
Multiple ELSEIF clauses can be used within a single multi-line IF metacommand. An ELSE clause can be used in combination with ELSEIF clauses, but this is not recommended because the results are not likely to be what you expect: the ELSE keyword only inverts the current truth state; it does not provide an alternative to all preceding ELSEIF clauses. To achieve the effect of a case or switch statement, use only ELSEIF clauses without a final ELSE clause.
The ANDIF clause allows you to test for the conjunction of two conditionals without having to nest IF metacommands and use two ENDIF statements. The simplest form of usage of the ANDIF clause is:
The ANDIF clause does not have to immediately follow the IF metacommand. It could instead follow an ELSE statement, or appear anywhere at all within a multi-line IF metacommand. Usage patterns other than that illustrated above may be difficult to interpret, however, and nested IF metacommands may be preferable to complex uses of the ANDIF clause.
The ORIF clause is similar to the ANDIF clause, but allows you to test the disjunction of two conditionals. The simplest form of usage of the ORIF clause is:
The IF metacommands can be used not only to control a single stream of script commands, but also to loop over sets of SQL statements and metacommands, as shown in Example 6.
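For illustration, the single-line and multi-line forms might be combined as follows (a sketch only: the table and file names are hypothetical, and the lines between the IF and ENDIF metacommands are assumed to be ordinary SQL statements and metacommands). The FILE_EXISTS, HASROWS, and other conditional tests are described in the following subsections:

-- !x! IF(FILE_EXISTS(errors.txt)) { RM_FILE errors.txt }
-- !x! IF(HASROWS(qa_errors))
-- !x! EXPORT qa_errors TO errors.txt AS TXT
-- !x! WRITE "QA errors were found; see errors.txt."
-- !x! ELSE
-- !x! WRITE "No QA errors were found."
-- !x! ENDIF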
The conditional tests that can be used with IF and WAIT_UNTIL metacommands are listed in the following subsections.
ALIAS_DEFINED test
Evaluates whether a database connection has been made using the specified alias. Database aliases are defined using the CONNECT and PROMPT CONNECT metacommands.
COLUMN_EXISTS test
Evaluates whether there is a column of the given name in the specified database table. The table name may include a schema. execsql queries the information schema tables for those DBMSs that have them. You must have permission to use these system tables. If you do not, an alternative approach is to try to select data from the specified column of the table and determine whether an error occurs.
CONSOLE test
Evaluates whether the GUI console is running.
DATABASE_NAME test
Evaluates whether the current database name matches the one specified. Database names used in this conditional test should exactly match those contained in the "$CURRENT_DATABASE" substitution variable.
DBMS test
Evaluates whether the current DBMS matches the one specified. DBMS names used in this conditional test should exactly match those contained in the "$CURRENT_DBMS" substitution variable.
DIRECTORY_EXISTS test
Evaluates whether there is an existing directory with the given name.
EQUAL test
Evaluates whether the two values are equal. The string representations of the two values are first converted to a normalized Unicode form (Normal Form C) and are then compared, in turn, as integers, floating-point values, date/time values with a time zone, date/time values, dates, Boolean values, and strings. String comparisons are case-insensitive. The first of these data types to which both values can be successfully converted is the basis for determining whether the values are equal. See also IDENTICAL.
FILE_EXISTS test
Evaluates whether there is a disk file of the given name.
HASROWS test
Evaluates whether the specified table or view has a non-zero number of rows.
IDENTICAL test
Evaluates whether the two quoted strings are exactly identical. No Unicode normalization is done, and the comparison is case-sensitive. See also EQUAL.
IS_GT test
Evaluates whether or not the first of the specified values is greater than the second value. If the values are not numeric, an error will occur, and script processing will halt.
IS_GTE test
Evaluates whether or not the first of the specified values is greater than or equal to the second value. If the values are not numeric, an error will occur, and script processing will halt.
IS_NULL test
Evaluates whether or not the specified value is null—that is, whether it is a zero-length string.
IS_ZERO test
Evaluates whether or not the specified value is equal to zero. If the value is not numeric, an error will occur, and script processing will halt.
SCHEMA_EXISTS test
Evaluates whether or not the specified schema already exists in the database. For DBMSs that do not support schemas (SQLite, MySQL, Firebird, and Access), this will always return a value of False. execsql queries the information schema tables, or analogous tables, for this information. You must have permission to use these system tables.
METACOMMAND_ERROR test
Evaluates whether the previous metacommand generated an error. This test for metacommand errors will only be effective if the METACOMMAND_ERROR_HALT OFF metacommand has previously been issued. This conditional must be used in the first metacommand after any metacommand that might have encountered an error.
NEWER_DATE test
Evaluates whether the specified file was last modified after the given date. This can be used, for example, to compare the date of an output file to the latest revision date of all the data rows that should be included in the output; if the data have been revised after the output file was created, the output file should be regenerated.
NEWER_FILE test
Evaluates whether the first of the specified files was last modified after the second of the files. This can be used, for example, to compare the date of an output file to the date of the script file that produces that output; if the script is newer, it may be IMPORTed and run again.
SQL_ERROR test
Evaluates whether the previous SQL statement generated an error. Errors will result from badly-formed SQL, reference to non-existent database objects, lack of permissions, or database locks. A query (e.g., an update query) that does not do exactly what you expect it to will not necessarily cause an error to occur that can be identified with this statement. This test for SQL errors will only be effective if the ERROR_HALT OFF metacommand has previously been issued.
Errors in metacommands and some other errors encountered by execsql will cause the program to halt immediately, regardless of the setting of ERROR_HALT or the use of the IF(SQL_ERROR()) test.
SUB_DEFINED test
Evaluates whether a replacement string has been defined for the specified substitution variable (matching string).
TABLE_EXISTS test
Evaluates whether there is a database table of the given name. execsql queries the information schema tables, or analogous tables, for this information. You must have permission to use these system tables. If you do not, an alternative approach is to try to select data from the table and determine if an error occurs; for example:
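A sketch of that approach, using a hypothetical table name and the ERROR_HALT and SQL_ERROR metacommands described elsewhere in this section:

-- !x! ERROR_HALT OFF
SELECT count(*) FROM maybe_missing_table;
-- !x! IF(SQL_ERROR()) { WRITE "Table maybe_missing_table does not exist or cannot be read." }
-- !x! ERROR_HALT ON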
VIEW_EXISTS test
Evaluates whether there is a database view of the given name. For Access, this tests for the existence of a query of the given name. execsql queries the information schema tables, or analogous tables, for this information. You must have permission to use these system tables. If you do not, the alternative approach described for the TABLE_EXISTS conditional can be used.
IMPORT
Imports data from a file into a new or existing database table. Data can be imported from either a text file or a spreadsheet. The syntax of the IMPORT metacommand for importing data from a text file is:
The syntax for importing data from an OpenDocument spreadsheet is:
The syntax for importing data from an Excel spreadsheet is:
Column names and column order in the input must exactly match those in the target table. The column names in the input must also be valid for the DBMS in use.
If the "WITH QUOTE <quote_char> DELIMITER <delim_char>" clause is not used with text files, execsql will scan the text file to determine the quote and delimiter characters that are used in the file. By default, the first 100 lines of the file will be scanned. You can control the number of lines scanned with the "-s" option on the command line. If the "WITH..." clause is used, the file will not be scanned to identify the quote and delimiter characters regardless of the setting of the "-s" option.
execsql will read CSV files containing newlines embedded in delimited text values. Scanning of a CSV file to determine the quote and delimiter characters may produce incorrect results if most of the physical lines scanned consist of text that makes up only part of a logical data column.
The quoting characters that will be recognized in a text file, and that can be specified in the "WITH..." clause are the double quote (") and the single quote ('). If no quote character is used in the file, this can be specified in the metacommand as "WITH QUOTE NONE".
The delimiter characters that will be recognized in a text file, and that can be specified in the "WITH..." clause are the comma (,), semicolon (;), vertical rule (|), tab, and the unit separator. To specify that the tab character is used as a delimiter, use "WITH...DELIMITER TAB", and to specify that the unit separator is used as a delimiter, use "WITH...DELIMITER US".
The SKIP key phrase specifies the number of lines (or rows) at the beginning of the file (or worksheet) to discard before evaluating the remainder of the input as a data table.
If the NEW keyword is used, the input will be scanned to determine the data type of each column, and a CREATE TABLE statement run to create a new table for the data. Scanning of the file to determine data formats is separate from the scanning that is done to determine the quote and delimiter characters. If the table already exists when the NEW keyword is used, a fatal exception will result. If the REPLACEMENT keyword is used, the result is the same as if the NEW keyword were used, except that an existing table of the given name will be deleted first. If the table does not exist, an informational message will be written to the log.
If a table is scanned to determine data types, any column that is completely empty (all null) will be created with the text data type. This provides the greatest flexibility for subsequent addition of data to the table. However, if that column ought to have a different data type, and a WHERE clause is applied to that column assuming a different data type, the DBMS may report an error because of incomparable data types.
The handling of Boolean data types when data are imported depends on the capabilities of the DBMS in use. See the relevant section of the SQL syntax notes.
If a column of imported data contains only numeric values, but any non-zero value has a leading digit of "0", that column will be imported as a text data type (character, character varying, or text).
When execsql generates a CREATE TABLE statement, it will quote column names that contain any characters other than letters, digits, or the underscore ("_"). A mixture of uppercase and lowercase letters in a column name is not taken as an indication that a quoted identifier should be used for the column name, and execsql does not fold column names to either uppercase or lowercase. Case sensitivity and case-folding behavior varies between DBMSs, and execsql leaves it to the user to manage these differences.
The case-folding behavior of the DBMS should also be considered when specifying the table name in the IMPORT metacommand. When execsql checks to see if a table exists, it queries the information schema using the table name exactly as given (i.e., execsql does not do any case folding); if the actual table name differs because of case folding by the DBMS, the check will fail and an error will occur.
If neither the NEW nor the REPLACEMENT keyword is used, the table must already exist and have column names identical to those in the file, in the same order. The data types in the table must also be compatible with those in the file.
If the NEW keyword is used, the target table will be created without a primary key or other constraints. If data are imported to an existing table, they must meet any constraints already in place on that table. If data are imported to an existing table, the imported data will be added to any already-existing data. If existing data are to be replaced, they should be deleted before the IMPORT metacommand is run.
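As a sketch (the file and table names are hypothetical, and the exact keyword order may differ slightly from the IMPORT syntax), a delimited text file might be loaded into a newly created table, and a second file appended to it, like this:

-- !x! IMPORT TO NEW staging_samples FROM samples.csv
-- !x! IMPORT TO staging_samples FROM more_samples.txt WITH QUOTE NONE DELIMITER TAB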
The NEW keyword cannot be used within a batch with Firebird. Firebird requires that the CREATE TABLE statement be committed—the table actually created—before data can be added. There is only one commit statement for a batch, at the end of the batch, and therefore the CREATE TABLE statement is not committed before data are added.
If the ENCODING keyword is not used, the character encoding of text files imported with the IMPORT metacommand is as specified with the "-i" command-line option or the corresponding configuration file option. If not specified in either of these ways, the encoding is assumed to be UTF-8. If a UTF byte order mark is found at the start of a data file, the encoding indicated by that marker will be taken as definitive regardless of the ENCODING keyword or the "-i" option.
Import of data from text files to Postgres and MySQL uses the fast file reading features provided by both of those databases: Postgres' COPY command and MySQL's LOAD DATA LOCAL INFILE command. For Postgres, if the file encoding is of a type that is not recognized by Postgres (see https://www.postgresql.org/docs/current/static/multibyte.html), a slower loading routine will be used, with encoding conversion handled by execsql.
The sheet name used when importing data from a spreadsheet can be either the sheet name, as it appears on the tab at the bottom of the sheet, or the sheet number. Comparison of the actual sheet names to the value given is case-insensitive. Sheet numbers start at 1.
When MS-Excel saves an OpenDocument spreadsheet, it may create an additional empty column to the right of all data columns. This spurious column is not eliminated by opening and re-saving the spreadsheet using LibreOffice Calc (as of version 5.0.2 at least). The IMPORT metacommand will report an error with such a file because of the absence of a column header on the extra column. To avoid this problem, as well as other issues related to Excel's incorrect implementation of the OpenDocument standard, the data corruption that can occur when Excel imports and exports CSV files, and Excel's ambiguous representation of dates, Excel should not be used for data that may be transferred to or from databases or other formats. Import of data from Excel may also take 10-100 times longer—or more—than import from a text file.
Some performance considerations when using IMPORT are:
- Creating the table using a separate CREATE TABLE statement before the IMPORT metacommand will be faster than using the NEW or REPLACEMENT keywords. The time required for execsql to scan an entire file to determine data types can be much greater than the time required to import the file.
- When importing to Postgres from a text file that has an encoding that is recognized by Postgres, data are read and processed in chunks that are 32 kb in size. A larger or smaller value may give better performance, depending on system-specific conditions. The "-z" command-line option can be used to alter the buffer size.
In general, if an error occurs while importing data, none of the new data should be in the target table (the operation is not committed). However, MySQL may issue messages about data type incompatibility to the standard error device (ordinarily the terminal), yet load some or all of the data. If the NEW or REPLACEMENT keywords are used, depending on the DBMS and where the error occurred, the target table may be created even if the data are not loaded.
INCLUDE
The specified file should be a script that contains SQL statements and/or metacommands. Those SQL statements and metacommands will be inserted into the script at the point where the INCLUDE metacommand occurs.
LOG
Writes the specified message to execsql's log file.
LOG_WRITE_MESSAGES
When this is set to ON, all output of the WRITE metacommand will also be written to execsql's log file. The default is not to echo WRITE messages to the log. This behavior can also be controlled with a configuration option.
MAX_INT
Specifies the threshold between integer and bigint data types that is used by the IMPORT and COPY metacommands when creating a new table. Any column with integer values less than or equal to this value (max_int) and greater than or equal to -(max_int + 1) will be considered to have an 'integer' type. Any column with values outside this range will be considered to have a 'bigint' type. The default value for max_int is 2147483647. The max_int value can also be altered using a configuration option.
METACOMMAND_ERROR_HALT
When METACOMMAND_ERROR_HALT is set to ON, which is the default, any errors that occur during execution of a metacommand will cause an error message to be displayed immediately, and execsql to exit. When METACOMMAND_ERROR_HALT is set to OFF, then metacommand errors will be ignored, but can be evaluated with the IF METACOMMAND_ERROR conditional.
PAUSE
Displays the specified text and pauses script processing. You can continue script processing with the <Enter> key, or halt script processing with the <Esc> key. The message will be displayed on the console by default; if the "-v" command-line option is used, the message will be displayed in a GUI dialog.
If the "HALT|CONTINUE..." clause is used, the PAUSE prompt will disappear after the specified time, regardless of whether the <Enter> or <Esc> keys were struck. If the PAUSE prompt times out in this way, script processing will be either halted or continued, as specified. The prompt with a timeout limit will look like this on the console:
The countdown of time remaining is always displayed in seconds.
If the "-v1", "-v2", or "-v3" option is used, the prompt will appear in a GUI dialog instead of on the console.
If the "HALT" action is taken, either as a result of user input or as a result of a timeout, the effect on the script depends on the CANCEL_HALT setting. If script processing is halted, the system exit value will be set to 2.
PROMPT ASK
Prompts for a yes or no response to the specified question, using a dialog box, and assigns the result, as either "Yes" or "No", to the substitution variable specified. A data table or view can optionally be displayed with the question (as shown for the PROMPT DISPLAY metacommand). The "Y" and "N" keys will select the corresponding response, and the <Enter> key will also select the "Yes" response. The <Esc> key will cancel the script. The selection is also logged. If the prompt is canceled, script processing is halted, and the system exit value is set to 2.
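A sketch of typical usage, assuming the question text is followed by a SUB clause naming the substitution variable (the variable, question, and table names are hypothetical):

-- !x! PROMPT ASK "Delete all rows from the staging table?" SUB do_delete
-- !x! IF(EQUAL("!!do_delete!!", "Yes"))
DELETE FROM staging_samples;
-- !x! ENDIF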
PROMPT CONNECT
Prompts for database connection parameters in a dialog box, and assigns that connection to the specified database alias. Any database connection previously associated with this alias will be closed, even if the prompt is canceled.
The connection dialog looks like this:
The prompt provides several common options for the database encoding. If the database uses a different encoding, you can type in the name of that encoding.
If the port is not specified, the default port for the selected DBMS will be used.
If a password is not provided, a connection will be attempted without using any password; there will be no additional prompt for a password.
If a file-based DBMS (MS-Access or SQLite) is selected, the prompt for the server and other information will be replaced by a prompt for a file name.
PROMPT DIRECTORY
Prompts for the name of an existing directory, using a dialog box, and assigns the selected directory name (including the full path) to the specified substitution variable. The selection is also logged. If the prompt is canceled, script processing is halted and the system exit value is set to 2, unless CANCEL_HALT is set to OFF, in which case script processing continues but the specified substitution variable will be undefined.
PROMPT DISPLAY
Displays the selected view or table in a window with the specified message and both 'Continue' and 'Cancel' buttons. If the 'Continue' button is selected, the script will continue to run. If the 'Cancel' button is selected, the script will immediately halt. The Enter key also carries out the action of the 'Continue' button, and the Escape key carries out the action of the 'Cancel' button.
The prompt display looks like this:
PROMPT ENTER_SUB
Prompts for a replacement string to be assigned to the specified substitution variable (matching string). If the "PASSWORD" keyword is included, the characters that are typed in response to the prompt will be hidden.
PROMPT ENTRY_FORM
Dynamically creates a data entry form following the specifications in the specification_table and assigns the entered values to the substitution variables named in the specification table.
The data entry form will have one data entry prompt for every row in the specification table. The following columns in the specification table will be used to construct the data entry form:
- sub_var
- The name of the substitution variable to which the entered value will be assigned. This column is required, and must contain non-null text.
- prompt
- The text to display on the form as a prompt to the user to indicate what information should be entered. This column is required, and must contain non-null text.
- required
- An indicator of whether a non-null value must be provided. This column is optional. If present, it should have a Boolean data type. If the column is missing or the contents are null, the value will not be required.
- initial_value
- The initial, or current, value. It will be displayed on the form and may be replaced. This column is optional, and if present, its contents may be null.
- width
- An integer specifying the width of the entry area for this value, in characters. This column is optional, and if present, its contents may be null.
- entry_type
- Text specifying the type of entry control to use on the form. The only meaningful value is "checkbox", which will cause a checkbox to be used on the form. If this column has any other value, or is null, or is missing, either a text entry control will be used, or a dropdown control will be used if a lookup table is specified. If "checkbox" is specified, the values returned in the substitution variable will always be either "0", indicating that the checkbox was cleared, or "1", indicating that the checkbox was checked.
- lookup_table
- The name of a table or view containing, in its first column, a set of valid values for this entry. This column is optional, and if present, its contents may be null. If present, the entry will be constrained to only members of the given list.
- validation_regex
- A regular expression pattern to be used to validate the entry. This validation check will be applied when the entry is about to lose focus; if the entered value does not match the regular expression, the entry will retain focus until it is corrected. This column is optional, and if present, its contents may be null.
- validation_key_regex
- A regular expression pattern to be used to validate each keystroke for the entry. This validation check will be applied for each keystroke while the entry has the focus. The entire value, with the additional keystroke applied, must match the regular expression. If it does not match, the keystroke will not change the entry. This column is optional, and if present, its contents may be null.
- sequence
- A value used to specify the order in which values should appear on the form. This column is optional; if absent, the order of values on the form is indeterminate.
The order of the columns in the specification table does not matter. The specification table may contain additional columns other than those listed above; if it does, those columns will be ignored.
After data entry is complete and the data entry form is closed with the "Continue" button that appears on the form, the designated substitution variables will be defined to have the corresponding values that were entered. Substitution variables will not be defined for values that were not entered (were left empty on the form) even if they had been defined previously—except for checkboxes, for which the substitution variable is always defined and assigned a value of "0" or "1".
Although the PROMPT ENTRY_FORM metacommand supports validation of individual entries through the use of either a list of valid values or a regular expression, it does not support cross-column validation or foreign key checks (beyond the single-column list of valid values provided by a lookup table). The primary purpose of execsql is to facilitate scripting, and therefore documentation, of data modifications, and interactive data entry runs counter to that purpose. There are nevertheless circumstances in which a data entry form is an appropriate tool to collect user input. Use of a simple custom data entry form is illustrated in Example 18.
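The following sketch builds a two-entry form; all names are hypothetical, and the MESSAGE clause is an assumption about the metacommand's full syntax. The specification table is created and populated with ordinary SQL, and the metacommand then displays the form and defines the substitution variables sampler and sample_count:

CREATE TABLE entry_spec (
    sub_var TEXT NOT NULL,
    prompt TEXT NOT NULL,
    required BOOLEAN,
    initial_value TEXT,
    sequence INTEGER
    );
INSERT INTO entry_spec (sub_var, prompt, required, initial_value, sequence)
VALUES ('sampler', 'Name of the person who collected the samples', TRUE, NULL, 1),
       ('sample_count', 'Number of samples collected', TRUE, '0', 2);
-- !x! PROMPT ENTRY_FORM entry_spec MESSAGE "Enter the sample collection information."
-- !x! WRITE "Samples collected by !!sampler!!: !!sample_count!!"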
PROMPT OPENFILE
Prompts for the name of an existing file (implicitly, to be opened), using a dialog box, and assigns the selected filename (including the full path) to the specified substitution variable. The selection is also logged. If the prompt is canceled, script processing is halted and the system exit value is set to 2, unless CANCEL_HALT is set to OFF, in which case script processing continues but the specified substitution variable will be undefined.
PROMPT SAVEFILE
Prompts for the name of a new or existing file, using a dialog box, and assigns the selected filename (including the full path) to the specified substitution variable. The selection is also logged. If the prompt is canceled, script processing is halted and the system exit value is set to 2, unless CANCEL_HALT is set to OFF, in which case script processing continues but the specified substitution variable will be undefined.
PROMPT SELECT_SUB
Displays the selected data table or view, similar to the PROMPT DISPLAY metacommand, but allows you to select a single row of data, and then assigns the data values from that row to a set of substitution variables corresponding to the column names, but prefixed with the "@" character. This prefix prevents any conflict between these automatically-assigned substitution variables and any others that you may have created with the SUB command or by any other means—except for the SELECT_SUB metacommand, which uses the same prefix for substitution variables.
Data can be selected from the display either by highlighting the row with a single mouse click and then clicking on the "OK" button, or by double-clicking on a row. If data selection is canceled either with the "Cancel" button or by hitting the Escape key, script processing will be halted and the system exit value will be set to 2, unless CANCEL_HALT has been set to OFF.
Null values in the selected data row will be represented by substitution variables with zero-length string values.
If the CONTINUE keyword is used, then a "Continue" button will also be displayed in the dialog box. This option allows the user to close the dialog without either selecting an item or canceling the script.
If no data value is selected (i.e., either the "Continue" button has been used, or the "Cancel" button has been used and CANCEL_HALT has been set to OFF), all of the substitution variables corresponding to column names of the table that was displayed will be undefined, even if they were defined before the table was displayed.
See Example 8 for an illustration of the use of this metacommand, and Example 17 for an illustration of the use of the CONTINUE keyword.
RESET COUNTER
Resets the specified counter variable so that the next reference to it will return a value of 1.
RESET COUNTERS
Resets all counter variables so that the next reference to any of them will return a value of 1.
RM_FILE
Deletes the specified file. Although execsql is not intended to be a file management tool, there are occasions when deletion of a file from within the script may be a useful workflow step—for example, if header information was written to an output file in anticipation of subsequent addition of error messages, but no errors were later encountered. Whereas the EXPORT metacommand will automatically overwrite an existing file if it exists, the WRITE metacommand always appends text to an existing file. The RM_FILE metacommand is therefore useful to remove an existing output text file that you wish to rewrite. The RM_FILE metacommand is also useful when you want to create a new SQLite database (using the NEW keyword) and want to ensure that the SQLite file does not already exist, to avoid the error that the CONNECT metacommand would otherwise raise.
If the file that is to be deleted does not actually exist, no error will occur.
RM_SUB
Deletes the specified user-created substitution variable.
SELECT_SUB
Assigns data values from the first row of the specified table or view to a set of substitution variables corresponding to the column names, but prefixed with the "@" character. This prefix prevents any conflict between these automatically-assigned substitution variables and any others that you may have created with the SUB command or by any other means—except for the PROMPT SELECT_SUB metacommand, which uses the same prefix for substitution variables.
Null values in the selected data row will be represented by substitution variables with zero-length string values.
If the selected table or view contains no data, an error will occur, and script processing will be halted.
SET COUNTER
Assigns the specified value to the counter. The next time that this counter is referenced, the value returned will be one larger than the value to which it is set by this metacommand.
SUB
Defines a substitution variable (the <match_string>) which, if matched on any line of the script, will be replaced by the specified replacement string. Replacement will occur on all following lines of the script (and all included scripts) before the lines are evaluated in any other way. Every occurrence of the <match_string>, when immediately preceded and followed by two exclamation points ("!!"), will be replaced by the replacement string. Substitutions are processed in the order in which they are defined.
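For example (a sketch; the variable name and SQL are hypothetical), a schema name used throughout a script can be factored out into a single substitution variable:

-- !x! SUB work_schema staging
CREATE TABLE !!work_schema!!.samples ( sample_id integer, site_id text );
SELECT count(*) FROM !!work_schema!!.samples;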
SUB_DECRYPT
Creates a substitution variable containing an unencrypted version of the given encrypted_text. The encrypted_text must have been produced by the SUB_ENCRYPT metacommand. The encryption method used is not of cryptographic quality, and is primarily intended to provide simple obfuscation of email passwords or other sensitive information that may appear in configuration or script files.
SUB_ENCRYPT
Creates a substitution variable containing an encrypted version of the given plaintext. The encryption method used is not of cryptographic quality, and is primarily intended to provide simple obfuscation of email passwords or other sensitive information that may appear in configuration or script files.
SUB_TEMPFILE
Assigns a unique temporary file name to the specified substitution variable (the match_string). The location of (path to) this temporary file is operating-system dependent; the file may not be located in the current working directory. The temporary file will not be created, opened, or used directly by execsql. All temporary files will automatically be deleted when execsql exits (however, a temporary file will not be deleted if it is in use by another process, and then may persist until manually removed). See Example 12 and Example 13 for illustrations of the use of temporary files.
SUBDATA
Defines a substitution variable which, if matched on any line of the script, will be replaced by the data value in the first column of the first row of the specified table or view.
If there are no rows in the specified data source, script processing will halt with an error message.
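A sketch of typical usage (the view and variable names are hypothetical): store a single summary value in a one-row view, capture it with SUBDATA, and use it in a message:

CREATE VIEW sample_total AS SELECT count(*) AS n FROM samples;
-- !x! SUBDATA total_samples sample_total
-- !x! WRITE "There are !!total_samples!! rows in the samples table."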
SYSTEM_CMD
The specified command line will be passed to the operating system to execute. This command is executed by the system, not by the command shell, so commands that are processed by the shell (e.g., "ls" in Linux and "dir" in Windows) cannot be used. Because commands are not processed by the shell, the system path is not searched for executable commands, so full path names must be used for executable files. Execution of the SQL script does not continue until the operating system command has completed.
On non-POSIX operating systems (specifically intended to be Windows), any backslashes in the command line will be doubled before the command line is passed to the operating system. Because backslashes are used as directory separators in Windows paths, this automatic alteration of the command line is meant to eliminate the need to double backslashes in path specifications on Windows.
TIMER
Starts or stops an internal timer. The value of the timer can be obtained with the $TIMER system variable. Elapsed time is reported in real-time seconds (not CPU time) to at least the nearest millisecond.
USE
Causes all subsequent SQL statements and metacommands to be applied to the database identified by the given alias name. The alias name must have been previously established by the CONNECT metacommand, or the alias name "initial" can be used to refer to the database that is used when execsql starts script processing.
WAIT_UNTIL
Suspends execution of the SQL script until the specified Boolean expression becomes true. The Boolean expressions that can be used with the WAIT_UNTIL metacommand are the same as those that can be used with the IF metacommands.
The condition is tested once per second for up to <n> seconds. If the condition has not become true by that time, then the script either halts or continues, as specified.
WRITE
Writes the specified text to the console or a file, or both. The text to be written must be enclosed in double quotes. If no output filename is specified, the text will be written to the terminal. If the "TEE" keyword is included, the text will be written to both the console and the specified file. If the "-v3" option is used, or a GUI console is opened explicitly, the text will be written to the GUI console. If the text is written to a file, it will always be appended to any existing file of the given name.
WRITE CREATE_TABLE
For data in a delimited text file:
For data in an OpenDocument spreadsheet:
For data in an Excel spreadsheet:
For data in a table of an aliased database:
Generates the CREATE TABLE statement that would be executed prior to importing data from the specified file or worksheet, or copying data from the specified aliased database, if the NEW or REPLACEMENT keyword were used with the IMPORT or COPY metacommand. The comment text, if provided, will be written as a SQL comment preceding the CREATE TABLE statement. The comment text must be double-quoted; table, file, and worksheet names can be quoted or unquoted. If no output filename is specified, the text will be written to the console. Text will always be appended to any existing file of the given name. See Example 12 for an illustration of the use of this metacommand.
The SKIP key phrase specifies the number of lines at the beginning of the file to discard before evaluating the remainder of the file as a data table.
The WRITE CREATE_TABLE command may report an error when used with ODS files that have been created or edited using Excel—see the description of the IMPORT metacommand for additional information about this problem.
Logging
execsql.py automatically logs certain actions, conditions, and errors that occur during the processing of a script file. Although a script file provides good documentation of database operations, there are circumstances in which a script file is not a definitive record of what operations were carried out. Circumstances that can result in incomplete execution of the script file are:
- Errors
- Choices made by the user in response to a PROMPT metacommand.
- Cancellation of the script in response to a PAUSE metacommand or password prompt from the CONNECT metacommand.
Information is logged into a tab-delimited text file named execsql.log in the same directory as the script file. This file contains several different record types. The first value on each line of the file identifies the record type. The second value on each line is a run identifier. All records that are logged during a single run of execsql.py have the same run identifier. The run identifier is a compact representation of the date and time at which the run started. The record types and the values that each record of that type contains are:
- run—Information about the run as a whole:
- Record type
- Run identifier
- Script name
- Script path
- Script file revision date
- Script file size in bytes
- User name
- Command-line options
- run_db_file—Information about the file-based database used (Access or SQLite):
- Record type
- Run identifier
- Database file name with full path
- run_db_server—Information about the server-based database used (Postgres or SQL Server):
- Record type
- Run identifier
- Server name
- Database name
- connect—The type and name of a database to which a connection has
been established; this may be either a client-server or file-based database:
- Record type
- Run identifier
- DBMS type and database identifiers
- action—Significant actions carried out by the script, primarily those that affect the results.
- Record type
- Run identifier
- Sequence number—The order of actions, status messages, and errors. Automatically generated.
- Action type—One of the following values:
- Line number—The script line number where the action takes place.
- Description—Free text describing the action.
- status—Status messages; ordinarily these are errors
- Record type
- Run identifier
- Sequence number—The order of actions, status messages, and errors. Automatically generated.
- Status type—One of the following values:
- exception
- error
- Description—Free text describing the status.
- exit—Program status at exit:
- Record type
- Run identifier
- Exit type—One of the following values:
- Line number—The script line number from which the exit was triggered (may be null).
- Description—Free text describing the exit condition.
The messages for each run are appended to the end of the log file.
Although logging is performed automatically by execsql, there are two ways to make use of the log file in custom scripts:
- The LOG metacommand provides a way to write a custom message into the log file.
- The $RUN_ID system variable provides a way to link other information (e.g., status or error messages) to the run that is identified in the log file.
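For example, a script might record its own milestone in the standard log and stamp a separate report file with the run identifier so that the two can be cross-referenced later. A minimal sketch (the message text and file name are illustrative, and the exact form of the WRITE ... TO clause should be checked against the WRITE metacommand reference):

```
-- Add a custom milestone to execsql.log for this run.
-- !x! LOG "Staging tables loaded and validated."
-- Stamp a separate report file with the same run identifier.
-- !x! WRITE "Report for execsql run !!$RUN_ID!!" TO report_notes.txt
```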
Character Encoding
Command-line options allow specification of the encoding used in the database, the encoding used to read the script file and imported data files, and the encoding used to write output text. The encoding of data files to be imported can also be specified with the IMPORT metacommand. Database encoding can also be specified with the CONNECT metacommand. Specification of appropriate encoding will eliminate errors that would otherwise result from the presence of non-ASCII characters.
For Postgres and SQLite, the database encoding used is determined by interrogating the database itself, and any database encoding specified on the command line is ignored.
If no encodings are specified on the command line, the following default encodings are used:
- Script file: utf8
- Firebird: latin1
- MySQL and MariaDB: latin1
- SQL Server: latin1
- Access: windows_1252
- DSN: None
- Output: utf8
- Import: utf8
If a UTF byte order mark (BOM) is found at the start of the script file or at the start of a data file to be IMPORTed, the encoding indicated by the BOM will be taken as definitive regardless of any configuration options that may be used.
There is no default encoding for a DSN connection because the actual data source used is unknown, and because some ODBC drivers may return results in Unicode. If no encoding is specified, the ODBC driver must return results in Unicode or some compatible format (e.g., ASCII).
Some encodings are known by multiple names (aliases). In cases where the behavior of the IMPORT metacommand depends on the compatibility of encodings (specifically, for Postgres), execsql will try to match the input file and database encodings using the matching rules of Unicode Technical Standard #22 and the equivalences documented by WHATWG.
The "-y" command-line option will display all of the encoding names that execsql recognizes. There are some aliases for the displayed encoding names that can also be used, if you know them. The encoding names used by each DBMS may differ from this list.
The log file is always written in UTF-8.
Using Script Files
Using script files to store and execute task-specific SQL statements has a number of advantages over using views, functions, or procedures that are stored within the database itself, particularly for one-off or infrequent tasks, or actions that must be applied to multiple databases. These advantages are:
- When database operations are only part of an overall task, maintenance and management of all components of the task is easier and more reliable when SQL scripts are kept together in the file system with input files, database output, output processing scripts, and final task products. Because all of the SQL needed for a specific data summarization task is kept together, there is little or no risk that one of a set of separate database objects—views or stored procedures—that are needed to complete a specific task will be either deleted or altered. The clutter of queries, functions, and procedures that would otherwise accumulate in a heavily used database can be reduced or eliminated.
- When multiple databases with the same data model are used (e.g., for different projects), only one copy of scripts tailored for that data model need be maintained, rather than having duplicate procedures or views in every database. This reduces maintenance and ensures consistency.
- Creation of the SQL script for a new task can be simplified by copying and editing a previously existing script. The user's preferred editor can be used to carry out search and replace operations to easily and reliably make changes throughout the entire set of SQL statements and scripts that are needed for a particular task.
- Complete documentation can (and should!) be included in the script files, so that the purpose, assumptions, limitations, and history of changes can be easily reviewed by anybody who might consider using or modifying the query script. This documentation is easily accessible to scanning and searching tools like grep.
- The script can be easily preserved to document the way in which data were selected or summarized. Scripts can be easily archived, backed up, and put under version control independently of the database. Script files can be made read-only so that they cannot easily or accidentally be modified after the script for a particular task has been finalized.
- Data management processes can be more easily automated by integrating a script-processing tool like execsql with other system tools than by using interactive database interfaces. The ability of execsql to export data in CSV, TSV, OpenDocument spreadsheet, readable Markdown-compatible text, HTML, JSON, and LaTeX formats reduces the amount of time that might otherwise be required to interactively open the database, run the appropriate query (not to mention verifying that the query, or any queries that it depends on, have not been altered), export the result, and reformat the result. If the query output will be further processed or used in another scriptable application (e.g., to produce graphics or statistics using R), execsql can be combined with other programs in a system script file to further automate the data summarization and analysis process.
- If a database must be maintained in two different formats (e.g., in PostgreSQL for ordinary use, but downloaded to SQLite for use when a network connection is not available), one script file can potentially be used to carry out exactly the same data selection and summarization operations on both formats of the database.
- The capabilities provided by some of execsql's metacommands surpass the features available in views or stored procedures in most DBMSs, and this additional functionality is only available when script files are used.
The capabilities provided by metacommands may, in some cases, allow a script designed for execsql to take the place of a custom database client program.
Documentation
One of the primary goals of execsql is to facilitate, and even encourage, comprehensive documentation of all actions taken upon a database. Two fundamental aspects of execsql that support this goal are:
- The use of script files, which require that SQL statements be saved in a file rather than executed interactively, and which also allow copious comments to be included; and
- Automatic logging of information about the database(s) used, the script file(s) run, and user choices in response to interactive prompts.
Other features of execsql that also support this goal are:
- The LOG metacommand, which writes a user-provided message to the standard log file;
- The WRITE metacommand, which makes it easy to issue progress and status messages to the console or to a file;
- The LOG_WRITE_MESSAGES metacommand, which automatically echoes all output of the WRITE metacommand to the standard log file;
- The TEE clause of the WRITE metacommand, which makes it easy to write progress and status messages to a custom documentation file in addition to the console;
- The $RUN_ID system variable, which can be written into a custom documentation file to establish a correspondence between the information in that file and the information in the standard log file;
- Other system variables such as $CURRENT_DATABASE, $DB_NAME, $CURRENT_DIR, $CURRENT_SCRIPT, $CURRENT_TIME, $LAST_ROWCOUNT, $LAST_SQL, and $USER, which provide useful contextual and status information that can be written into a custom documentation file;
- The TXT output format of the EXPORT metacommand, which displays (or writes to a file) a table or query in the format of a Markdown pipe table, which is an inherently readable format if included in a custom documentation file;
- The CONSOLE SAVE metacommand, which allows the entire contents of a GUI console window to be written to a custom documentation file; and
- The $DATE_TAG, $DATETIME_TAG, and $RUN_ID system variables, which can be used to construct file names for custom documentation files.
Using these features when writing script files allows easy generation of documentation that can be valuable for establishing exactly what, and how, changes were made to a database.
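For instance, a few lines like the following (a sketch only; the file and table names are illustrative, and the exact form of the TEE clause should be checked against the WRITE metacommand reference) combine several of these features to build a custom documentation file alongside the standard log:

```
-- Echo progress messages to the console and to a custom documentation file.
-- !x! WRITE "Run !!$RUN_ID!! of !!$CURRENT_SCRIPT!! started at !!$CURRENT_TIME!!." TEE TO run_notes.txt
UPDATE samples SET qc_status = 'checked' WHERE qc_status IS NULL;   -- hypothetical table
-- !x! WRITE "Rows flagged as checked: !!$LAST_ROWCOUNT!!" TEE TO run_notes.txt
```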
As an alternative to writing documentation to a text file, documentation could be saved to a database that serves as an activity log. Example 20 illustrates how this can be done for data issues, and a similar technique can be used to record ordinary progress and status information.
Examples
- Example 1: Use temporary queries in Access
- Example 2: Execute QA queries and write/export the results
- Example 3: Execute QA queries and prompt/display the results
- Example 4: Include a script file if it exists
- Example 5: Include a script file if a table exists
- Example 6: Looping
- Example 7: Nested variable evaluation
- Example 8: Prompt the user to choose an option
- Example 9: Using command-line substitution variables
- Example 10: Using CANCEL_HALT to control looping with dialogs
- Example 11: Output numbering with counters
- Example 12: Customize the table structure for data to be imported
- Example 13: Import all the CSV files in a directory
- Example 14: Run a script from a library database
- Example 15: Prompt for multiple values
- Example 16: Evaluating complex expressions with substitution variables
- Example 17: Displaying summary and detailed information
- Example 18: Creating a simple entry form
- Example 19: Dynamically altering a table structure to fit data
- Example 20: Logging data quality issues
- Example 21: Updating multiple databases with a cross-database transaction
The following examples illustrate some of the features of execsql.
Example 1: Use Temporary Queries to Select and Summarize Data in Access
This example illustrates a script that makes use of several temporary queries to select and summarize data, and a final query that prepares the data for export or further use. The SQL in this example is specific to MS-Access.
During the execution of this script with Access, the temporary queries will be created in the database. When the script concludes, the temporary queries will be removed. Nothing except the data itself need be kept in the database to use a script like this one.
Example 2: Execute a Set of QA Queries and Capture the Results
This example illustrates a script that creates several temporary queries to check the codes that are used in a set of staging tables against the appropriate dictionary tables, and, if there are unrecognized codes, writes them out to a text file.
Example 3: Execute a Set of QA Queries and Display the Results with a Prompt
This example illustrates a script that compiles the results of several QA queries into a single temporary table, then displays the temporary table if it has any rows (i.e., any errors were found), and prompts the user to cancel or continue the script.
Example 4: Include a File if it Exists
This example illustrates how a script file can be modified by inclusion of an additional script only if that script file exists. This might be used when a general-purpose script is used to process data sets, and when some special data-set-specific processing is needed, that processing is coded in a separate script file, which is read into the main script only if it exists.
Each data set to be processed is identified by a unique name, which is defined with a SUB command in a script that is also read into the main script. The definition of the data set name might look like this, in a file named ds_name.sql:
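For example (the variable name and value are illustrative):

```
-- !x! SUB dataset_id site_04
```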
The main script then would look like this:
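A sketch of such a main script, assuming that any data-set-specific commands are kept in a file named after the data set (all names here are illustrative):

```
-- Pick up the data set name defined in ds_name.sql.
-- !x! INCLUDE ds_name.sql
-- (General-purpose processing of the staging tables goes here.)
-- Run the data-set-specific script only if one has been written.
-- !x! IF(FILE_EXISTS(fixes_!!dataset_id!!.sql)) { INCLUDE fixes_!!dataset_id!!.sql }
```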
Example 5: Include a File if a Table Exists
Similar to Example 4, this example illustrates how a script file can be included if a database table exists. This might be used when carrying out quality assurance checks of data sets that have optional components. In this case, if an optional component has been loaded into a staging table, the script to check that component will be included.
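A sketch of the relevant line, assuming the optional component is loaded into a staging table named staging_optional and checked by a script named check_optional.sql (both names illustrative):

```
-- Run the optional component's QA script only if its staging table exists.
-- !x! IF(TABLE_EXISTS(staging_optional)) { INCLUDE check_optional.sql }
```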
Example 6: Looping
Although execsql does not have any metacommands specifically for looping through groups of SQL statements or metacommands, the IF metacommand can be used with either the INCLUDE or EXECUTE SCRIPT metacommands to perform looping. Commands to be executed within a loop must be in a separate script (either in a separate file, if the INCLUDE metacommand is used, or in a script block defined with the BEGIN/END SCRIPT metacommands). That script should end with another INCLUDE or EXECUTE SCRIPT metacommand to continue the loop, or omit that metacommand to exit the loop.
This is an old hack, but it is simple and effective. Either a single-line IF metacommand can be used, as shown here, or the script's recursive invocation of itself can be contained within a block IF statement.
A script to control a loop would invoke the inner loop script as follows:
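A sketch of one way to set up the controlling script, using a row count as the loop-termination condition (the table and variable names are illustrative, and SUBDATA is assumed to take a view name and return the first value of its first row):

```
-- Count the rows to be processed (hypothetical table).
CREATE TEMPORARY VIEW loop_target AS
SELECT COUNT(*) AS n FROM items_to_process;
-- !x! SUBDATA target_rows loop_target
-- Start the loop only if there is something to process.
-- !x! IF(HASROWS(items_to_process)) { INCLUDE loop_inner.sql }
-- !x! WRITE "Loop finished."
```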
In this example, the inner part of the loop is contained in a script file named loop_inner.sql. The inner loop script should have a structure like:
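A corresponding sketch of loop_inner.sql, using a counter variable that is incremented each time it is evaluated (the counter number is arbitrary, and negation of the conditional with NOT is assumed):

```
-- (SQL statements for one pass of the loop go here.)
-- Re-run this script until the counter reaches the target row count
-- obtained in the controlling script.
-- !x! IF(NOT EQUALS(!!$counter_1!!, !!target_rows!!)) { INCLUDE loop_inner.sql }
```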
Termination of the loop may be controlled by some data condition instead of by an interactive prompt to the user. For example, you could loop for as many times as there are rows in a table by using the SUBDATA metacommand to get a count of all of the rows in a table, and then use the IF(EQUALS()) conditional test to terminate the loop when a counter variable equals the number of rows in the table.
Every loop iteration increases the size of the script in memory, so execsql deallocates the memory used for script commands that have already been executed, to minimize the possibility of an out-of-memory error.
Example 7: Nested Variable Evaluation
This example illustrates nested evaluation of substitution variables, using scripts that print out all of the substitution variables that are assigned with the "-a" command-line option.
Because there may be an indefinite number of command-line variable assignments, a looping technique is used to evaluate them all. The outer level script that initiates the loop is simply:
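A sketch of that one-line outer script:

```
-- !x! INCLUDE arg_vars_loop.sql
```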
The script that is called, arg_vars_loop.sql, is:
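A sketch of what arg_vars_loop.sql might contain (the counter number is arbitrary, and SUB_DEFINED is assumed to be the conditional used to detect the end of the argument list):

```
-- !x! SUB argvar $ARG_!!$counter_1!!
-- !x! IF(SUB_DEFINED(!!argvar!!))
-- !x!     WRITE "!!argvar!! = !!!!argvar!!!!"
-- !x!     INCLUDE arg_vars_loop.sql
-- !x! ENDIF
```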
On line 3 of this script the substitution variable argvar is first evaluated to generate a name for a command-line variable, consuming the inner pair of exclamation points. The resulting variable (which will take on values of "$ARG_1", "$ARG_2", etc.) will then be evaluated, yielding the value of the command-line variable assignment.
Example 8: Prompt the User to Choose an Option
This example illustrates how the PROMPT SELECT_SUB metacommand can be used to prompt the user to select among several options. In this example, the options allow the user to choose a format in which to (export and) view a data table or view. For this example, there must be a data table or view in the database named some_data.
This example also illustrates that, because the text ("txt") output format of the EXPORT metacommand creates a Markdown-compatible table, this type of text output can be combined with output of WRITE metacommands and converted to Portable Document Format (PDF). This example also illustrates how the SYSTEM_CMD metacommand can be used to immediately open and display a data file that was just exported. (Note that the xdg-open command is available in most Linux desktop environments. In Windows, the start command is equivalent.)
This example also illustrates how substitution variables can be used to parameterize code to support modularization and code re-use. In this example the substitution variable data_table is assigned a value at the beginning of the script. Alternatively, this variable might be assigned different values at different locations in a main script, and the commands in the remainder of this example placed in a second script that is INCLUDEd where appropriate to allow the export and display of several different data tables or views. Example 10 illustrates this usage.
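The core of such a script might be sketched as follows; the option table, the "@"-prefixed variables for the selected row, and the argument form of SYSTEM_CMD are all assumptions here:

```
-- !x! SUB data_table some_data
-- Build a small table of output choices for the prompt.
CREATE TEMPORARY TABLE fmt_options ( format varchar(8), file_ext varchar(8) );
INSERT INTO fmt_options (format, file_ext)
    VALUES ('CSV', 'csv'), ('TXT', 'txt'), ('HTML', 'html'), ('ODS', 'ods');
-- Ask the user which format to use; the choice is expected in !!@format!! and !!@file_ext!!.
-- !x! PROMPT SELECT_SUB fmt_options MESSAGE "Choose an output format for !!data_table!!."
-- Export the data and immediately open the result.
-- !x! EXPORT !!data_table!! TO !!data_table!!.!!@file_ext!! AS !!@format!!
-- !x! SYSTEM_CMD (xdg-open !!data_table!!.!!@file_ext!!)
```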
Example 9. Using Command-Line Substitution Variables
This example illustrates how substitution variables that are assigned on the command line using the "-a" option can be used in a script.
This example presumes the existence of a SQLite database named todo.db, and a table in that database named todo with columns named todo and date_due. The following script allows a to-do item to be added to the database by specifying the text of the to-do item and its due date on the command line:
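A sketch of such a script (call it add_todo.sql):

```
-- $ARG_1 is the to-do text and $ARG_2 is the due date, both assigned
-- on the command line with the -a option.
INSERT INTO todo (todo, date_due)
VALUES ('!!$ARG_1!!', '!!$ARG_2!!');
```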
This script can be used with a command line like:
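For example (a sketch only: the script and database file are shown as positional arguments, and any option needed to identify the DBMS type is omitted):

```
execsql.py -a "Write the quarterly report" -a 2017-06-30 add_todo.sql todo.db
```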
Example 10. Using CANCEL_HALT to Control Looping with Dialogs
This example illustrates the use of the CANCEL_HALT metacommand during user interaction with dialogs. Ordinarily when a user presses the "Cancel" button on a dialog, execsql treats this as an indication that a necessary response was not received, and that further script processing could have adverse consequences—and therefore execsql halts script processing. However, there are certain cases when the "Cancel" button is appropriately used to terminate a user interaction without stopping script processing.
The scripts in this example present the user with a list of all views in the database, allow the user to select one, and then prompt the user to choose how to see the data. Three scripts are used:
- view_views.sql: This is the initial script that starts the process. It turns the CANCEL_HALT flag off at the start of the process, and turns it back on again at the end.
- view_views2.sql: This script is included by view_views.sql, and acts as an inner loop, repeatedly presenting the user with a list of all the views in the database. The "Cancel" button on this dialog is used to terminate the overall process. If the user selects a view, rather than canceling the process, then the choose_view.sql script is INCLUDEd to allow the user to choose how to see the data.
- choose_view.sql: This script presents the dialog that allows the user to choose how to see the data from the selected view. This is the same script used in Example 8, except that the data_table variable is defined in the view_views2.sql script instead of in choose_view.sql.
view_views.sql
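A sketch of what this controlling script might contain:

```
-- Let the Cancel button end the interaction without halting the script.
-- !x! CANCEL_HALT OFF
-- !x! INCLUDE view_views2.sql
-- !x! CANCEL_HALT ON
```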
view_views2.sql
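A sketch of the inner script, assuming a Postgres database (so the list of views can be read from information_schema), that PROMPT SELECT_SUB returns the selection in an "@"-prefixed variable, and that a canceled prompt leaves that variable undefined:

```
-- Build the list of views to choose from.
CREATE OR REPLACE TEMPORARY VIEW all_views AS
SELECT table_name AS view_name
FROM information_schema.views
WHERE table_schema = 'public';
-- Clear any selection left over from the previous pass (assumes RM_SUB
-- tolerates an undefined variable).
-- !x! RM_SUB @view_name
-- !x! PROMPT SELECT_SUB all_views MESSAGE "Choose a view to display, or Cancel to quit."
-- !x! IF(SUB_DEFINED(@view_name))
-- !x!     SUB data_table !!@view_name!!
-- !x!     INCLUDE choose_view.sql
-- !x!     INCLUDE view_views2.sql
-- !x! ENDIF
```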
The choose_view.sql script can be seen in Example 8.
The CONTINUE keyword of the PROMPT SELECT_SUB metacommand can also be used to close the dialog without canceling the script.
Example 11. Output Numbering with Counters
This example illustrates how counter variables can be used to automatically number items. This example shows automatic numbering of components of a Markdown document, but the technique can also be used to number database objects such as tables and views.
This example creates a report of the results of a set of QA checks, where the information about the checks is contained in a table with the following columns:
- check_number: An integer that uniquely identifies each QA check that is conducted.
- test_description: A narrative description of the scope or operation of the check.
- comment_text: A narrative description of the results of the check.
The results of each check are also represented by tabular output that is saved in a table named qa_tbl_x where x is the check number.
A script like this one could be INCLUDEd as many times as there are sets of QA results to report.
This example also illustrates how the value of a counter variable can be preserved for repeated use by assigning it to a user-defined substitution variable.
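A sketch of the numbering itself (the counter number is arbitrary):

```
-- Take the next check number and keep it; referencing the counter
-- again later would increment it a second time.
-- !x! SUB this_check !!$counter_21!!
-- !x! WRITE "QA check !!this_check!!"
-- The preserved value can be reused, e.g., to name the results table.
-- !x! WRITE "Results for this check are in table qa_tbl_!!this_check!!."
```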
Example 12. Customize the Table Structure for Data to be Imported
This example illustrates how the structure of a table that would be created by the IMPORT metacommand can be customized during the import process. Customization may be necessary because the data types that are automatically selected for the columns of the new table need to be modified. This may occur when:
- A column is entirely null. In this case, execsql will create the column with a text data type, whereas a different data type may be more appropriate.
- A column contains only the integers 1 and 0; execsql will create this column with a Boolean data type, whereas an integer type may be more appropriate.
- A column contains only integers, whereas a floating-point type may be more appropriate.
The technique shown here first writes the CREATE TABLE statement to a temporary file, and then opens that file in an editor so that you can make changes. After the file is edited and closed, the file is INCLUDEd to create the table structure, and then the data are loaded into that table.
Changes to data types that are incompatible with the data to be loaded will result in an error during the import process. Changes to column names will also prevent the data from being imported.
Although this example shows this process applied to only a single file/table, multiple CREATE TABLE statements can be written into a single file and edited all at once.
This example illustrates the use of a temporary file for the CREATE TABLE statement, although you may wish to save the edited form of this statement in a permanent file to keep a record of all data-handling operations.
Example 13. Import All the CSV Files in a Directory
When a group of related data files are to be loaded together into a database, they can all be loaded automatically with this script if they are first placed in the same directory. This example script operates by:
- Prompting for the directory containing the CSV files to load.
- Creating a text file with the names of all of the CSV files in that directory.
- Importing the text file into a database table.
- Adding columns to that table for the name of the table into which each CSV file is imported and the time of import. The base name of each CSV file (without its extension) is used as the table name.
- Looping over the list of CSV files, choosing one that does not have an import time, importing that file, and setting the import time.
This process uses two script files. The first one obtains the list of CSV files, and the second one acts as the inner part of the loop, repeatedly loading a single CSV file. The main script looks like this:
The second script, which must be named import1.sql, in accordance with the reference to it in the first script, looks like this:
This example is designed to run on a Linux system with PostgreSQL, but the technique can be applied in other environments and with other DBMSs.
Example 14. Run a Script from a Library Database
Despite the advantages of storing scripts on the file system, in some cases storing a set of scripts in a library database may be appropriate. Consider a table named scriptlib that is used to store SQL scripts, and that has the following columns:
- script_name: A name used as a unique identifier for each script; the primary key of the table.
- script_group: A name used to associate related scripts.
- group_master: A Boolean used to flag the first script in a group that is to be executed.
- script_text: The text of the SQL script.
A single script from such a library database can be run using another script like the following:
This technique could be combined with a prompt for the script to run, using the method illustrated in Example 8, to create a tool that allows interactive selection and execution of SQL scripts.
This technique can be extended to export all scripts with the same script_group value, and then to run the master script for that group. To use this approach, the filename used with the IMPORT metacommand in each script must be a substitution variable that is to be replaced with the name of a temporary file created with the SUB_TEMPFILE metacommand.
Example 15: Prompt for Multiple Values
The PROMPT SELECT_SUB metacommand allows the selection of only one row of data at a time. Multiple selections can be obtained, however, by using the PROMPT SELECT_SUB metacommand in a loop and accumulating the results in another variable or variables.
This example illustrates that process, using a main script that INCLUDEs another script, choose2.sql, to present the prompt and accumulate the choices in the desired form.
The main script looks like this:
The script that is included, choose2.sql, looks like this:
In this example, only one value from each of the multiple selections is accumulated into a single string in the form of a list of SQL character values suitable for use with the SQL IN operator. The multiple values could also be accumulated in the form of a values list, if appropriate to the intended use.
Another approach to handling multiple selections is to reassign each selected value to another substitution variable with a name that is dynamically created using a counter variable, as shown in the following snippet.
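A sketch of that reassignment, assuming the selected value is returned in an "@"-prefixed variable and that counter 5 is not used elsewhere in the script:

```
-- Give each selection its own numbered variable: choice_1, choice_2, ...
-- !x! SUB choice_!!$counter_5!! !!@selected_item!!
```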
Example 16: Evaluating Complex Expressions with Substitution Variables
Although execsql does not itself process mathematical expressions or other similar operations on substitution variables, all of the functions of SQL and the underlying DBMS can be used to evaluate complex expressions that use substitution variables. For example:
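A sketch of one way to do this, assuming SUBDATA takes a table or view name and assigns the first value of its first row:

```
-- !x! SUB alpha 16
-- !x! SUB beta 27.5
-- Let the DBMS evaluate the expression, then capture the result.
CREATE TEMPORARY VIEW sum_calc AS SELECT !!alpha!! + !!beta!! AS sum_value;
-- !x! SUBDATA sum sum_calc
```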
This will assign the result of the expression to the substitution variable "sum". Any mathematical, string, date, or other functions supported by the DBMS in use can be applied to substitution variables in this way.
Example 17: Displaying Summary and Detailed Information
A set of QA checks performed on data may be summarized as a list of all of the checks that failed; however, there may also be detailed information about those results that the user would like to see—such as a list of all the data rows that failed. Assuming that a view has been created for each QA check, and that the QA check failures have been compiled into a table of this form (see also Example 3):
The detail_view column should contain the name of the view with detailed information about the QA failure. Both the summary and detailed information can be presented to the viewer using the following statements in the main script:
Where the qa_detail.sql script is as follows:
The user can cancel further script processing using the "Cancel" button on either the summary dialog box or any of the detail displays. If the "Continue" button is chosen on the summary dialog box, script processing will resume.
Example 18: Creating a Simple Entry Form
This example illustrates the creation of a simple data entry form using the PROMPT ENTRY_FORM metacommand. In this example, the form is used to get a numeric value and a recognized set of units for that value, and then display that value converted to all compatible units in the database.
This example relies on the presence of a unit dictionary in the database (e_unit) that contains the unit code, the dimension of that unit, and a conversion factor to convert values to a standard unit for the dimension. This example uses two scripts, named unit_conv.sql and unit_conv2.sql, the first of which INCLUDEs the second.
unit_conv.sql
Note that to include a decimal point in the regular expressions for the numeric value, the decimal point must be escaped twice: once for SQL, and once for the regular expression itself. Also note that in this case, the validation_regex and the validation_key_regex are identical except that all subexpressions in the latter are optional. If the first digit character class were not optional, then at least one digit would always be required, and entry of a leading negative sign would not be possible (though a negative sign could be added after at least one digit was entered).
unit_conv2.sql
The unit_conv2.sql script will continue to display conversions for as long as either the value or the unit is changed.
Example 19: Dynamically Altering a Table Structure to Fit Data
Example contributed by E. Shea.
This example illustrates automatic (scripted) revisions to a table structure, wherein a number of additional columns are added to a table; the number of columns added is determined by the data. The case illustrated here is of a Postgres table containing a PostGIS multipoint geometry column. The purpose of this script is to extract the coordinate points from the geometry column and store each point as a pair of columns containing latitude and longitude values. The number of points in the multipoint column varies from row to row, and the maximum number of points across all rows is not known (and need not be known) when this script is run.
This example assumes the existence of a database table named sample_points that contains the following two columns:
- sample: a text value uniquely identifying each row.
- locations: a multi-point PostGIS geometry value.
This operation is carried out using two scripts, named expand_geom.sql and expand_geom2.sql. The first of these calls the second. Looping and a counter variable are used to create and fill as many additional columns as are needed to contain all the point coordinates.
expand_geom.sql
expand_geom2.sql
Example 20: Logging Data Quality Issues
This example illustrates how data quality issues that are encountered during data loading or cleaning operations can be logged for later evaluation and resolution. Issues are logged in a SQLite database in the working directory. This database is named issue_log.sqlite and is automatically created if necessary. The database contains one table named issue_log in which all issues are recorded. The issue log database may also contain additional tables that provide data to illustrate the issues. Each of these additional tables has a name that starts with "issue_", followed by an automatically-assigned issue number.
The script that logs the issues is named log_issue.sql. It should be included in the main script at every point where an issue is to be logged. Three substitution variables are used to pass information to this script:
- dataset: The name of the data set to which this issue applies.
- issue: A description of the issue.
- issue_data: The name of a table or view containing a data summary that illustrates the issue. This substitution variable need not be defined if no illustrative data are necessary or applicable (use the RM_SUB metacommand to un-define this variable if it has been previously used).
Each issue is logged only once. The issue_log table is created with additional columns that may be used to record the resolution of each issue, and these are not overwritten if an issue is encountered repeatedly.
The log_issue.sql script uses several substitution variables with names that start with "iss_", and uses counter variable 15503. Other parts of the loading and cleaning scripts should avoid collisions with these values.
log_issue.sql
This script would be used during a data loading or cleaning process as illustrated in the following code snippet.
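For example (the data set name, issue description, and view name are illustrative):

```
-- Describe the issue and point to a view that illustrates it.
-- !x! SUB dataset site_04_2017
-- !x! SUB issue Sample dates fall outside the project period.
-- !x! SUB issue_data v_bad_sample_dates
-- !x! INCLUDE log_issue.sql
-- If a later issue has no illustrative data, un-define the variable first.
-- !x! RM_SUB issue_data
```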
Example 21: Updating Multiple Databases with a Cross-Database Transaction
This example illustrates how the same SQL script can be applied to multiple databases, and the changes committed only if they were successful for all databases. This makes use of the looping technique illustrated in Example 6, but using sub-scripts defined with BEGIN/END SCRIPT metacommands instead of INCLUDE metacommands. The same approach, of committing changes only if there were no errors in any database, could also be done without looping, simply by unrolling the loops to apply the updates to each database in turn. This latter approach would be necessary when different changes were to be made to each database—though even in that case, the commit statements could all be executed in a loop.
Copyright and License
Copyright (c) 2007, 2008, 2009, 2014, 2015, 2016, 2017, R.Dreas Nielsen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.