Welcome to dbio’s documentation!¶

Operations¶

Note: these functions are aliased and can be called directly from the dbio module, e.g. dbio.replicate().

dbio.io.load(sqla_url, table, filename, append, disable_indices=False, analyze=False, csv_params={'delimiter': ', ', 'escapechar': '\\', 'quoting': 3, 'lineterminator': '\n', 'encoding': 'utf-8'}, null_string='NULL', create_staging=True, expected_rowcount=None, **kwargs)¶

Import data from a csv file to a database table.

param sqla_url: SQLAlchemy url string to pass to create_engine().

param table: Table in database to load data from filename.

param filename: Name of csv file to load from.

param append: If True, any data already in the table will be preserved.

param analyze: If True, the table will be will be analyzed for query optimization immediately after importing.

param disable_indices:

If True, table will temporarily disable or drop indices for the duration of the load in the attempt of speeding up the operation.

param csv_params:

Dictionary of csv parameters.

param null_string:

String to represent null values with.

param create_staging:

If True, the old table will be replaced with a new, identical table. If False, there must be an existing table named “table_staging”.

param expected_rowcount:

The number of rows that are expected to be in the loaded table. If the count does not much, the loading transaction will raise an error and rollback if possible. If the count is set to None, no check will be made.

Kwargs:

direct (string): For Vertica. Will apply DIRECT keywprd to COPY command to skip WOS

dbio.io.query(sqla_url, query, filename, query_is_file=False, batch_size=1000000, csv_params={'delimiter': ', ', 'escapechar': '\\', 'quoting': 3, 'lineterminator': '\n', 'encoding': 'utf-8'}, null_string='NULL')¶

Query a database and write the results to a csv file.

Parameters:

sqla_url – SQLAlchemy engine creation URL for db.
query – SQL query string to execute.
filename – Name of csv file to dump to.
query_is_file – If True, the query argument is a filename.
batch_size – Number of rows to keep in memory before writing to filename.
csv_params – Dictionary of csv parameters.
null_string – String to represent null values with.

Returns:

The number of rows written to the file.

dbio.io.replicate(query_db_url, load_db_url, query, table, append, analyze=False, disable_indices=False, query_is_file=False, create_staging=True, do_rowcount_check=False, **kwargs)¶

Load query results into a table using a named pipe to stream the data.

This method works by simultaneously executing query() and load() with a named pipe shared between the two processes.

Unix only.

param query_db_url:

SQLAlchemy engine creation URL for query_db.

param load_db_url:

SQLAlchemy engine creation URL for load_db.

param query: SQL query string to execute.

param table: Table in database to load data from filename.

param append: If True, any data already in the table will be preserved.

param analyze: If True, the table will be will be analyzed for query optimization immediately after importing.

param disable_indices:

If True, table will temporarily disable or drop indices for the duration of the load in the attempt of speeding up the operation.

param query_is_file:

If True, the query argument is a filename.

param create_staging:

If True, the old table will be replaced with a new, identical table. If False, there must be an existing table named “table_staging”.

param do_rowcount_check:

If True, the replication will only succeed if the query rowcount matches the load rowcount.

Kwargs:

direct (string): For Vertica. Will apply DIRECT keywprd to COPY command to skip WOS

raises ReaderError:

Reader process did not execute successfully.

raises WriterError:

Writer process did not execute successfully.

dbio.io.replicate_no_fifo(query_db_url, load_db_url, query, table, append, analyze=False, disable_indices=False, query_is_file=False, create_staging=True, do_rowcount_check=False, **kwargs)¶: Identitcal to replicate(), but uses a tempfile and disk I/O instead of a named pipe. This method works on any platform and doesn’t require the database to support loading from named pipes.

Database Base Classes¶

To add support for an additional database, extend at least one of the following classes.

class dbio.databases.base.Exportable(url)¶

Designed to be the target of query operations.

get_export_engine()¶

Returns:	sqlalchemy engine object.

get_query_rowcount(query)¶

Gets a row count for the given query.

Parameters:	query – The query to get the row count for.
Returns:	The row count for the provided query.

class dbio.databases.base.Importable(url)¶

Designed to be the target of load operations.

do_rowcount_check(table, expected_rowcount)¶

Checks if the given table has the expected row count.

Parameters:	table – The table to check for row count. expected_rowcount – Number of rows to expect in the table.
Raises:	UnexpectedRowcountError upon row count mismatch

execute_import(table, filename, append, csv_params, null_string, analyze=False, disable_indices=False, create_staging=True, expected_rowcount=None, **kwargs)¶

Database specific implementation of loading from a CSV

Parameters:

table – destination for the load operation.
data_source – Either a CSV file to be loaded or an INSERT command.
csv_params – csv format info of the data_source.
append – True if the data_source should add to table, False if table should only contain the contents of data_source after loading.
analyze – If True, the table will be will be analyzed for query optimization immediately after importing.
disable_indices – If True, table will temporarily disable or drop indices in the attempts of speeding up the load.
null_string – String to replace NULL values with when importing.
create_staging – If True, the old table will be replaced with a new, identical table. If False, there must be an existing table named “table_staging”.
expected_rowcount – The number of rows that are expected to be in the loaded table. If the count does not much, the loading transaction will raise an error and rollback if possible. If the count is set to None, no check will be made.

get_import_engine()¶

Returns:	sqlalchemy engine object.

Welcome to dbio’s documentation!¶

Operations¶

Database Base Classes¶

Indices and tables¶

Table Of Contents

This Page

param disable_indices:
param sqla_url:	SQLAlchemy url string to pass to create_engine().
param table:	Table in database to load data from filename.
param filename:	Name of csv file to load from.
param append:	If True, any data already in the table will be preserved.
param analyze:	If True, the table will be will be analyzed for query optimization immediately after importing.
	If True, table will temporarily disable or drop indices for the duration of the load in the attempt of speeding up the operation.
param csv_params:
	Dictionary of csv parameters.
param null_string:
	String to represent null values with.
param create_staging:
	If True, the old table will be replaced with a new, identical table. If False, there must be an existing table named “table_staging”.
param expected_rowcount:
	The number of rows that are expected to be in the loaded table. If the count does not much, the loading transaction will raise an error and rollback if possible. If the count is set to None, no check will be made.

param query_db_url:
	SQLAlchemy engine creation URL for query_db.
param load_db_url:
	SQLAlchemy engine creation URL for load_db.
param query:	SQL query string to execute.
param table:	Table in database to load data from filename.
param append:	If True, any data already in the table will be preserved.
param analyze:	If True, the table will be will be analyzed for query optimization immediately after importing.
param disable_indices:
	If True, table will temporarily disable or drop indices for the duration of the load in the attempt of speeding up the operation.
param query_is_file:
	If True, the query argument is a filename.
param create_staging:
	If True, the old table will be replaced with a new, identical table. If False, there must be an existing table named “table_staging”.
param do_rowcount_check:
	If True, the replication will only succeed if the query rowcount matches the load rowcount.

raises ReaderError:
	Reader process did not execute successfully.
raises WriterError:
	Writer process did not execute successfully.

Navigation

Welcome to dbio’s documentation!¶

Operations¶

Database Base Classes¶

Indices and tables¶

Table Of Contents

This Page

Quick search

Navigation