IdaDataBase¶
Connect to dashDB/DB2¶
-
class
ibmdbpy.base.
IdaDataBase
(dsn, uid=u'', pwd=u'', autocommit=True, verbose=False)[source]¶ An IdaDataBase instance represents a reference to a remote dashDB/DB2 instance. This is an abstraction layer for the remote connection. The IdaDataBase interface provides several functions that enable basic database administration in pythonic syntax.
You can use either ODBC or JDBC to connect to the database. The default connection type is ODBC, which is the standard connection type for Windows users. To establish an ODBC connection, download an IBM DB2 driver and set up your ODBC connection by specifying your connection protocol, port, and hostname. An ODBC connection on Linux or Mac might require more settings. For more information about how to establish an ODBC connection, see the pypyodbc documentation.
To connect with JDBC, install the optional external package jaydebeapi, download the ibm jdbc driver, and save it in your local ibmdbpy folder. If you put the jdbc driver in the CLASSPATH variable or the folder that contains it, it will work too. A C++ compiler adapted to the current python version, operating system, and architecture may also be required to install jaydebeapi.
The instantiation of an IdaDataBase object is a mandatory step before creating IdaDataFrame objects because IdaDataFrames require an IdaDataBase as a parameter to be initialized. By convention, use only one instance of IdaDataBase per database. However, you can use several instances of IdaDataFrame per connection.
-
__init__
(dsn, uid=u'', pwd=u'', autocommit=True, verbose=False)[source]¶ Open a database connection.
Parameters: dsn : str
Data Source Name (as specified in your ODBC settings) or JDBC URL string.
uid : str, optional
User ID.
pwd : str, optional
User password.
autocommit : bool, default: True
If True, automatically commits all operations.
verbose : bool, defaukt: True
If True, prints all SQL requests that are sent to the database.
Returns: IdaDataBase object
Raises: ImportError
JayDeBeApi is not installed.
IdaDataBaseError
- uid and pwd are defined both in uid, pwd parameters and dsn.
- The ‘db2jcc4.jar’ file is not in the ibmdbpy site-package repository.
Examples
ODBC connection, userID and password are stored in ODBC settings:
>>> IdaDataBase(dsn="DASHDB") # ODBC Connection <ibmdbpy.base.IdaDataBase at 0x9bec860>
ODBC connection, userID and password are not stored in ODBC settings:
>>> IdaDataBase(dsn="DASHDB", uid="<UID>", pwd="<PWD>") <ibmdbpy.base.IdaDataBase at 0x9bec860>
JDBC connection, full JDBC string:
>>> jdbc='jdbc:db2://<HOST>:<PORT>/<DBNAME>:user=<UID>;password=<PWD>' >>> IdaDataBase(dsn=jdbc) <ibmdbpy.base.IdaDataBase at 0x9bec860>
JDBC connectiom, JDBC string and seperate userID and password:
>>> jdbc = 'jdbc:db2://<HOST>:<PORT>/<DBNAME>' >>> IdaDataBase(dsn=jdbc, uid="<UID>", pwd="<PWD>") <ibmdbpy.base.IdaDataBase at 0x9bec860>
Attributes
data_source_name (str) Name of the referring DataBase. _con_type (str) Type of the connection, either ‘odbc’ or ‘jdbc’. _connection_string (str) Connection string use for connecting via ODBC or JDBC. _con (connection object) Connection object to the remote Database. _idadfs (list) List of IdaDataFrame objects opened under this connection.
-
Methods
DataBase Exploration¶
current_schema¶
show_tables¶
-
IdaDataBase.
show_tables
(show_all=False)[source]¶ Show tables and views that are available in self. By default, this function shows only tables that belong to a user’s specific schema.
Parameters: show_all : bool
If True, all table and view names in the database are returned, not only those that belong to the user’s schema.
Returns: DataFrame
A data frame containing tables and views names in self with some additional information (TABSCHEMA, TABNAME, OWNER, TYPE).
Notes
show_tables implements a cache strategy. The cache is stored when the user calls the method with the argument show_all set to True. This improves performance because database table look ups are a very common operation. The cache gets updated each time a table or view is created or refreshed, each time a table or view is deleted, or when a new IdaDataFrame is opened.
Examples
>>> ida_db.show_tables() TABSCHEMA TABNAME OWNER TYPE 0 DASHXXXXXX SWISS DASHXXXXXX T 1 DASHXXXXXX IRIS DASHXXXXXX T 2 DASHXXXXXX VIEW_TITANIC DASHXXXXXX V ... >>> ida_db.show_tables(show_all = True) TABSCHEMA TABNAME OWNER TYPE 0 DASHXXXXXX SWISS DASHXXXXXX T 1 DASHXXXXXX IRIS DASHXXXXXX T 2 DASHXXXXXX VIEW_TITANIC DASHXXXXXX V 2 SYSTOOLS IDAX_MODELS DASH101631 A ...
show_models¶
-
IdaDataBase.
show_models
()[source]¶ Show models that are available in the database.
Returns: DataFrame Examples
>>> idadb.show_models() MODELSCHEMA MODELNAME OWNER 0 DASHXXXXXX KMEANS_10857_1434974511 DASHXXXXXX 1 DASHXXXXXX KMEANS_11726_1434977692 DASHXXXXXX 2 DASHXXXXXX KMEANS_11948_1434976568 DASHXXXXXX
exists_table_or_view¶
-
IdaDataBase.
exists_table_or_view
(objectname)[source]¶ Check if a table or view exists in self.
Parameters: objectname : str
Name of the table or view to check.
Returns: bool
Raises: TypeError
The object exists but is not of the expected type.
Examples
>>> idadb.exists_table_or_view("NOT_EXISTING") False >>> idadb.exists_table_or_view("TABLE_OR_VIEW") True >>> idadb.exists_table_or_view("NO_TABLE_NOR_VIEW") TypeError : "NO_TABLE_NOR_VIEW" exists in schema '?' but of type '?'
exists_table¶
-
IdaDataBase.
exists_table
(tablename)[source]¶ Check if a table exists in self.
Returns: bool
Raises: TypeError
The object exists but is not of the expected type.
Examples
>>> idadb.exists_table("NOT_EXISTING") False >>> idadb.exists_table("TABLE") True >>> idadb.exists_table("NO_TABLE") TypeError : "tablename" exists in schema "?" but of type '?'
exists_view¶
-
IdaDataBase.
exists_view
(viewname)[source]¶ Check if a view exists in self.
Parameters: viewname : str
Name of the view to check.
Returns: bool
Raises: TypeError
The object exists but is not of the expected type.
Examples
>>> idadb.exists_view("NOT_EXISTING") False >>> idadb.exists_view("VIEW") True >>> idadb.exists_view("NO_VIEW") TypeError : "viewname" exists in schema "?" but of type '?'
exists_model¶
-
IdaDataBase.
exists_model
(modelname)[source]¶ Check if a model exists in self.
Parameters: modelname : str
Name of the model to check.
Returns: bool
Raises: TypeError
The object exists but is not of the expected type.
Examples
>>> idadb.exists_model("MODEL") True >>> idadb.exists_model("NOT_EXISTING") False >>> idadb.exists_model("NO_MODEL") TypeError : NO_TABLE exists but is not a model (of type '?')
is_table_or_view¶
-
IdaDataBase.
is_table_or_view
(objectname)[source]¶ Check if an object is a table or a view in self.
Parameters: objectname : str
Name of the object to check.
Returns: bool
Raises: ValueError
objectname doesn’t exist in the database.
Examples
>>> idadb.is_table_or_view("NO_TABLE") False >>> idadb.is_table_or_view("TABLE") True >>> idadb.is_table_or_view("NOT_EXISTING") ValueError : NO_EXISTING does not exist in database
is_table¶
-
IdaDataBase.
is_table
(tablename)[source]¶ Check if an object is a table in self.
Parameters: tablename : str
Name of the table to check.
Returns: bool
Raises: ValueError
The object doesn’t exist in the database.
Examples
>>> idadb.is_table("NO_TABLE") False >>> idadb.is_table("TABLE") True >>> idadb.is_table("NOT_EXISTING") ValueError : NO_EXISTING does not exist in database
is_view¶
-
IdaDataBase.
is_view
(viewname)[source]¶ Check if an object is a view in self.
Parameters: viewname : str
Name of the view to check.
Returns: bool
Raises: ValueError
The object doesn’t exist in the database.
Examples
>>> idadb.is_view("NO_VIEW") False >>> idadb.is_view("VIEW") True >>> idadb.is_view("NOT_EXISTING") ValueError : NO_EXISTING does not exist in database
is_model¶
-
IdaDataBase.
is_model
(modelname)[source]¶ Check if an object is a model in self.
Parameters: modelname : str
Name of the model to check.
Returns: bool
Raises: ValueError
The object doesn’t exist in the database.
Examples
>>> idadb.is_model("MODEL") True >>> idadb.is_model("NO_MODEL") False >>> idadb.is_model("NOT_EXISTING") ValueError : NO_EXISTING doesn't exist in database
ida_query¶
-
IdaDataBase.
ida_query
(query, silent=False, first_row_only=False, autocommit=False)[source]¶ Prepare, execute and format the result of a query in a dataframe or in a Tuple. If nothing is expected to be returned for the SQL command, nothing is returned.
Parameters: query : str
Query to be executed.
silent: bool, default: False
If True, the query is not printed in the python console even if the verbosity mode is activated (VERBOSE environment variable is equal to “True”).
first_row_only : bool, default: False
If True, only the first row of the result is returned as a Tuple.
autocommit: bool, default: False
If True, the autocommit function is available.
Returns: DataFrame or Tuple (if first_row_only=False)
Notes
If first_row_only argument is True, then even if the actual result of the query is composed of several rows, only the first row will be returned.
Examples
>>> idadb.ida_query("SELECT * FROM IRIS FETCH FIRST 5 ROWS ONLY") sepal_length sepal_width petal_length petal_width species 0 5.1 3.5 1.4 0.2 setosa 1 4.9 3.0 1.4 0.2 setosa 2 4.7 3.2 1.3 0.2 setosa 3 4.6 3.1 1.5 0.2 setosa 4 5.0 3.6 1.4 0.2 setosa
>>> idadb.ida_query("SELECT COUNT(*) FROM IRIS") (150, 150, 150, 150)
ida_scalar_query¶
-
IdaDataBase.
ida_scalar_query
(query, silent=False, autocommit=False)[source]¶ Prepare and execute a query and return only the first element as a string. If nothing is returned from the SQL query, an error occur.
Parameters: query : str
Query to be executed.
silent: bool, default: False
If True, the query will not be printed in python console even if verbosity mode is activated.
autocommit: bool, default: False
If True, the autocommit function is available.
Returns: str or Number
Notes
Even if the actual result of the query is composed of several columns and several rows, only the first element (top-left) will be returned.
Examples
>>> idadb.ida_scalar_query("SELECT TRIM(CURRENT_SCHEMA) from SYSIBM.SYSDUMMY1") 'DASHXXXXX'
Upload DataFrames¶
as_idadataframe¶
-
IdaDataBase.
as_idadataframe
(*args, **kwds)[source]¶ Upload a dataframe and return its corresponding IdaDataFrame. The target table (tablename) will be created or replaced if the option clear_existing is set to True.
To add data to an existing tables, see IdaDataBase.append
Parameters: dataframe : DataFrame
Data to be uploaded, contained in a Pandas DataFrame.
tablename : str, optional
Name to be given to the table created in the database. If not given, a valid tablename is generated (for example, DATA_FRAME_X where X a random number).
clear_existing : bool
If set to True, a table will be replaced when a table with the same name already exists in the database.
primary_key : str
Name of a column to be used as primary key.
Returns: IdaDataFrame
Raises: TypeError
- Argument dataframe is not of type pandas.DataFrame.
- The primary key argument is not a string.
NameError
- The name already exists in the database and clear_existing is False.
- The primary key argument doesn’t correspond to a column.
PrimaryKeyError
The primary key contains non unique values.
Notes
This function is not intended to be used to add data to an existing table, rather to create a new table from a dataframe. To add data to an existing table, please consider using IdaDataBase.append
Examples
>>> from ibmdbpy.sampledata.iris import iris >>> idadb.as_idadataframe(iris, "IRIS") <ibmdbpy.frame.IdaDataFrame at 0xb34a898> >>> idadb.as_idadataframe(iris, "IRIS") NameError: IRIS already exists, choose a different name or use clear_existing option. >>> idadb.as_idadataframe(iris, "IRIS2") <ibmdbpy.frame.IdaDataFrame at 0xb375940> >>> idadb.as_idadataframe(iris, "IRIS", clear_existing = True) <ibmdbpy.frame.IdaDataFrame at 0xb371cf8>
Delete DataBase Objects¶
drop_table¶
-
IdaDataBase.
drop_table
(tablename)[source]¶ Drop a table in the database.
Parameters: tablename : str
Name of the table to drop.
Raises: ValueError
If the object does not exist.
TypeError
If the object is not a table.
Notes
This operation cannot be undone if autocommit mode is activated.
Examples
>>> idadb.drop_table("TABLE") True >>> idadb.drop_table("NO_TABLE") TypeError : NO_TABLE exists in schema '?' but of type '?' >>> idadb.drop_table("NOT_EXISTING") ValueError : NO_EXISTING doesn't exist in database
drop_view¶
-
IdaDataBase.
drop_view
(viewname)[source]¶ Drop a view in the database.
Parameters: viewname : str
Name of the view to drop.
Raises: ValueError
If the object does not exist.
TypeError
If the object is not a view.
Notes
This operation cannot be undone if autocommit mode is activated.
Examples
>>> idadb.drop_view("VIEW") True >>> idadb.drop_view("NO_VIEW") TypeError : NO_VIEW exists in schema '?' but of type '?' >>> idadb.drop_view("NOT_EXISTING") ValueError : NO_EXISTING doesn't exist in database
drop_model¶
-
IdaDataBase.
drop_model
(modelname)[source]¶ Drop a model in the database.
Parameters: modelname : str
Name of the model to drop.
Raises: ValueError
If the object does not exist.
TypeError
if the object exists but is not a model.
Notes
This operation cannot be undone if autocommit mode is activated.
Examples
>>> idadb.drop_model("MODEL") True >>> idadb.drop_model("NO_MODEL") TypeError : NO_MODEL exists in schema '?' but of type '?' >>> idadb.drop_model("NOT_EXISTING") ValueError : NOT_EXISTING does not exists in database
DataBase Modification¶
rename¶
-
IdaDataBase.
rename
(*args, **kwds)[source]¶ Rename a table referenced by an IdaDataFrame in dashDB/DB2.
Parameters: idadf : IdaDataFrame
IdaDataFrame object referencing the table to rename.
newname : str
Name to be given to self. Should contain only alphanumerical characters and underscores. All lower case characters will be converted to upper case characters. The new name should not already exist in the database.
Raises: ValueError
The new tablename is not valid.
TypeError
Rename function is supported only for table type.
NameError
The name of the object to be created is identical to an existing name.
Notes
Upper case characters and numbers, optionally separated by underscores “_”, are valid characters.
add_column_id¶
-
IdaDataBase.
add_column_id
(*args, **kwds)[source]¶ Add an ID column to an IdaDataFrame.
Raises: TypeError
idadf is not an IdaDataFrame.
ValueError
The given column name already exists in the DataBase.
Notes
The non-destructive creation of column IDs is not reliable, because row IDs are recalculated on the fly in a non-deterministic way. The only reliable way is to create it destructively, but the row IDs will be created at random. This could be highly improved in the future. An idea is to create ID columns in a non-destructive way and base them on the sorting of a set of columns, defined by the user, or all columns if no column combination results in unique identifiers.
This method is DEPRECATED.
delete_column¶
-
IdaDataBase.
delete_column
(idadf, column_name, destructive=False)[source]¶ Delete a column in an idaDataFrame.
Parameters: idadf : IdaDataFrame
The IdaDataframe in which a column should be deleted.
column_name : str
Name of the column to delete.
destructive : bool
If set to True, the column is deleted in the database. Otherwise, it is deleted virtually, creating a view for the IdaDataFrame.
Raises: TypeError
column_name should be a string.
ValueError
column_name refers to a column that doesn’t exist in self.
append¶
-
IdaDataBase.
append
(idadf, df, maxnrow=None)[source]¶ Append rows of a DataFrame to an IdaDataFrame. The DataFrame must have the same structure (same column names and datatypes). Optionally, the DataFrame to be added can be splitted into several chunks. This improves performance and prevents SQL overflows. By default, chunks are limited to 8000 cells.
Parameters: idadf : IdaDataFrame
IdaDataFrame that receives data from dataframe df.
df : DataFrame
Dataframe whose rows are added to IdaDataFrame idadf.
maxnrow : int, optional
number corresponding to the maximum number of rows for each chunks.
Raises: TypeError
- maxnrow should be an interger.
- Argument idadf should be an IdaDataFrame.
- Argument df should be a pandas DataFrame.
ValueErrpr
- maxnrow should be greater than 1 or nleft blank.
- Other should be a Pandas DataFrame.
- Other dataframe has not the same number of columns as self.
- Some columns in other have different names that are different from the names of the columns in self.
Private Methods¶
_exists¶
_is¶
_drop¶
_upper_columns¶
_get_name_and_schema¶
-
IdaDataBase.
_get_name_and_schema
(objectname)[source]¶ Helper function that returns the name and the schema from an object name. Implicitly, if no schema name was given, it is assumed that user refers to the current schema.
Returns: tuple
A tuple composed of 2 strings containing the schema and the name.
Examples
>>> _get_name_and_schema(SCHEMA.TABLE) (SCHEMA, TABLE) >>> _get_name_and_schema(TABLE) (<current schema>, TABLE)
_get_valid_tablename¶
-
IdaDataBase.
_get_valid_tablename
(prefix=u'DATA_FRAME_')[source]¶ Generate a valid database table name.
Parameters: prefix : str, default: “DATA_FRAME_“
Prefix used to create the table name. The name is constructed using this pattern : <prefix>_X where <prefix> corresponds to the string parameter “prefix” capitalized and X corresponds to a pseudo randomly generated number (0-100000).
Returns: str
Examples
>>> idadb._get_valid_tablename() 'DATA_FRAME_49537_1434978215' >>> idadb._get_valid_tablename("MYDATA_") 'MYDATA_65312_1434978215' >>> idadb._get_valid_tablename("mydata_") 'MYDATA_78425_1434978215' >>> idadb._get_valid_tablename("mydata$") ValueError: Table name is not valid, only alphanumeric characters and underscores are allowed.
_get_valid_viewname¶
-
IdaDataBase.
_get_valid_viewname
(prefix=u'VIEW_')[source]¶ Convenience function : Alternative name for get_valid_tablename.
The parameter prefix has its optional value changed to “VIEW_”.
Examples
>>> idadb._get_valid_viewname() 'VIEW_49537_1434978215' >>> idadb._get_valid_viewname("MYVIEW_") 'MYVIEW_65312_1434978215' >>> idadb._get_valid_viewname("myview_") 'MYVIEW_78425_1434978215' >>> idadb._get_valid_modelname("myview$") ValueError: View name is not valid, only alphanumeric characters and underscores are allowed.
_get_valid_modelname¶
-
IdaDataBase.
_get_valid_modelname
(prefix=u'MODEL_')[source]¶ Convenience function : Alternative name for get_valid_tablename.
Parameter prefix has its optional value changed to “MODEL_”.
Examples
>>> idadb._get_valid_modelname() 'MODEL_49537_1434978215' >>> idadb._get_valid_modelname("TEST_") 'TEST_65312_1434996318' >>> idadb._get_valid_modelname("test_") 'TEST_78425_1435632423' >>> idadb._get_valid_tablename("mymodel$") ValueError: Table name is not valid, only alphanumeric characters and underscores are allowed.
_create_table¶
-
IdaDataBase.
_create_table
(dataframe, tablename, primary_key=None)[source]¶ Create a new table in the database by declaring its name and columns based on an existing DataFrame. It is possible declare a column as primary key.
Parameters: dataframe : DataFrame
Pandas DataFrame be used to initiate the table.
tablename : str
Name to be given to the table at its creation.
primary_key: str
Name of a column to declare as primary key.
Notes
The columns and their data type is deducted from the Pandas DataFrame given as parameter.
Examples
>>> from ibmdbpy.sampledata.iris import iris >>> idadb._create_table(iris, "IRIS") 'IRIS' >>> idadb._create_table(iris) 'DATA_FRAME_4956'
_create_view¶
-
IdaDataBase.
_create_view
(idadf, viewname=None)[source]¶ Create a new view in the database from an existing table.
Parameters: idadf : IdaDataFrame
IdaDataFrame to be duplicated as view.
viewname : str, optional
Name to be given to the view at its creation. If not given, a random name will be generated automatically
Returns: str
View name.
Examples
>>> idadf = IdaDataFrame(idadb, "IRIS") >>> idadb._create_view(idadf) 'IDAR_VIEW_4956'
_insert_into_database¶
-
IdaDataBase.
_insert_into_database
(dataframe, tablename, silent=True)[source]¶ Populate an existing table with data from a dataframe.
Parameters: dataframe: DataFrame
Data to be inserted into an existing table, contained in a Pandas DataFrame. It is assumed that the structure matches.
tablename: str
Name of the table in which the data is inserted.
silent : bool, default: True
If True, the INSERT statement is not printed. Avoids flooding the console.
_prepare_and_execute¶
-
IdaDataBase.
_prepare_and_execute
(query, autocommit=True, silent=False)[source]¶ Prepare and execute a query by using the cursor of an idaobject.
Parameters: idaobject: IdaDataBase or IdaDataFrame
query: str
Query to be executed.
autocommit: bool, default: True
If True, the autocommit function is available.
silent: bool, default: False
If True, the SQL statement is not printed.
_check_procedure¶
-
IdaDataBase.
_check_procedure
(proc_name, alg_name=None)[source]¶ Check if a procedure is available in the database.
Parameters: proc_name : str
Name of the procedure to be checked as defined in the underlying database management system.
alg_name : str
Name of the algorithm, human readable.
Returns: bool
Examples
>>> idadb._check_procedure('KMEANS') True >>> idadb._check_procedure('NOT_EXISTING') IdaDatabaseError: Function 'NOT_EXISTING' is not available.
_call_stored_procedure¶
_autocommit¶
-
IdaDataBase.
_autocommit
()[source]¶ Commit changes made to the database in the connection automatically. If the environment variable ‘AUTOCOMMIT’ is set to True, then commit.
Notes
In the case of a commit operation, all changes that are made in the Database after the last commit, including those in the children IdaDataFrames, will be commited.
If the environment variable ‘VERBOSE’ is not set to ‘True’, the autocommit operations will not be notified in the console to the user.