Cluster Management

py-postgresql provides cluster management tools in order to give the user fine-grained control over a PostgreSQL cluster and access to information about an installation of PostgreSQL.

Installations

postgresql.installation.Installation objects are primarily used to access PostgreSQL installation information. Normally, they are created using a dictionary constructed from the output of the pg_config executable:

from postgresql.installation import Installation, pg_config_dictionary
pg_install = Installation(pg_config_dictionary('/usr/local/pgsql/bin/pg_config'))

The extraction of pg_config information is isolated from Installation instantiation in order to allow Installations to be created from arbitrary dictionaries. This can be useful in cases where the installation layout is inconsistent with the standard PostgreSQL installation layout, or if a faux Installation needs to be created for testing purposes.
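To illustrate why keeping extraction separate is useful, here is a minimal sketch of turning pg_config output into a dictionary. It assumes that pg_config prints one `KEY = value` line per item and that keys should be lowercased. The helpers `parse_pg_config_output` and `read_pg_config` are hypothetical and are not the implementation of `pg_config_dictionary`.

```python
import subprocess


def parse_pg_config_output(text):
    """Turn 'KEY = value' lines into a dict with lowercased keys (assumed format)."""
    info = {}
    for line in text.splitlines():
        # Split on the first ' = ' only, so values containing '=' survive intact.
        key, sep, value = line.partition(' = ')
        if sep:  # skip lines that lack the separator
            info[key.strip().lower()] = value.strip()
    return info


def read_pg_config(pg_config_path):
    """Run a pg_config executable and parse what it prints."""
    output = subprocess.check_output([pg_config_path], universal_newlines=True)
    return parse_pg_config_output(output)
```

Because the parser only needs a string, a faux installation's dictionary can be built from literal text or written by hand, without any pg_config executable.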

Installation Interface Points

Installation(info)

Instantiate an Installation using the given information. Normally, this information is extracted from a pg_config executable using postgresql.installation.pg_config_dictionary:

info = pg_config_dictionary('/usr/local/pgsql/bin/pg_config')
pg_install = Installation(info)

Installation.version

The installation’s version string:

pg_install.version
'PostgreSQL 9.0devel'

Installation.version_info

A tuple containing the version’s (major, minor, patch, state, level), where major, minor, patch, and level are int objects and state is a str object:

pg_install.version_info
(9, 0, 0, 'devel', 0)
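The relationship between the version string and the tuple can be sketched with a regular expression. This is a hypothetical `parse_version` helper, not py-postgresql's parser; in particular, using 'final' as the state of a released version with no suffix is an assumption.

```python
import re

# "PostgreSQL <major>.<minor>[.<patch>][<state>][<level>]", e.g. "9.0devel", "9.0beta2".
_VERSION = re.compile(r"PostgreSQL\s+(\d+)\.(\d+)(?:\.(\d+))?([a-z]+)?(\d+)?")


def parse_version(version):
    """Return (major, minor, patch, state, level) for a PostgreSQL version string."""
    m = _VERSION.search(version)
    if m is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    major, minor, patch, state, level = m.groups()
    # Missing parts default to 0; a missing state is assumed to mean 'final'.
    return (int(major), int(minor), int(patch or 0), state or 'final', int(level or 0))
```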

Installation.ssl
A bool indicating whether or not the installation has SSL support.
Installation.configure_options

The options given to the configure script that built the installation. The options are represented using a dictionary object whose keys are normalized long option names, and whose values are the option’s argument. If the option takes no argument, True will be used as the value.

The normalization of the long option names consists of removing the leading dashes, converting the string to lowercase, and replacing any remaining dashes with underscores. For instance, --enable-debug becomes enable_debug:

pg_install.configure_options
{'enable_debug': True, 'with_libxml': True,
 'enable_cassert': True, 'with_libedit_preferred': True,
 'prefix': '/src/build/pg90', 'with_openssl': True,
 'enable_integer_datetimes': True, 'enable_depend': True}
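The normalization rule above can be sketched as follows. The sketch assumes the configure options arrive as one shell-quoted string (the form pg_config prints for its CONFIGURE entry) and skips arguments that are not long options, such as CFLAGS=... assignments. The helpers `normalize_option` and `parse_configure_options` are hypothetical.

```python
import shlex


def normalize_option(name):
    """'--enable-debug' -> 'enable_debug': strip dashes, lowercase, '-' to '_'."""
    return name.lstrip('-').lower().replace('-', '_')


def parse_configure_options(configure):
    """Build the options dictionary from a shell-quoted configure string."""
    options = {}
    for arg in shlex.split(configure):  # removes the single quotes around each arg
        if not arg.startswith('--'):
            continue  # e.g. environment assignments such as CFLAGS=-O2
        name, sep, value = arg.partition('=')
        # Options without an argument map to True.
        options[normalize_option(name)] = value if sep else True
    return options
```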

Installation.paths

The paths of the installation as a dictionary whose keys are path identifiers and whose values are absolute file system paths. For instance, 'bindir' is associated with $PREFIX/bin, 'libdir' is associated with $PREFIX/lib, and so on. The keys included in this dictionary are listed in the class attributes Installation.pg_directories and Installation.pg_executables.

The keys that point to installation directories are: bindir, docdir, includedir, pkgincludedir, includedir_server, libdir, pkglibdir, localedir, mandir, sharedir, and sysconfdir.

The keys that point to installation executables are: pg_config, psql, initdb, pg_resetxlog, pg_controldata, clusterdb, pg_ctl, pg_dump, pg_dumpall, postgres, postmaster, reindexdb, vacuumdb, ipcclean, createdb, ecpg, createuser, createlang, droplang, dropuser, and pg_restore.

Note

If an executable does not exist, its value will be None instead of an absolute path.

To get the path to the psql executable:

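from postgresql.installation import Installation, pg_config_dictionary
# Installation takes the dictionary built from pg_config, not the path itself.
pg_install = Installation(pg_config_dictionary('/usr/local/pgsql/bin/pg_config'))
psql_path = pg_install.paths['psql']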
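The None-for-missing rule for executables can be sketched as a lookup in the bin directory. The helper `executable_paths` is hypothetical, and it assumes executables sit directly in bindir under their plain names (no .exe suffix as on Windows).

```python
import os


def executable_paths(bindir, names):
    """Map each executable name to its absolute path, or None if it is missing."""
    paths = {}
    for name in names:
        candidate = os.path.join(bindir, name)
        # Only regular files the current user may execute count as present.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            paths[name] = candidate
        else:
            paths[name] = None
    return paths
```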

Clusters

postgresql.cluster.Cluster is the class used to manage a PostgreSQL cluster, that is, a data directory created by initdb. A Cluster represents a data directory with respect to a given installation of PostgreSQL, so creating a postgresql.cluster.Cluster object requires a postgresql.installation.Installation and a file system path to the data directory.

In part, a postgresql.cluster.Cluster is the Python programmer’s variant of the pg_ctl command. However, it goes beyond the basic process control functionality and extends into initialization and configuration as well.

A Cluster manages the server process using the subprocess module and signals. When the Cluster starts the server process itself, it retains the subprocess.Popen object as Cluster.daemon_process. This gives the Cluster access to the server process’s result code when it exits, and the ability to redirect stderr and stdout to a file object passed as a parameter, using subprocess features.

Despite its use of subprocess, Clusters can control a server process that was not started by the Cluster’s start method.
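The benefit of retaining the Popen object can be shown with the standard subprocess module alone. In this sketch the hypothetical `start_logged` helper stands in for starting postgres; sending output to DEVNULL when no logfile is given only approximates the closed streams described for Cluster.start.

```python
import subprocess


def start_logged(argv, logfile=None):
    """Start argv as a child process and return its subprocess.Popen object.

    stdout and stderr both go to logfile when it is given; otherwise output is
    discarded through DEVNULL (an approximation, not py-postgresql's behavior).
    """
    target = subprocess.DEVNULL if logfile is None else logfile
    return subprocess.Popen(argv, stdout=target, stderr=target)
```

Keeping the returned object is what makes proc.poll() and proc.wait() available later to read the child's result code; a process started elsewhere offers no such handle.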

Initializing Clusters

postgresql.cluster.Cluster provides a method, init, for initializing a Cluster’s data directory. This method provides a Python interface to the PostgreSQL initdb command.

init is a regular method and accepts a few keyword parameters. Normally, parameters are directly mapped to initdb command options. However, password makes use of initdb’s capability to read the superuser’s password from a file. To do this, a temporary file is allocated internally by the method:

from postgresql.installation import Installation, pg_config_dictionary
from postgresql.cluster import Cluster
pg_install = Installation(pg_config_dictionary('/usr/local/pgsql/bin/pg_config'))
pg_cluster = Cluster(pg_install, 'pg_data')
pg_cluster.init(user = 'pg', password = 'secret', encoding = 'utf-8')

The init method will block until the initdb command is complete. Once initialized, the Cluster may be configured.
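The password-file technique can be sketched as a context manager that writes the password to a private temporary file, builds an initdb command line, and removes the file afterwards. The helper `initdb_argv` is hypothetical; only the initdb options -D, -E, -U, and --pwfile are real, and the exact arguments py-postgresql passes are not shown in this document.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def initdb_argv(initdb_path, data_directory, encoding=None, user=None, password=None):
    """Yield an initdb argument list; any password file is deleted on exit."""
    argv = [initdb_path, '-D', data_directory]
    if encoding is not None:
        argv += ['-E', encoding]  # default database encoding
    if user is not None:
        argv += ['-U', user]      # superuser name
    pwfile = None
    try:
        if password is not None:
            # mkstemp creates the file readable and writable by the owner only.
            fd, pwfile = tempfile.mkstemp(prefix='pwfile')
            with os.fdopen(fd, 'w') as f:
                f.write(password)
            argv.append('--pwfile=' + pwfile)
        yield argv  # the caller runs initdb while the file still exists
    finally:
        if pwfile is not None:
            os.unlink(pwfile)
```

Usage would look like `with initdb_argv(path, 'pg_data', password='secret') as argv: subprocess.check_call(argv)`, so the password never appears on the command line.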

Configuring Clusters

A Cluster’s configuration file can be manipulated using the Cluster.settings mapping. The mapping’s methods always access the configuration file, so it may be desirable to cache values that are read repeatedly. Also, if multiple settings are being applied, use the update() method to avoid writing the entire file once per setting:

pg_cluster.settings.update({'listen_addresses' : 'localhost', 'port' : '6543'})

Similarly, to avoid opening and reading the entire file multiple times, Cluster.settings.getset should be used to retrieve multiple settings:

d = pg_cluster.settings.getset(set(('listen_addresses', 'port')))
d
{'listen_addresses': 'localhost', 'port': '6543'}

Values contained in settings are always Python strings:

assert isinstance(pg_cluster.settings['max_connections'], str)
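Why batching matters can be seen in a simplified postgresql.conf reader and writer: each call below opens the file exactly once, whether it handles one setting or many, and values come back as str. The functions `getset` and `update` here are hypothetical stand-ins, not Cluster.settings itself; they handle only active `name = value` lines.

```python
import re

# An active "name = value" line with an optional trailing comment. Commented-out
# settings, lines without "=", and quoted values containing "#" are not handled.
_SETTING_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*([^#]*?)\s*(?:#.*)?$")


def getset(path, names):
    """Read the file once and return the requested settings as str values."""
    found = {}
    with open(path) as f:
        for line in f:
            m = _SETTING_LINE.match(line)
            if m and m.group(1) in names:
                found[m.group(1)] = m.group(2).strip("'")
    return found


def update(path, new_settings):
    """Rewrite the file once, replacing existing lines or appending new ones."""
    pending = dict(new_settings)
    lines = []
    with open(path) as f:
        for line in f:
            m = _SETTING_LINE.match(line)
            if m and m.group(1) in pending:
                lines.append("%s = '%s'\n" % (m.group(1), pending.pop(m.group(1))))
            else:
                lines.append(line)
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'  # keep appended settings on their own lines
    for name, value in pending.items():
        lines.append("%s = '%s'\n" % (name, value))
    with open(path, 'w') as f:
        f.writelines(lines)
```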

The postgresql.conf file is only one part of the server configuration. Structured access to and manipulation of the pg_hba file are not supported; Clusters only provide the file path to the pg_hba file:

hba = open(pg_cluster.hba_file)

If the configuration of the Cluster is altered while the server process is running, it may be necessary to signal the process that configuration changes have been made. This signal can be sent using the Cluster.reload() method, which sends a SIGHUP signal to the server process. However, not all configuration changes take effect after calling Cluster.reload(). In those cases, the server process will need to be shut down and started again.

Controlling Clusters

The server process of a Cluster object can be controlled with the start(), stop(), shutdown(), kill(), and restart() methods. These methods start the server process, signal the server process, or, in the case of restart, a combination of the two.

When a Cluster starts the server process, it is run as a subprocess. Therefore, if the current process exits, the server process will exit as well. start() does not automatically daemonize the server process.

Note

Under Microsoft Windows, the above does not hold true: the server process will continue running after the parent process exits.

To terminate a server process, call one of these three methods: stop, shutdown, or kill. stop is a graceful shutdown and will wait for all clients to disconnect before shutting down. shutdown will close any open connections and safely shut down the server process. kill will immediately terminate the server process, which leads to recovery the next time the server process is started.

Note

Using kill may cause shared memory to be leaked.

Normally, Cluster.shutdown is the appropriate way to terminate a server process.
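The control methods ultimately come down to sending signals to the server's PID. The sketch below is an assumed mapping: SIGHUP for reload and SIGKILL for kill match this document, while SIGTERM and SIGINT for stop and shutdown come from PostgreSQL's own description of "smart" and "fast" shutdown, not from py-postgresql's source. `SERVER_SIGNALS` and `signal_server` are hypothetical names, and the sketch is POSIX only.

```python
import os
import signal

# Assumed mapping from Cluster methods to PostgreSQL server signals (POSIX only).
SERVER_SIGNALS = {
    'reload': signal.SIGHUP,    # re-read configuration files
    'stop': signal.SIGTERM,     # smart shutdown: wait for clients to disconnect
    'shutdown': signal.SIGINT,  # fast shutdown: disconnect clients, then exit
    'kill': signal.SIGKILL,     # no cleanup; recovery runs on the next start
}


def signal_server(pid, action):
    """Send the signal associated with action to the server process pid."""
    os.kill(pid, SERVER_SIGNALS[action])
```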

Cluster Interface Points

Methods and properties available on postgresql.cluster.Cluster instances:

Cluster(installation, data_directory)

Create a postgresql.cluster.Cluster object for the specified postgresql.installation.Installation, and data_directory.

The data_directory must be an absolute file system path. The directory does not need to exist; the init() method may later be used to create the cluster.

Cluster.installation
The Cluster’s postgresql.installation.Installation instance.
Cluster.data_directory
The absolute path to the PostgreSQL data directory. This directory may not exist.
Cluster.init([encoding = None[, user = None[, password = None]]])

Run the initdb executable of the configured installation to initialize the cluster at the configured data directory, Cluster.data_directory.

encoding is mapped to -E, the default database encoding. By default, the encoding is determined from the environment’s locale.

user is mapped to -U, the database superuser name. Defaults to the current user’s name.

password is ultimately mapped to --pwfile. The argument given to the long option is actually a path to the temporary file that holds the given password.

Raises postgresql.cluster.InitDBError when initdb returns a non-zero result code.

Raises postgresql.cluster.ClusterInitializationError when there is no initdb in the Installation.

Cluster.initialized()
Whether or not the data directory exists and looks like a PostgreSQL data directory, meaning that it contains a postgresql.conf file and a base directory.
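The check described for initialized() amounts to a few file system tests. This is a hypothetical `looks_initialized` helper that follows the stated criteria, not the library's method.

```python
import os


def looks_initialized(data_directory):
    """True if the directory exists and has a postgresql.conf file and a base directory."""
    return (
        os.path.isdir(data_directory)
        and os.path.isfile(os.path.join(data_directory, 'postgresql.conf'))
        and os.path.isdir(os.path.join(data_directory, 'base'))
    )
```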
Cluster.drop()
Shut down the Cluster’s server process and completely remove the Cluster.data_directory from the file system.
Cluster.pid()
The server’s process identifier as a Python int. None if there is no server process running. This is a method rather than a property as it may read the PID from a file in cases where the server process was not started by the Cluster.
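Reading the PID from a file can be sketched using postmaster.pid, whose first line holds the server's PID. The helper `read_postmaster_pid` is hypothetical; note that a stale file can outlive the server, so a real check would also need to confirm the process still exists.

```python
import os


def read_postmaster_pid(data_directory):
    """Return the PID on the first line of postmaster.pid, or None if unavailable."""
    path = os.path.join(data_directory, 'postmaster.pid')
    try:
        with open(path) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return None  # no pid file: the server is presumably not running
    return int(first_line) if first_line.isdigit() else None
```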
Cluster.start([logfile = None[, settings = None]])

Start the PostgreSQL server process for the Cluster if it is not already running. This will execute postgres as a subprocess.

If logfile, an opened and writable file object, is given, stderr and stdout will be redirected to that file. By default, both stderr and stdout are closed.

If settings is given, the mapping or sequence of pairs will be used as long options to the subprocess. For each item, --{key}={value} will be given as an argument to the subprocess.
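The `--{key}={value}` conversion accepts either a mapping or a sequence of pairs, which can be sketched as follows. `settings_to_args` is a hypothetical helper name.

```python
def settings_to_args(settings):
    """Convert a mapping or a sequence of (key, value) pairs to --key=value options."""
    items = settings.items() if hasattr(settings, 'items') else settings
    return ['--{key}={value}'.format(key=key, value=value) for key, value in items]
```

The resulting list would be appended to the server's command line, for example `[daemon_path, '-D', data_directory] + settings_to_args(settings)`; how py-postgresql orders its own arguments is not specified here.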

Cluster.running()
Whether or not the cluster’s server process is running. Returns True or False. Even if True is returned, it does not mean that the server process is ready to accept connections.
Cluster.ready_for_connections()

Whether or not the Cluster is ready to accept connections. Usually called after Cluster.start.

Returns True when the Cluster can accept connections, False when it cannot, and None if the Cluster’s server process is not running at all.

Cluster.wait_until_started([timeout = 10[, delay = 0.05]])

Blocks the process until the cluster is identified as being ready for connections. Usually called after Cluster.start().

Raises postgresql.cluster.ClusterNotRunningError if the server process is not running at all.

Raises postgresql.cluster.ClusterTimeoutError if Cluster.ready_for_connections() does not return True within the given timeout period.

Raises postgresql.cluster.ClusterStartupError if the server process terminates while polling for readiness.

timeout and delay are both in seconds: timeout is the maximum time to wait for the Cluster to be ready for connections, and delay is the time to sleep between calls to Cluster.ready_for_connections().
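The polling behavior can be sketched against any function that follows the tri-state convention of Cluster.ready_for_connections() (True, False, or None when not running). `wait_until_ready` is hypothetical, and the built-in RuntimeError and TimeoutError stand in for ClusterNotRunningError and ClusterTimeoutError.

```python
import time


def wait_until_ready(ready, timeout=10, delay=0.05):
    """Poll ready() every delay seconds until it returns True or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = ready()
        if state is True:
            return
        if state is None:
            raise RuntimeError('server process is not running')
        if time.monotonic() >= deadline:
            raise TimeoutError('not ready within %s seconds' % timeout)
        time.sleep(delay)  # back off briefly between checks
```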

Cluster.stop()
Signal the cluster to shut down when possible. The server will wait for all clients to disconnect before shutting down.
Cluster.shutdown()
Signal the cluster to shut down immediately. Any open client connections will be closed.
Cluster.kill()
Signal the absolute destruction of the server process (SIGKILL). This will require recovery when the cluster is started again. Shared memory may be leaked.
Cluster.wait_until_stopped([timeout = 10[, delay = 0.05]])

Blocks the process until the cluster is identified as being shut down. Usually called after Cluster.stop or Cluster.shutdown.

Raises postgresql.cluster.ClusterTimeoutError if Cluster.ready_for_connections does not return None within the given timeout period.

Cluster.reload()
Signal the server that it should reload its configuration files (SIGHUP). Usually called after manipulating Cluster.settings or modifying the contents of Cluster.hba_file.
Cluster.restart([logfile = None[, settings = None[, timeout = 10]]])

Stop the server process, wait until it is stopped, start the server process, and wait until it has started.

Note

This calls Cluster.stop(), so it will wait until clients disconnect before starting up again.

The logfile and settings parameters will be given to Cluster.start. timeout will be given to Cluster.wait_until_stopped and Cluster.wait_until_started.

Cluster.settings

A collections.Mapping interface to the postgresql.conf file of the cluster.

A notable extension to the mapping interface is the getset method. This method will return a dictionary object containing the settings whose names were contained in the set object given to the method. This method should be used when multiple settings need to be retrieved from the configuration file.

Cluster.hba_file
The path to the cluster’s pg_hba file. This property respects the HBA file location setting in postgresql.conf. Usually, $PGDATA/pg_hba.conf.
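Respecting the hba_file setting can be sketched as a lookup with a default. `resolve_hba_file` is a hypothetical helper taking any mapping of settings; it returns a configured value unchanged and does not attempt to resolve relative paths.

```python
import os


def resolve_hba_file(data_directory, settings):
    """Return the configured hba_file, or <data_directory>/pg_hba.conf by default."""
    configured = settings.get('hba_file')
    if configured:
        return configured
    return os.path.join(data_directory, 'pg_hba.conf')
```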
Cluster.daemon_path
The path to the executable to use to start the server process.
Cluster.daemon_process
The subprocess.Popen instance of the server process. None if the server process was not started or was not started using the Cluster object.