.. _cluster_management:

******************
Cluster Management
******************

py-postgresql provides cluster management tools in order to give the user
fine-grained control over a PostgreSQL cluster and access to information about an
installation of PostgreSQL.

.. _installation:

Installations
=============

`postgresql.installation.Installation` objects are primarily used to
access PostgreSQL installation information. Normally, they are created using a
dictionary constructed from the output of the pg_config_ executable::

   from postgresql.installation import Installation, pg_config_dictionary
   pg_install = Installation(pg_config_dictionary('/usr/local/pgsql/bin/pg_config'))

The extraction of pg_config_ information is isolated from Installation
instantiation in order to allow Installations to be created from arbitrary
dictionaries. This can be useful in cases where the installation layout is
inconsistent with the standard PostgreSQL installation layout, or if a faux
Installation needs to be created for testing purposes.

Installation Interface Points
-----------------------------

 ``Installation(info)``
  Instantiate an Installation using the given information. Normally, this
  information is extracted from a pg_config_ executable using
  `postgresql.installation.pg_config_dictionary`::

     info = pg_config_dictionary('/usr/local/pgsql/bin/pg_config')
     pg_install = Installation(info)

 ``Installation.version``
  The installation's version string::

     pg_install.version
     'PostgreSQL 9.0devel'

 ``Installation.version_info``
  A tuple containing the version's ``(major, minor, patch, state, level)``,
  where ``major``, ``minor``, ``patch``, and ``level`` are `int` objects, and
  ``state`` is a `str` object::

     pg_install.version_info
     (9, 0, 0, 'devel', 0)

 ``Installation.ssl``
  A `bool` indicating whether or not the installation has SSL support.

 ``Installation.configure_options``
  The options given to the ``configure`` script that built the installation.
  The options are represented using a dictionary object whose keys are
  normalized long option names, and whose values are the options' arguments.
  If an option takes no argument, `True` will be used as the value.

  The normalization of the long option names consists of removing the
  preceding dashes, lowercasing the string, and replacing any remaining dashes
  with underscores. For instance, ``--enable-debug`` will be ``enable_debug``::

     pg_install.configure_options
     {'enable_debug': True, 'with_libxml': True,
      'enable_cassert': True, 'with_libedit_preferred': True,
      'prefix': '/src/build/pg90', 'with_openssl': True,
      'enable_integer_datetimes': True, 'enable_depend': True}

 ``Installation.paths``
  The paths of the installation as a dictionary where the keys are the path
  identifiers and the values are the absolute file system paths. For instance,
  ``'bindir'`` is associated with ``$PREFIX/bin``, ``'libdir'`` is associated
  with ``$PREFIX/lib``, etc. The paths included in this dictionary are
  listed in the class attributes `Installation.pg_directories` and
  `Installation.pg_executables`.

  The keys that point to installation directories are: ``bindir``, ``docdir``,
  ``includedir``, ``pkgincludedir``, ``includedir_server``, ``libdir``,
  ``pkglibdir``, ``localedir``, ``mandir``, ``sharedir``, and ``sysconfdir``.

  The keys that point to installation executables are: ``pg_config``, ``psql``,
  ``initdb``, ``pg_resetxlog``, ``pg_controldata``, ``clusterdb``, ``pg_ctl``,
  ``pg_dump``, ``pg_dumpall``, ``postgres``, ``postmaster``, ``reindexdb``,
  ``vacuumdb``, ``ipcclean``, ``createdb``, ``ecpg``, ``createuser``,
  ``createlang``, ``droplang``, ``dropuser``, and ``pg_restore``.

  .. note:: If the executable does not exist, the value will be `None` instead
     of an absolute path.
  To get the path to the psql_ executable::

     from postgresql.installation import Installation, pg_config_dictionary
     # Installation takes the pg_config dictionary, not the executable path.
     pg_install = Installation(pg_config_dictionary('/usr/local/pgsql/bin/pg_config'))
     psql_path = pg_install.paths['psql']

Clusters
========

`postgresql.cluster.Cluster` is the class used to manage a PostgreSQL
cluster--a data directory created by initdb_. A Cluster represents a data
directory with respect to a given installation of PostgreSQL, so
creating a `postgresql.cluster.Cluster` object requires a
`postgresql.installation.Installation`, and a file system path to the data
directory.

In part, a `postgresql.cluster.Cluster` is the Python programmer's variant of
the pg_ctl_ command. However, it goes beyond the basic process control
functionality and extends into initialization and configuration as well.

A Cluster manages the server process using the `subprocess` module and
signals. The `subprocess.Popen` object, ``Cluster.daemon_process``, is
retained when the Cluster starts the server process itself. This gives
the Cluster access to the result code of the server process when it exits, and
the ability to redirect stderr and stdout to a parameterized file object using
subprocess features.

Despite its use of `subprocess`, a Cluster can control a server process
that was *not* started by the Cluster's ``start`` method.

Initializing Clusters
---------------------

`postgresql.cluster.Cluster` provides a method for initializing a
`Cluster`'s data directory, ``init``. This method provides a Python interface to
the PostgreSQL initdb_ command.

``init`` is a regular method and accepts a few keyword parameters. Normally,
parameters are directly mapped to initdb_ command options. However, ``password``
makes use of initdb's capability to read the superuser's password from a file.
To do this, a temporary file is allocated internally by the method::

   from postgresql.installation import Installation, pg_config_dictionary
   from postgresql.cluster import Cluster
   pg_install = Installation(pg_config_dictionary('/usr/local/pgsql/bin/pg_config'))
   pg_cluster = Cluster(pg_install, 'pg_data')
   pg_cluster.init(user = 'pg', password = 'secret', encoding = 'utf-8')

The init method will block until the initdb command is complete. Once
initialized, the Cluster may be configured.

Configuring Clusters
--------------------

A Cluster's `configuration file`_ can be manipulated using the
`Cluster.settings` mapping. The mapping's methods will always access the
configuration file, so it may be desirable to cache repeat reads. Also, if
multiple settings are being applied, using the ``update()`` method may be
important to avoid writing the entire file multiple times::

   pg_cluster.settings.update({'listen_addresses' : 'localhost', 'port' : '6543'})

Similarly, to avoid opening and reading the entire file multiple times,
`Cluster.settings.getset` should be used to retrieve multiple settings::

   d = pg_cluster.settings.getset(set(('listen_addresses', 'port')))
   d
   {'listen_addresses' : 'localhost', 'port' : '6543'}

Values contained in ``settings`` are always Python strings::

   assert pg_cluster.settings['max_connections'].__class__ is str

The ``postgresql.conf`` file is only one part of the server configuration.
Structured access and manipulation of the pg_hba_ file is not
supported. Clusters only provide the file path to the pg_hba_ file::

   hba = open(pg_cluster.hba_file)

If the configuration of the Cluster is altered while the server process is
running, it may be necessary to signal the process that configuration changes
have been made. This signal can be sent using the ``Cluster.reload()`` method.
``Cluster.reload()`` will send a SIGHUP signal to the server process. However,
not all changes to configuration settings can go into effect after calling
``Cluster.reload()``.
In those cases, the server process will need to be shut down and started again.

Controlling Clusters
--------------------

The server process of a Cluster object can be controlled with the ``start()``,
``stop()``, ``shutdown()``, ``kill()``, and ``restart()`` methods.
These methods start the server process, signal the server process, or, in the
case of restart, a combination of the two.

When a Cluster starts the server process, it is run as a subprocess. Therefore,
if the current process exits, the server process will exit as well. ``start()``
does *not* automatically daemonize the server process.

.. note:: Under Microsoft Windows, the above does not hold true. The server
   process will continue running despite the exit of the parent process.

To terminate a server process, one of these three methods should be called:
``stop``, ``shutdown``, or ``kill``. ``stop`` is a graceful shutdown and will
*wait for all clients to disconnect* before shutting down. ``shutdown`` will
close any open connections and safely shut down the server process.
``kill`` will immediately terminate the server process, leading to recovery upon
starting the server process again.

.. note:: Using ``kill`` may cause shared memory to be leaked.

Normally, `Cluster.shutdown` is the appropriate way to terminate a server
process.

Cluster Interface Points
------------------------

Methods and properties available on `postgresql.cluster.Cluster` instances:

 ``Cluster(installation, data_directory)``
  Create a `postgresql.cluster.Cluster` object for the specified
  `postgresql.installation.Installation`, and ``data_directory``.

  The ``data_directory`` must be an absolute file system path. The directory
  does *not* need to exist. The ``init()`` method may later be used to create
  the cluster.

 ``Cluster.installation``
  The Cluster's `postgresql.installation.Installation` instance.

 ``Cluster.data_directory``
  The absolute path to the PostgreSQL data directory.
  This directory may not exist.
 ``Cluster.init([encoding = None[, user = None[, password = None]]])``
  Run the `initdb`_ executable of the configured installation to initialize the
  cluster at the configured data directory, `Cluster.data_directory`.

  ``encoding`` is mapped to ``-E``, the default database encoding. By default,
  the encoding is determined from the environment's locale.

  ``user`` is mapped to ``-U``, the database superuser name. By default, the
  current user's name is used.

  ``password`` is ultimately mapped to ``--pwfile``. The argument given to the
  long option is actually a path to the temporary file that holds the given
  password.

  Raises `postgresql.cluster.InitDBError` when initdb_ returns a non-zero result
  code.

  Raises `postgresql.cluster.ClusterInitializationError` when there is no
  initdb_ in the Installation.

 ``Cluster.initialized()``
  Whether or not the data directory exists, *and* if it looks like a PostgreSQL
  data directory. Meaning, the directory must contain a ``postgresql.conf`` file
  and a ``base`` directory.

 ``Cluster.drop()``
  Shut down the Cluster's server process and completely remove the
  `Cluster.data_directory` from the file system.

 ``Cluster.pid()``
  The server's process identifier as a Python `int`. `None` if there is no
  server process running.
  This is a method rather than a property as it may read the PID from a file
  in cases where the server process was not started by the Cluster.

 ``Cluster.start([logfile = None[, settings = None]])``
  Start the PostgreSQL server process for the Cluster if it is not
  already running. This will execute postgres_ as a subprocess.

  If ``logfile``, an opened and writable file object, is given, stderr and
  stdout will be redirected to that file. By default, both stderr and stdout are
  closed.

  If ``settings`` is given, the mapping or sequence of pairs will be used as
  long options to the subprocess. For each item, ``--{key}={value}`` will be
  given as an argument to the subprocess.

 ``Cluster.running()``
  Whether or not the cluster's server process is running.
  Returns `True` or `False`. Even if `True` is returned, it does *not* mean
  that the server process is ready to accept connections.

 ``Cluster.ready_for_connections()``
  Whether or not the Cluster is ready to accept connections. Usually called
  after `Cluster.start`.

  Returns `True` when the Cluster can accept connections, `False` when it
  cannot, and `None` if the Cluster's server process is not running at all.

 ``Cluster.wait_until_started([timeout = 10[, delay = 0.05]])``
  Blocks the process until the cluster is identified as being ready for
  connections. Usually called after ``Cluster.start()``.

  Raises `postgresql.cluster.ClusterNotRunningError` if the server process is
  not running at all.

  Raises `postgresql.cluster.ClusterTimeoutError` if
  `Cluster.ready_for_connections()` does not return `True` within the given
  `timeout` period.

  Raises `postgresql.cluster.ClusterStartupError` if the server process
  terminates while polling for readiness.

  ``timeout`` and ``delay`` are both in seconds: ``timeout`` is the
  maximum time to wait for the Cluster to be ready for connections, and
  ``delay`` is the time to sleep between calls to
  `Cluster.ready_for_connections()`.

 ``Cluster.stop()``
  Signal the cluster to shut down when possible. The *server* will wait for all
  clients to disconnect before shutting down.

 ``Cluster.shutdown()``
  Signal the cluster to shut down immediately. Any open client connections will
  be closed.

 ``Cluster.kill()``
  Signal the absolute destruction of the server process (SIGKILL).
  *This will require recovery when the cluster is started again.*
  *Shared memory may be leaked.*

 ``Cluster.wait_until_stopped([timeout = 10[, delay = 0.05]])``
  Blocks the process until the cluster is identified as being shut down. Usually
  called after `Cluster.stop` or `Cluster.shutdown`.

  Raises `postgresql.cluster.ClusterTimeoutError` if
  `Cluster.ready_for_connections` does not return `None` within the given
  `timeout` period.
 ``Cluster.reload()``
  Signal the server that it should reload its configuration files (SIGHUP).
  Usually called after manipulating `Cluster.settings` or modifying the
  contents of `Cluster.hba_file`.

 ``Cluster.restart([logfile = None[, settings = None[, timeout = 10]]])``
  Stop the server process, wait until it is stopped, start the server process,
  and wait until it has started.

  .. note:: This calls ``Cluster.stop()``, so it will wait until clients
     disconnect before starting up again.

  The ``logfile`` and ``settings`` parameters will be given to `Cluster.start`.
  ``timeout`` will be given to `Cluster.wait_until_stopped` and
  `Cluster.wait_until_started`.

 ``Cluster.settings``
  A `collections.Mapping` interface to the ``postgresql.conf`` file of the
  cluster.

  A notable extension to the mapping interface is the ``getset`` method. This
  method will return a dictionary object containing the settings whose names
  were contained in the `set` object given to the method.
  This method should be used when multiple settings need to be retrieved from
  the configuration file.

 ``Cluster.hba_file``
  The path to the cluster's pg_hba_ file. This property respects the HBA file
  location setting in ``postgresql.conf``. Usually, ``$PGDATA/pg_hba.conf``.

 ``Cluster.daemon_path``
  The path to the executable to use to start the server process.

 ``Cluster.daemon_process``
  The `subprocess.Popen` instance of the server process. `None` if the server
  process was not started or was not started using the Cluster object.

.. _pg_hba: http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html
.. _pg_config: http://www.postgresql.org/docs/current/static/app-pgconfig.html
.. _initdb: http://www.postgresql.org/docs/current/static/app-initdb.html
.. _psql: http://www.postgresql.org/docs/current/static/app-psql.html
.. _postgres: http://www.postgresql.org/docs/current/static/app-postgres.html
.. _pg_ctl: http://www.postgresql.org/docs/current/static/app-pg-ctl.html
.. _configuration file: http://www.postgresql.org/docs/current/static/runtime-config.html
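
The ``timeout`` and ``delay`` semantics described for
``Cluster.wait_until_started()`` amount to a polling loop. The following is a
minimal sketch of that idea using only the standard library. It is *not*
py-postgresql's implementation: ``wait_until_ready`` and ``fake_ready`` are
hypothetical names, the built-in `RuntimeError` and `TimeoutError` stand in for
`postgresql.cluster.ClusterNotRunningError` and
`postgresql.cluster.ClusterTimeoutError`, and the readiness check is simulated
so that no server is required.

```python
import time


def wait_until_ready(ready, timeout=10, delay=0.05):
    """Poll ``ready()`` until it returns True (hypothetical helper).

    ``ready`` mimics ``Cluster.ready_for_connections``: True when ready,
    False when not yet ready, None when no server process is running.
    ``timeout`` and ``delay`` are both in seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = ready()
        if state is True:
            return
        if state is None:
            # Stand-in for ClusterNotRunningError.
            raise RuntimeError("server process is not running")
        if time.monotonic() >= deadline:
            # Stand-in for ClusterTimeoutError.
            raise TimeoutError("not ready within %s seconds" % timeout)
        time.sleep(delay)


# Simulated readiness check: not ready twice, then ready.
states = iter([False, False, True])
calls = []


def fake_ready():
    calls.append(1)
    return next(states)


wait_until_ready(fake_ready, timeout=1, delay=0.01)
```

With a real Cluster, the same waiting behavior is obtained by calling
``Cluster.wait_until_started()`` after ``Cluster.start()``. The sketch only
illustrates how ``timeout`` bounds the total wait while ``delay`` sets the
polling interval.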