collectd - Send statistics over UDP to collectd servers

This module implements the binary protocol used by the Network plugin to let you send arbitrary numeric data to collectd servers. Other than turning on the Network plugin on the destination collectd server, no configuration is needed.

Although you can configure your collectd server to do anything with your statistics, most people use the RRDtool plugin to efficiently store large quantities of data. Therefore, this tutorial will discuss the output files generated by the RRDtool plugin, even though this module sends destination-agnostic statistics messages to your collectd server.

Here’s an example of sending statistics:

import time, random

import collectd

collectd.start_threads()    # start the daemon threads which periodically send your statistics
conn = collectd.Connection()

while True:
    conn.some_category.record(some_counter = 1, another_stat = random.random())
    if random.randrange(2):
        conn.coin_stats.record("heads", flips = 1)
    else:
        conn.coin_stats.record("tails", flips = 1)
    
    time.sleep(random.randint(1, 4))

If you run this script (which is examples/basic.py in your source package) then your collectd server will create an RRD file for each statistic you are tracking. These files are typically stored under one central location on the filesystem, such as /var/lib/collectd/rrd, and are organized by the hostname of the machine the statistics came from and the name of the plugin that generated them.

This module identifies itself with the plugin name any, so if you ran the above script, it would create a localhost/any directory with the following files:

  • gauge-some_category-another_stat.rrd
  • gauge-some_category-some_counter.rrd
  • gauge-coin_stats-heads-flips.rrd
  • gauge-coin_stats-tails-flips.rrd
  • gauge-coin_stats-flips.rrd (contains the sum of heads and tails flips)

Each call to record() increments the statistics you provide. Periodically the sums of these statistics are sent to the collectd server and then reset. Therefore, if you displayed a graph of one of these statistics files, each data point on the graph would represent the sum of all record() values over that time increment.
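
For example, in this minimal sketch (the web category and hits statistic are hypothetical names), the single data point sent for the interval would be 5:

conn = collectd.Connection()
for _ in range(5):
    conn.web.record(hits = 1)   # sums to 5, sent as one data point, then reset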

Installation

This module is free for use under the BSD license. It requires Python 2.6 and will presumably work with Python 2.7, although this hasn’t been tested. It has no other dependencies.

The collectd module can be downloaded from its project page in the Cheese Shop (aka the Python Package Index, or PyPI). You may also run easy_install collectd if you have EasyInstall on your system.

You may also check out the development version of collectd with this command:

hg clone https://collectd.googlecode.com/hg/ collectd

Functions and Classes

start_threads()

This function starts two daemon threads. The first takes snapshots of your counters and resets them periodically. The other serializes the statistics into protocol messages understood by the collectd Network plugin and sends them in appropriately sized UDP packets.

You must call this function when your program starts, or else this module will never actually send any data to any collectd servers. Calling this function more than once throws an exception.

class Connection(hostname = socket.gethostname(), collectd_host = "localhost", collectd_port = 25826, plugin_inst = "")

Connection objects may be instantiated with 4 optional arguments:

  • hostname: the hostname you use to identify yourself to the collectd server; if omitted, this defaults to the result of socket.gethostname()
  • collectd_host: the hostname or IP address of the collectd server to which we will send statistics
  • collectd_port: the port to which you will send statistics messages
  • plugin_inst: the plugin instance name which will be sent to the collectd server; this mostly affects the directory name used by the collectd rrdtool plugin

Connection objects with identical parameters are singletons; in other words, Connection("foo") is Connection("foo") but Connection("foo") is not Connection("bar").
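
For example, here is a hypothetical setup which reports statistics to a remote collectd server (the hostname and instance name below are placeholders):

conn = collectd.Connection(collectd_host = "stats.example.com",
                           plugin_inst = "web_frontend")

# With the rrdtool plugin on the server, files for this connection would
# land under a directory named for the plugin instance, e.g. any-web_frontend.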

__getattr__(name)

Statistics are recorded through Counter objects, which are dynamically generated by accessing fields on Connection instances. So saying conn.foo creates a Counter object.

These objects are distinct but cached, so conn.foo is conn.foo but conn.foo is not conn.bar.

class Counter(category)

You shouldn’t directly instantiate this class; instead, Counter objects are automatically created by accessing attributes of Connection objects, such that conn.foo will cache and return Counter("foo").

Both of the following methods swallow and log all possible exceptions, so you never need to worry about an error being thrown by calls to either of these methods. These functions are also synchronized, so you can safely call them simultaneously from different threads.

record(*specific, **stats)

Each keyword argument to this method is interpreted as a statistic whose name is the argument name and whose value is incremented by the argument value.

You may pass one or more string identifiers as positional arguments to this method. If you do so, then a separate count of each statistic will be maintained for each identifier; these recorded values are always added to the base count for each statistic as well.

For example, if you ran the following code

conn = collectd.Connection()
conn.example.record(baz = 1)
conn.example.record("foo", baz = 2)
conn.example.record("bar", baz = 3)

and then statistics were sent to the collectd server, it would result in the following files:

  • gauge-example-baz.rrd (with a value of 6 for this time increment)
  • gauge-example-foo-baz.rrd (with a value of 2 for this time increment)
  • gauge-example-bar-baz.rrd (with a value of 3 for this time increment)

set_exact(**stats)

Each keyword argument to this method is interpreted as a statistic whose name is the argument name and whose value is set to the exact value of the argument. Use this method for statistics which you wish to set to a specific value rather than increment.
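
For example, a minimal sketch (the queue category and size statistic are hypothetical here, though the longer example below uses the same names):

conn = collectd.Connection()
conn.queue.set_exact(size = 42)   # reported as 42, not added to a running total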

Warning

All names are sanitized to contain only alphanumeric characters separated by underscores. In other words, if you did something like

conn = collectd.Connection()
counter = getattr(conn, "^@#@foo%$&&*")
stat = {"*()#@spam!@$^&*": 1}
counter.record("()#@bar&*()baz$^_+", **stat)

then the resulting files would be named gauge-foo-spam.rrd and gauge-foo-bar_baz-spam.rrd. Although this behavior is generally desirable, it could lead to your statistics becoming merged. For example, the foo-bar and foo/bar statistics in the following code would be combined into a single foo_bar statistic:

conn = collectd.Connection()
conn.foo.record("foo-bar", baz = 2)
conn.foo.record("foo/bar", baz = 3)

Logging

As mentioned above, collectd swallows exceptions so that you never have to worry about calling a function from this module and triggering an exception. However, any exceptions which do occur are logged using the logging module from the standard library, specifically to the collectd logger. No other logging is performed.

However, this module defines no handlers for the collectd logger, so by default nothing is done with any of these log messages.
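
If you want to see these messages, attach a handler yourself. For example, this minimal sketch sends them to stderr using only the standard library:

import logging

logger = logging.getLogger("collectd")
logger.addHandler(logging.StreamHandler())   # StreamHandler defaults to stderr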

A More Complex Example

The following code is available in the examples/primes.py file in your source distribution:

import time
from Queue import Queue
from threading import Thread
from random import normalvariate

import collectd

numbers = Queue()
conn = collectd.Connection()

def is_prime(n):
    # naive primality test by trial division
    for i in xrange(2, n):
        if n % i == 0:
            return False
    return True

def watch_queue():
    # report the queue's current size as an exact value once per second
    while True:
        conn.queue.set_exact(size = numbers.qsize())
        time.sleep(1)

def consumer():
    # pull numbers off the queue, test each for primality, and record a
    # count and the elapsed wallclock time under "prime" or "composite"
    while True:
        n = numbers.get()
        before = time.time()
        primality = is_prime(n)
        elapsed = time.time() - before
        if primality:
            print n, "is prime"
            conn.consumer.record("prime", count = 1, time = elapsed)
        else:
            print n, "is not prime"
            conn.consumer.record("composite", count = 1, time = elapsed)

def producer():
    # generate random numbers, recording whether each was too small,
    # too big, or just right to be enqueued for the consumer
    while True:
        n = int((time.time() % 30) ** normalvariate(5, 2))
        if n < 2:
            conn.producer.record(too_small = 1)
        elif n > 10 ** 9:
            conn.producer.record(too_big = 1)
        else:
            conn.producer.record(just_right = 1)
            numbers.put(n)
        time.sleep(0.33)

if __name__ == "__main__":
    collectd.start_threads()
    for func in [producer, consumer]:
        t = Thread(target = func)
        t.daemon = True
        t.start()
    
    watch_queue()

Here’s a list of files generated by this code, along with an explanation of the counter which each file contains for each time interval:

  • gauge-queue-size.rrd: a snapshot of the size of the numbers queue
  • gauge-producer-too_small.rrd: a count of the random numbers generated by the producer thread which were discarded for being too small
  • gauge-producer-too_big.rrd: a count of the random numbers generated by the producer thread which were discarded for being too large
  • gauge-producer-just_right.rrd: a count of the random numbers generated by the producer thread which were sent to the consumer thread for primality testing
  • gauge-consumer-count.rrd: an overall count of the numbers tested by the consumer thread
  • gauge-consumer-composite-count.rrd: a count of the numbers tested by the consumer thread which turned out to be non-prime
  • gauge-consumer-prime-count.rrd: a count of the numbers tested by the consumer thread which turned out to be prime
  • gauge-consumer-time.rrd: the amount of wallclock time elapsed while testing numbers for primality
  • gauge-consumer-composite-time.rrd: the amount of wallclock time elapsed while testing numbers which turned out to be non-prime
  • gauge-consumer-prime-time.rrd: the amount of wallclock time elapsed while testing numbers which turned out to be prime

TODO: add graphs to this example to demonstrate what the data looks like

A Utility for Graphing RRD Data

This is forthcoming; we’d like a simple script for generating the complex command-line arguments understood by the rrdtool graph command.
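
In the meantime, here is a minimal sketch of driving rrdtool graph from Python. The RRD path assumes the default layout described above, and value is the data source name collectd conventionally uses for gauge files:

import subprocess

# this path assumes the default layout discussed earlier in this document
rrd = "/var/lib/collectd/rrd/localhost/any/gauge-queue-size.rrd"

subprocess.call([
    "rrdtool", "graph", "queue_size.png",
    "--start", "-3600",                    # graph the last hour of data
    "--title", "queue size",
    "DEF:size=%s:value:AVERAGE" % rrd,     # collectd stores gauge data in a DS named "value"
    "LINE1:size#0000FF:size",
])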

At some point we may also want a web interface similar to the existing collectd CGI script, which dynamically lets you select and generate graphs. Ours would be aware of the semantics of the files generated by this module, allowing users to start with a high-level overview, drill down into specific data, combine data from different sources into one graph, etc.