auromat.util.histogram module

This module contains a fast replacement for numpy’s histogramdd and histogram2d. Two changes were made. The first was replacing

np.digitize(a, b)

with

np.searchsorted(b, a, “right”)

This performance bug is explained on https://github.com/numpy/numpy/issues/2656 The speedup is around 2x for big number of bins (roughly >100). It assumes that the bins are monotonic.

The other change is to allow lists of weight arrays. This is advantageous for resampling as there is just one set of coordinates but several data arrays (=weights). Therefore repeated computations are prevented.

auromat.util.histogram.histogram2d(x, y, bins=10, range=None, normed=False, weights=None)[source]

Compute the bi-dimensional histogram of two data samples.

x : array_like, shape (N,)
An array containing the x coordinates of the points to be histogrammed.
y : array_like, shape (N,)
An array containing the y coordinates of the points to be histogrammed.
bins : int or [int, int] or array_like or [array, array], optional

The bin specification:

  • If int, the number of bins for the two dimensions (nx=ny=bins).
  • If [int, int], the number of bins in each dimension (nx, ny = bins).
  • If array_like, the bin edges for the two dimensions (x_edges=y_edges=bins).
  • If [array, array], the bin edges in each dimension (x_edges, y_edges = bins).
range : array_like, shape(2,2), optional
The leftmost and rightmost edges of the bins along each dimension (if not specified explicitly in the bins parameters): [[xmin, xmax], [ymin, ymax]]. All values outside of this range will be considered outliers and not tallied in the histogram.
normed : bool, optional
If False, returns the number of samples in each bin. If True, returns the bin density bin_count / sample_count / bin_area.
weights : array_like, shape(N,), optional
An array of values w_i weighing each sample (x_i, y_i). Weights are normalized to 1 if normed is True. If normed is False, the values of the returned histogram are equal to the sum of the weights belonging to the samples falling into each bin. Weights can also be a list of (weight arrays or None), in which case a list of histograms is returned as H.
H : ndarray, shape(nx, ny)
The bi-dimensional histogram of samples x and y. Values in x are histogrammed along the first dimension and values in y are histogrammed along the second dimension.
xedges : ndarray, shape(nx,)
The bin edges along the first dimension.
yedges : ndarray, shape(ny,)
The bin edges along the second dimension.

histogram : 1D histogram histogramdd : Multidimensional histogram

When normed is True, then the returned histogram is the sample density, defined such that the sum over bins of the product bin_value * bin_area is 1.

Please note that the histogram does not follow the Cartesian convention where x values are on the abscissa and y values on the ordinate axis. Rather, x is histogrammed along the first dimension of the array (vertical), and y along the second dimension of the array (horizontal). This ensures compatibility with histogramdd.

>>> import matplotlib as mpl
>>> import matplotlib.pyplot as plt

Construct a 2D-histogram with variable bin width. First define the bin edges:

>>> xedges = [0, 1, 1.5, 3, 5]
>>> yedges = [0, 2, 3, 4, 6]

Next we create a histogram H with random bin content:

>>> x = np.random.normal(3, 1, 100)
>>> y = np.random.normal(1, 1, 100)
>>> H, xedges, yedges = np.histogram2d(y, x, bins=(xedges, yedges))

Or we fill the histogram H with a determined bin content:

>>> H = np.ones((4, 4)).cumsum().reshape(4, 4)
>>> print H[::-1]  # This shows the bin content in the order as plotted
[[ 13.  14.  15.  16.]
 [  9.  10.  11.  12.]
 [  5.   6.   7.   8.]
 [  1.   2.   3.   4.]]

Imshow can only do an equidistant representation of bins:

>>> fig = plt.figure(figsize=(7, 3))
>>> ax = fig.add_subplot(131)
>>> ax.set_title('imshow: equidistant')
>>> im = plt.imshow(H, interpolation='nearest', origin='low',
                extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])

pcolormesh can display exact bin edges:

>>> ax = fig.add_subplot(132)
>>> ax.set_title('pcolormesh: exact bin edges')
>>> X, Y = np.meshgrid(xedges, yedges)
>>> ax.pcolormesh(X, Y, H)
>>> ax.set_aspect('equal')

NonUniformImage displays exact bin edges with interpolation:

>>> ax = fig.add_subplot(133)
>>> ax.set_title('NonUniformImage: interpolated')
>>> im = mpl.image.NonUniformImage(ax, interpolation='bilinear')
>>> xcenters = xedges[:-1] + 0.5 * (xedges[1:] - xedges[:-1])
>>> ycenters = yedges[:-1] + 0.5 * (yedges[1:] - yedges[:-1])
>>> im.set_data(xcenters, ycenters, H)
>>> ax.images.append(im)
>>> ax.set_xlim(xedges[0], xedges[-1])
>>> ax.set_ylim(yedges[0], yedges[-1])
>>> ax.set_aspect('equal')
>>> plt.show()
auromat.util.histogram.histogramdd(sample, bins=10, range=None, normed=False, weights=None)[source]

Compute the multidimensional histogram of some data.

sample : array_like
The data to be histogrammed. It must be an (N,D) array or data that can be converted to such. The rows of the resulting array are the coordinates of points in a D dimensional polytope.
bins : sequence or int, optional

The bin specification:

  • A sequence of arrays describing the bin edges along each dimension.
  • The number of bins for each dimension (nx, ny, ... =bins)
  • The number of bins for all dimensions (nx=ny=...=bins).
range : sequence, optional
A sequence of lower and upper bin edges to be used if the edges are not given explicitly in bins. Defaults to the minimum and maximum values along each dimension.
normed : bool, optional
If False, returns the number of samples in each bin. If True, returns the bin density bin_count / sample_count / bin_volume.
weights : array_like (N,), optional
An array of values w_i weighing each sample (x_i, y_i, z_i, ...). Weights are normalized to 1 if normed is True. If normed is False, the values of the returned histogram are equal to the sum of the weights belonging to the samples falling into each bin. Weights can also be a list of (weight arrays or None), in which case a list of histograms is returned as H.
H : ndarray
The multidimensional histogram of sample x. See normed and weights for the different possible semantics.
edges : list
A list of D arrays describing the bin edges for each dimension.

histogram: 1-D histogram histogram2d: 2-D histogram

>>> r = np.random.randn(100,3)
>>> H, edges = np.histogramdd(r, bins = (5, 8, 4))
>>> H.shape, edges[0].size, edges[1].size, edges[2].size
((5, 8, 4), 6, 9, 5)