numkit.timeseries — Time series manipulation and analysis

A time series contains of a sequence of time points (typically spaced equally) and a value for each time point. This module contains some standard scientific time series manipulations.

See also

pandas

Correlations

Autocorrelation time (time when ACF becomes 0 for the first time; uses gromacs.formats.XVG to read the data and numkit.timeseries.autocorrelation_fft() to calculate the ACF):

R = gromacs.formats.XVG("./md.xvg")
acf = autocorrelation_fft(R.array[1])
numpy.where(acf <= 0)[0][0]

Alternatively, fit an exponential to the ACF and extract the time constant (see numkit.timeseries.tcorrel()).

numkit.timeseries.tcorrel(x, y, nstep=100, debug=False)

Calculate the correlation time and an estimate of the error of the mean <y>.

The autocorrelation function f(t) is calculated via FFT on every nstep of the fluctuations of the data around the mean (y-<y>). The normalized ACF f(t)/f(0) is assumed to decay exponentially, f(t)/f(0) = exp(-t/tc) and the decay constant tc is estimated as the integral of the ACF from the start up to its first root.

See [FrenkelSmit2002] p526 for details.

Note

nstep should be set sufficiently large so that there are less than ~50,000 entries in the input.

[FrenkelSmit2002]D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego 2002
Arguments:
x

1D array of abscissa values (typically time)

y

1D array of the ibservable y(x)

nstep

only analyze every nstep datapoint to speed up calculation [100]

Returns:

dictionary with entries tc (decay constant in units of x), t0 (value of the first root along x (y(t0) = 0)), sigma (error estimate for the mean of y, <y>, corrected for correlations in the data).

numkit.timeseries.autocorrelation_fft(series, remove_mean=True, paddingcorrection=True, normalize=False, **kwargs)

Calculate the auto correlation function.

autocorrelation_fft(series,remove_mean=False,**kwargs) –> acf

The time series is correlated with itself across its whole length. Only the [0,len(series)[ interval is returned.

By default, the mean of the series is subtracted and the correlation of the fluctuations around the mean are investigated.

For the default setting remove_mean=True, acf[0] equals the variance of the series, acf[0] = Var(series) = <(series - <series>)**2>.

Optional:

  • The series can be normalized to its 0-th element so that acf[0] == 1.
  • For calculating the acf, 0-padding is used. The ACF should be corrected for the 0-padding (the values for larger lags are increased) unless mode=’valid’ is set (see below).

Note that the series for mode=’same’|’full’ is inaccurate for long times and should probably be truncated at 1/2*len(series)

Arguments:
series

(time) series, a 1D numpy array of length N

remove_mean

False: use series as is; True: subtract mean(series) from series [True]

paddingcorrection

False: corrected for 0-padding; True: return as is it is. (the latter is appropriate for periodic signals). The correction for element 0=<i<N amounts to a factor N/(N-i). Only applied for modes != “valid” [True]

normalize

True divides by acf[0] so that the first element is 1; False leaves un-normalized [False]

mode

“full” | “same” | “valid”: see scipy.signal.fftconvolve() [“full”]

kwargs

other keyword arguments for scipy.signal.fftconvolve()

Coarse graining time series

The functions in this section are all based on regularized_function(). They reduce the number of datapoints in a time series to maxpoints by histogramming the data into maxpoints bins and then applying a function to reduce the data in each bin. A number of commonly used functions are predefined but it is straightforward to either use apply_histogrammed_function() or regularized_function() directly. For instance, mean_histogrammed_function() is implemented as

def mean_histogrammed_function(t, y, maxpoints):
    return apply_histogrammed_function(numpy.mean, t, y, maxpoints)

More complicated functions can be defined; for instance, one could use numkit.timeseries.tcorrel() to compute the correlation time of the data in short blocks:

def tc_histogrammed_function(t, y, maxpoints):
   dt = numpy.mean(numpy.diff(t))
   def get_tcorrel(y):
      t = numpy.cumsum(dt*numpy.ones_like(y)) - dt
      results = tcorrel(t, y, nstep=1)
      return results['tc']
   return apply_histogrammed_function(get_tcorrel, t, y, bins=maxpoints)

(This particular function (implemented as numkit.timeseries.tc_histogrammed_function()) is not very robust, for instance it has problems when there are only very few data points in each bin because in this case the auto correlation function is not well defined.)

numkit.timeseries.mean_histogrammed_function(t, y, **kwargs)

Compute mean of data y in bins along t.

Returns the mean-regularised function F and the centers of the bins.

See also

regularized_function() with func = numpy.mean()

numkit.timeseries.rms_histogrammed_function(t, y, **kwargs)

Compute root mean square of data y in bins along t.

Returns the RMS-regularised function F and the centers of the bins. demean = True removes the mean first.

regularized_function() with func = sqrt(mean(y*y))

numkit.timeseries.min_histogrammed_function(t, y, **kwargs)

Compute minimum of data y in bins along t.

Returns the min-regularised function F and the centers of the bins.

regularized_function() with func = numpy.min()

numkit.timeseries.max_histogrammed_function(t, y, **kwargs)

Compute maximum of data y in bins along t.

Returns the max-regularised function F and the centers of the bins.

regularized_function() with func = numpy.max()

numkit.timeseries.median_histogrammed_function(t, y, **kwargs)

Compute median of data y in bins along t.

Returns the median-regularised function F and the centers of the bins.

regularized_function() with func = numpy.median()

numkit.timeseries.percentile_histogrammed_function(t, y, **kwargs)

Compute the percentile per of data y in bins along t.

Returns the percentile-regularised function F and the centers of the bins.

Keywords:
per

percentile as a percentage, e.g. 75 is the value that splits the data into the lower 75% and upper 25%; 50 is the median [50.0]

demean

True: remove the mean of the bin data first [False]

regularized_function() with scipy.stats.scoreatpercentile()

numkit.timeseries.error_histogrammed_function(t, y, **kwargs)

Calculate the error in each bin using tcorrel().

Warning

Not well tested and fragile.

numkit.timeseries.circmean_histogrammed_function(t, y, **kwargs)

Compute circular mean of data y in bins along t.

Returns the circmean-regularised function F and the centers of the bins.

kwargs are passed to scipy.stats.morestats.circmean(), in particular set the lower bound with low and the upper one with high. The default is [-pi, +pi].

regularized_function() with func = scipy.stats.morestats.circmean()

Note

Data are interpreted as angles in radians.

numkit.timeseries.circstd_histogrammed_function(t, y, **kwargs)

Compute circular standard deviation of data y in bins along t.

Returns the circstd-regularised function F and the centers of the bins.

kwargs are passed to scipy.stats.morestats.circmean(), in particular set the lower bound with low and the upper one with high. The default is [-pi, +pi].

regularized_function() with func = scipy.stats.morestats.circstd()

Note

Data are interpreted as angles in radians.

numkit.timeseries.tc_histogrammed_function(t, y, **kwargs)

Calculate the correlation time in each bin using tcorrel().

Warning

Not well tested and fragile.

numkit.timeseries.apply_histogrammed_function(func, t, y, **kwargs)

Compute func of data y in bins along t.

Returns the func -regularised function F(t’) and the centers of the bins t’.

func(y) → float

func takes exactly one argument, a numpy 1D array y (the values in a single bin of the histogram), and reduces it to one scalar float.

numkit.timeseries.regularized_function(x, y, func, bins=100, range=None)

Compute func() over data aggregated in bins.

(x,y) --> (x', func(Y')) with Y' = {y: y(x) where x in x' bin}

First the data is collected in bins x’ along x and then func is applied to all data points Y’ that have been collected in the bin.

func(y) → float

func takes exactly one argument, a numpy 1D array y (the values in a single bin of the histogram), and reduces it to one scalar float.

Note

x and y must be 1D arrays.

Arguments:
x

abscissa values (for binning)

y

ordinate values (func is applied)

func

a numpy ufunc that takes one argument, func(Y’)

bins

number or array

range

limits (used with number of bins)

Returns:
F,edges

function and edges (midpoints = 0.5*(edges[:-1]+edges[1:]))

(This function originated as recsql.sqlfunctions.regularized_function().)

Smoothing time series

Function numkit.timeseries.smooth() applies a window kernel to a time series and smoothes fluctuations. The number of points in the time series stays the same.

numkit.timeseries.smooth(x, window_len=11, window='flat')

smooth the data using a window with requested size.

This method is based on the convolution of a scaled window with the signal. The signal is prepared by introducing reflected copies of the signal (with the window size) in both ends so that transient parts are minimized in the begining and end part of the output signal.

Arguments:
x

the input signal, 1D array

window_len

the dimension of the smoothing window, always converted to an integer (using int()) and must be odd

window

the type of window from ‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’, ‘blackman’; flat window will produce a moving average smoothing. If window is a numpy.ndarray then this array is directly used as the window (but it still must contain an odd number of points) [“flat”]

Returns:

the smoothed signal as a 1D array

Example:

Apply a simple moving average to a noisy harmonic signal:

>>> import numpy as np
>>> t = np.linspace(-2, 2, 201)
>>> x = np.sin(t) + np.random.randn(len(t))*0.1
>>> y = smooth(x)

Source: based on http://www.scipy.org/Cookbook/SignalSmooth

numkit.timeseries.smoothing_window_length(resolution, t)

Compute the length of a smooting window of resolution time units.

Arguments:
resolution

length in units of the time in which t us supplied

t

array of time points; if not equidistantly spaced, the mean spacing is used to compute the window length

Returns:

odd integer, the size of a window of approximately resolution

See also

smooth()

Exceptions

exception numkit.timeseries.LowAccuracyWarning

Warns that results may possibly have low accuracy.