# Welcome to powerlaw’s documentation!¶

Contents:

class `powerlaw.``Distribution`(xmin=1, xmax=None, discrete=False, fit_method='Likelihood', data=None, parameters=None, parameter_range=None, initial_parameters=None, discrete_approximation='round', parent_Fit=None, **kwargs)[source]

An abstract class for theoretical probability distributions. Can be created with particular parameter values, or fitted to a dataset. Fitting is by maximum likelihood estimation by default.

Parameters: xmin : int or float, optional The data value beyond which distributions should be fitted. If None an optimal one will be calculated. xmax : int or float, optional The maximum value of the fitted distributions. discrete : boolean, optional Whether the distribution is discrete (integers). data : list or array, optional The data to which to fit the distribution. If provided, the fit will be created at initialization. fit_method : “Likelihood” or “KS”, optional Method for fitting the distribution. “Likelihood” is maximum Likelihood estimation. “KS” is minimial distance estimation using The Kolmogorov-Smirnov test. parameters : tuple or list, optional The parameters of the distribution. Will be overridden if data is given or the fit method is called. parameter_range : dict, optional Dictionary of valid parameter ranges for fitting. Formatted as a dictionary of parameter names (‘alpha’ and/or ‘sigma’) and tuples of their lower and upper limits (ex. (1.5, 2.5), (None, .1) initial_parameters : tuple or list, optional Initial values for the parameter in the fitting search. discrete_approximation : “round”, “xmax” or int, optional If the discrete form of the theoeretical distribution is not known, it can be estimated. One estimation method is “round”, which sums the probability mass from x-.5 to x+.5 for each data point. The other option is to calculate the probability for each x from 1 to N and normalize by their sum. N can be “xmax” or an integer. parent_Fit : Fit object, optional A Fit object from which to use data, if it exists.

Methods

 `KS`([data]) Returns the Kolmogorov-Smirnov distance D between the distribution and the data. `ccdf`([data, survival]) The complementary cumulative distribution function (CCDF) of the theoretical distribution. `cdf`([data, survival]) The cumulative distribution function (CDF) of the theoretical distribution. `fit`([data, suppress_output]) Fits the parameters of the distribution to the data. `generate_random`([n, estimate_discrete]) Generates random numbers from the theoretical probability distribution. `in_range`() Whether the current parameters of the distribution are within the range of valid parameters. `initial_parameters`(data) Return previously user-provided initial parameters or, if never provided, calculate new ones. `likelihoods`(data) The likelihoods of the observed data from the theoretical distribution. `loglikelihoods`(data) The logarithm of the likelihoods of the observed data from the theoretical distribution. `parameter_range`(r[, initial_parameters]) Set the limits on the range of valid parameters to be considered while fitting. `pdf`([data]) Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present. `plot_ccdf`([data, ax, survival]) Plots the complementary cumulative distribution function (CDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. `plot_cdf`([data, ax, survival]) Plots the cumulative distribution function (CDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. `plot_pdf`([data, ax]) Plots the probability density function (PDF) of the theoretical distribution for the values given in data within xmin and xmax, if present.
`KS`(data=None)[source]

Returns the Kolmogorov-Smirnov distance D between the distribution and the data. Also sets the properties D+, D-, V (the Kuiper testing statistic), and Kappa (1 + the average difference between the theoretical and empirical distributions).

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained.
`ccdf`(data=None, survival=True)[source]

The complementary cumulative distribution function (CCDF) of the theoretical distribution. Calculated for the values given in data within xmin and xmax, if present.

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained. survival : bool, optional Whether to calculate a CDF (False) or CCDF (True). True by default. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is less than or equal to X.
`cdf`(data=None, survival=False)[source]

The cumulative distribution function (CDF) of the theoretical distribution. Calculated for the values given in data within xmin and xmax, if present.

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained. survival : bool, optional Whether to calculate a CDF (False) or CCDF (True). False by default. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is less than or equal to X.
`fit`(data=None, suppress_output=False)[source]

Fits the parameters of the distribution to the data. Uses options set at initialization.

`generate_random`(n=1, estimate_discrete=None)[source]

Generates random numbers from the theoretical probability distribution. If xmax is present, it is currently ignored.

Parameters: n : int or float The number of random numbers to generate estimate_discrete : boolean For discrete distributions, whether to use a faster approximation of the random number generator. If None, attempts to inherit the estimate_discrete behavior used for fitting from the Distribution object or the parent Fit object, if present. Approximations only exist for some distributions (namely the power law). If an approximation does not exist an estimate_discrete setting of True will not be inherited. r : array Random numbers drawn from the distribution
`in_range`()[source]

Whether the current parameters of the distribution are within the range of valid parameters.

`initial_parameters`(data)[source]

Return previously user-provided initial parameters or, if never provided, calculate new ones. Default initial parameter estimates are unique to each theoretical distribution.

`likelihoods`(data)[source]

The likelihoods of the observed data from the theoretical distribution. Another name for the probabilities or probability density function.

`loglikelihoods`(data)[source]

The logarithm of the likelihoods of the observed data from the theoretical distribution.

`parameter_range`(r, initial_parameters=None)[source]

Set the limits on the range of valid parameters to be considered while fitting.

Parameters: r : dict A dictionary of the parameter range. Restricted parameter names are keys, and with tuples of the form (lower_bound, upper_bound) as values. initial_parameters : tuple or list, optional Initial parameter values to start the fitting search from.
`pdf`(data=None)[source]

Returns the probability density function (normalized histogram) of the theoretical distribution for the values in data within xmin and xmax, if present.

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained. probabilities : array
`plot_ccdf`(data=None, ax=None, survival=True, **kwargs)[source]

Plots the complementary cumulative distribution function (CDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. Plots to a new figure or to axis ax if provided.

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained. ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. survival : bool, optional Whether to plot a CDF (False) or CCDF (True). True by default. ax : matplotlib axis The axis to which the plot was made.
`plot_cdf`(data=None, ax=None, survival=False, **kwargs)[source]

Plots the cumulative distribution function (CDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. Plots to a new figure or to axis ax if provided.

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained. ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. survival : bool, optional Whether to plot a CDF (False) or CCDF (True). False by default. ax : matplotlib axis The axis to which the plot was made.
`plot_pdf`(data=None, ax=None, **kwargs)[source]

Plots the probability density function (PDF) of the theoretical distribution for the values given in data within xmin and xmax, if present. Plots to a new figure or to axis ax if provided.

Parameters: data : list or array, optional If not provided, attempts to use the data from the Fit object in which the Distribution object is contained. ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. ax : matplotlib axis The axis to which the plot was made.
class `powerlaw.``Fit`(data, discrete=False, xmin=None, xmax=None, verbose=True, fit_method='Likelihood', estimate_discrete=True, discrete_approximation='round', sigma_threshold=None, parameter_range=None, fit_optimizer=None, xmin_distance='D', **kwargs)[source]

A fit of a data set to various probability distributions, namely power laws. For fits to power laws, the methods of Clauset et al. 2007 are used. These methods identify the portion of the tail of the distribution that follows a power law, beyond a value xmin. If no xmin is provided, the optimal one is calculated and assigned at initialization.

Parameters: data : list or array discrete : boolean, optional Whether the data is discrete (integers). xmin : int or float, optional The data value beyond which distributions should be fitted. If None an optimal one will be calculated. xmax : int or float, optional The maximum value of the fitted distributions. verbose: bool, optional Whether to print updates about where we are in the fitting process. Default True. estimate_discrete : bool, optional Whether to estimate the fit of a discrete power law using fast analytical methods, instead of calculating the fit exactly with slow numerical methods. Very accurate with xmin>6 sigma_threshold : float, optional Upper limit on the standard error of the power law fit. Used after fitting, when identifying valid xmin values. parameter_range : dict, optional Dictionary of valid parameter ranges for fitting. Formatted as a dictionary of parameter names (‘alpha’ and/or ‘sigma’) and tuples of their lower and upper limits (ex. (1.5, 2.5), (None, .1)

Methods

 `ccdf`([original_data, survival]) Returns the complementary cumulative distribution function of the data. `cdf`([original_data, survival]) Returns the cumulative distribution function of the data. `distribution_compare`(dist1, dist2[, nested]) Returns the loglikelihood ratio, and its p-value, between the two distribution fits, assuming the candidate distributions are nested. `find_xmin`([xmin_distance]) Returns the optimal xmin beyond which the scaling regime of the power law fits best. `loglikelihood_ratio`(dist1, dist2[, nested]) Another name for distribution_compare. `nested_distribution_compare`(dist1, dist2[, ...]) Returns the loglikelihood ratio, and its p-value, between the two distribution fits, assuming the candidate distributions are nested. `pdf`([original_data]) Returns the probability density function (normalized histogram) of the data. `plot_ccdf`([ax, original_data, survival]) Plots the CCDF to a new figure or to axis ax if provided. `plot_cdf`([ax, original_data, survival]) Plots the CDF to a new figure or to axis ax if provided. `plot_pdf`([ax, original_data, linear_bins]) Plots the probability density function (PDF) or the data to a new figure or to axis ax if provided.
`ccdf`(original_data=False, survival=True, **kwargs)[source]

Returns the complementary cumulative distribution function of the data.

Parameters: original_data : bool, optional Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.) survival : bool, optional Whether to return the complementary cumulative distribution function, also known as the survival function, or the cumulative distribution function, 1-CCDF. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is greater than or equal to X.
`cdf`(original_data=False, survival=False, **kwargs)[source]

Returns the cumulative distribution function of the data.

Parameters: original_data : bool, optional Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.) survival : bool, optional Whether to return the complementary cumulative distribution function, 1-CDF, also known as the survival function. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is less than or equal to X.
`distribution_compare`(dist1, dist2, nested=None, **kwargs)[source]

Returns the loglikelihood ratio, and its p-value, between the two distribution fits, assuming the candidate distributions are nested.

Parameters: dist1 : string Name of the first candidate distribution (ex. ‘power_law’) dist2 : string Name of the second candidate distribution (ex. ‘exponential’) nested : bool or None, optional Whether to assume the candidate distributions are nested versions of each other. None assumes not unless the name of one distribution is a substring of the other. R : float Loglikelihood ratio of the two distributions’ fit to the data. If greater than 0, the first distribution is preferred. If less than 0, the second distribution is preferred. p : float Significance of R
`find_xmin`(xmin_distance=None)[source]

Returns the optimal xmin beyond which the scaling regime of the power law fits best. The attribute self.xmin of the Fit object is also set.

The optimal xmin beyond which the scaling regime of the power law fits best is identified by minimizing the Kolmogorov-Smirnov distance between the data and the theoretical power law fit. This is the method of Clauset et al. 2007.

`loglikelihood_ratio`(dist1, dist2, nested=None, **kwargs)[source]

Another name for distribution_compare.

`nested_distribution_compare`(dist1, dist2, nested=True, **kwargs)[source]

Returns the loglikelihood ratio, and its p-value, between the two distribution fits, assuming the candidate distributions are nested.

Parameters: dist1 : string Name of the first candidate distribution (ex. ‘power_law’) dist2 : string Name of the second candidate distribution (ex. ‘exponential’) nested : bool or None, optional Whether to assume the candidate distributions are nested versions of each other. None assumes not unless the name of one distribution is a substring of the other. True by default. R : float Loglikelihood ratio of the two distributions’ fit to the data. If greater than 0, the first distribution is preferred. If less than 0, the second distribution is preferred. p : float Significance of R
`pdf`(original_data=False, **kwargs)[source]

Returns the probability density function (normalized histogram) of the data.

Parameters: original_data : bool, optional Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.) bin_edges : array The edges of the bins of the probability density function. probabilities : array The portion of the data that is within the bin. Length 1 less than bin_edges, as it corresponds to the spaces between them.
`plot_ccdf`(ax=None, original_data=False, survival=True, **kwargs)[source]

Plots the CCDF to a new figure or to axis ax if provided.

Parameters: ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. original_data : bool, optional Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.) survival : bool, optional Whether to plot a CDF (False) or CCDF (True). True by default. ax : matplotlib axis The axis to which the plot was made.
`plot_cdf`(ax=None, original_data=False, survival=False, **kwargs)[source]

Plots the CDF to a new figure or to axis ax if provided.

Parameters: ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. original_data : bool, optional Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.) survival : bool, optional Whether to plot a CDF (False) or CCDF (True). False by default. ax : matplotlib axis The axis to which the plot was made.
`plot_pdf`(ax=None, original_data=False, linear_bins=False, **kwargs)[source]

Plots the probability density function (PDF) or the data to a new figure or to axis ax if provided.

Parameters: ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. original_data : bool, optional Whether to use all of the data initially passed to the Fit object. If False, uses only the data used for the fit (within xmin and xmax.) linear_bins : bool, optional Whether to use linearly spaced bins (True) or logarithmically spaced bins (False). False by default. ax : matplotlib axis The axis to which the plot was made.
`powerlaw.``bisect_map`(mn, mx, function, target)[source]

Uses binary search to find the target solution to a function, searching in a given ordered sequence of integer values.

Parameters: seq : list or array, monotonically increasing integers function : a function that takes a single integer input, which monotonically decreases over the range of seq. target : the target value of the function value : the input value that yields the target solution. If there is no exact solution in the input sequence, finds the nearest value k such that function(k) <= target < function(k+1). This is similar to the behavior of bisect_left in the bisect package. If even the first, leftmost value of seq does not satisfy this condition, -1 is returned.
`powerlaw.``ccdf`(data, survival=True, **kwargs)[source]

The complementary cumulative distribution function (CCDF) of the data.

Parameters: data : list or array, optional survival : bool, optional Whether to calculate a CDF (False) or CCDF (True). True by default. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is less than or equal to X.
`powerlaw.``cdf`(data, survival=False, **kwargs)[source]

The cumulative distribution function (CDF) of the data.

Parameters: data : list or array, optional survival : bool, optional Whether to calculate a CDF (False) or CCDF (True). False by default. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is less than or equal to X.
`powerlaw.``checkunique`(data)[source]

Quickly checks if a sorted array is all unique elements.

`powerlaw.``cumulative_distribution_function`(data, xmin=None, xmax=None, survival=False, **kwargs)[source]

The cumulative distribution function (CDF) of the data.

Parameters: data : list or array, optional survival : bool, optional Whether to calculate a CDF (False) or CCDF (True). False by default. xmin : int or float, optional The minimum data size to include. Values less than xmin are excluded. xmax : int or float, optional The maximum data size to include. Values greater than xmin are excluded. X : array The sorted, unique values in the data. probabilities : array The portion of the data that is less than or equal to X.
`powerlaw.``is_discrete`(data)[source]

Checks if every element of the array is an integer.

`powerlaw.``loglikelihood_ratio`(loglikelihoods1, loglikelihoods2, nested=False, normalized_ratio=False)[source]

Calculates a loglikelihood ratio and the p-value for testing which of two probability distributions is more likely to have created a set of observations.

Parameters: loglikelihoods1 : list or array The logarithms of the likelihoods of each observation, calculated from a particular probability distribution. loglikelihoods2 : list or array The logarithms of the likelihoods of each observation, calculated from a particular probability distribution. nested : bool, optional Whether one of the two probability distributions that generated the likelihoods is a nested version of the other. False by default. normalized_ratio : bool, optional Whether to return the loglikelihood ratio, R, or the normalized ratio R/sqrt(n*variance) R : float The loglikelihood ratio of the two sets of likelihoods. If positive, the first set of likelihoods is more likely (and so the probability distribution that produced them is a better fit to the data). If negative, the reverse is true. p : float The significance of the sign of R. If below a critical value (typically .05) the sign of R is taken to be significant. If above the critical value the sign of R is taken to be due to statistical fluctuations.
`powerlaw.``nested_loglikelihood_ratio`(loglikelihoods1, loglikelihoods2, **kwargs)[source]

Calculates a loglikelihood ratio and the p-value for testing which of two probability distributions is more likely to have created a set of observations. Assumes one of the probability distributions is a nested version of the other.

Parameters: loglikelihoods1 : list or array The logarithms of the likelihoods of each observation, calculated from a particular probability distribution. loglikelihoods2 : list or array The logarithms of the likelihoods of each observation, calculated from a particular probability distribution. nested : bool, optional Whether one of the two probability distributions that generated the likelihoods is a nested version of the other. True by default. normalized_ratio : bool, optional Whether to return the loglikelihood ratio, R, or the normalized ratio R/sqrt(n*variance) R : float The loglikelihood ratio of the two sets of likelihoods. If positive, the first set of likelihoods is more likely (and so the probability distribution that produced them is a better fit to the data). If negative, the reverse is true. p : float The significance of the sign of R. If below a critical value (typically .05) the sign of R is taken to be significant. If above the critical value the sign of R is taken to be due to statistical fluctuations.
`powerlaw.``pdf`(data, xmin=None, xmax=None, linear_bins=False, **kwargs)[source]

Returns the probability density function (normalized histogram) of the data.

Parameters: data : list or array xmin : float, optional Minimum value of the PDF. If None, uses the smallest value in the data. xmax : float, optional Maximum value of the PDF. If None, uses the largest value in the data. linear_bins : float, optional Whether to use linearly spaced bins, as opposed to logarithmically spaced bins (recommended for log-log plots). bin_edges : array The edges of the bins of the probability density function. probabilities : array The portion of the data that is within the bin. Length 1 less than bin_edges, as it corresponds to the spaces between them.
`powerlaw.``plot_cdf`(data, ax=None, survival=False, **kwargs)[source]

Plots the cumulative distribution function (CDF) of the data to a new figure or to axis ax if provided.

Parameters: data : list or array ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. survival : bool, optional Whether to plot a CDF (False) or CCDF (True). False by default. ax : matplotlib axis The axis to which the plot was made.
`powerlaw.``plot_pdf`(data, ax=None, linear_bins=False, **kwargs)[source]

Plots the probability density function (PDF) to a new figure or to axis ax if provided.

Parameters: data : list or array ax : matplotlib axis, optional The axis to which to plot. If None, a new figure is created. linear_bins : bool, optional Whether to use linearly spaced bins (True) or logarithmically spaced bins (False). False by default. ax : matplotlib axis The axis to which the plot was made.
`powerlaw.``trim_to_range`(data, xmin=None, xmax=None, **kwargs)[source]

Removes elements of the data that are above xmin or below xmax (if present)