plfit Package¶

`plfit` Package¶

`cython_plfit` Module¶

plfit.py - a python power-law fitter based on code by Aaron Clauset http://www.santafe.edu/~aaronc/powerlaws/ http://arxiv.org/abs/0706.1062 “Power-law distributions in empirical data” Requires pylab (matplotlib), which requires numpy

example use: from plfit import plfit

MyPL = plfit(mydata) MyPL.plotpdf(log=True)

plfit.cython_plfit.pl_inv(P, xm, a)[source]¶: Inverse CDF for a pure power-law

plfit.cython_plfit.plexp(x, xm=1, a=2.5)[source]¶: CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.

plfit.cython_plfit.plexp_inv(P, xm, a)[source]¶: Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.

class plfit.cython_plfit.plfit(x, **kwargs)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha

alpha_(x)[source]¶

alphavsks(autozoom=True, **kwargs)[source]¶: Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.

kstest_(x)[source]¶

plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

nosmall is on by default; it rejects low s/n points can specify xmin to skip xmin estimation

There are 3 implementations of xmin estimation. The fortran version is fastest, the C (cython) version is ~10% slower, and the python version is ~3x slower than the fortran version. Also, the cython code suffers ~2% numerical error relative to the fortran and python for unknown reasons.

plotcdf(x=None, xmin=None, alpha=None, **kwargs)[source]¶: Plots CDF and powerlaw

plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, **kwargs)[source]¶: Plots PDF and powerlaw.

plotppf(x=None, xmin=None, alpha=None, dolog=True, **kwargs)[source]¶

test_pl(niter=1000.0, **kwargs)[source]¶

Monte-Carlo test to determine whether distribution is consistent with a power law

Runs through niter iterations of a sample size identical to the input sample size.

Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.

Once the sample is created, is fit using above methods, then the best fit is used to compute a Kolmogorov-Smirnov statistic. The KS stat distribution is compared to the KS value for the fit to the actual data, and p = fraction of random ks values greater than the data ks value is computed. If p<.1, the data may be inconsistent with a powerlaw. A data set of n(>xmin)>100 is required to distinguish a PL from an exponential, and n(>xmin)>~300 is required to distinguish a log-normal distribution from a PL. For more details, see figure 4.1 and section

WARNING This can take a very long time to run! Execution time scales as niter * setsize

plfit.cython_plfit.plfit_lsq(x, y)[source]¶: Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html

plfit.cython_plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x105b7b488>)[source]¶

Tests the power-law fitter

Example (fig 3.4b in Clauset et al.): xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),’x’)

Example 2: xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),’x’)

Example 3: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000) hist(xmarr.squeeze()); # Test results: # mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20 # mean(af) = 2.51 median(af) = 2.49 std(af)=0.14 # biased distribution; far from correct value of xmin but close to correct alpha

Example 4: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv) print(“mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f” % (mean(xmarr),median(xmarr),std(xmarr))) print(“mean(af): %0.2f median(af): %0.2f std(af): %0.2f” % (mean(af),median(af),std(af))) # mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35 # mean(af): 2.51 median(af): 2.50 std(af): 0.07

`fortran_plfit` Module¶

example use: from plfit import plfit

MyPL = plfit(mydata) MyPL.plotpdf(log=True)

plfit.fortran_plfit.pl_inv(P, xm, a)[source]¶: Inverse CDF for a pure power-law

plfit.fortran_plfit.plexp(x, xm=1, a=2.5)[source]¶: CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.

plfit.fortran_plfit.plexp_inv(P, xm, a)[source]¶: Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.

class plfit.fortran_plfit.plfit(x, **kwargs)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha

alpha_(x)[source]¶

alphavsks(autozoom=True, **kwargs)[source]¶: Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.

kstest_(x)[source]¶

plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

nosmall is on by default; it rejects low s/n points can specify xmin to skip xmin estimation

plotcdf(x=None, xmin=None, alpha=None, **kwargs)[source]¶: Plots CDF and powerlaw

plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, **kwargs)[source]¶: Plots PDF and powerlaw.

plotppf(x=None, xmin=None, alpha=None, dolog=True, **kwargs)[source]¶

test_pl(niter=1000.0, **kwargs)[source]¶

Monte-Carlo test to determine whether distribution is consistent with a power law

Runs through niter iterations of a sample size identical to the input sample size.

Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.

WARNING This can take a very long time to run! Execution time scales as niter * setsize

plfit.fortran_plfit.plfit_lsq(x, y)[source]¶: Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html

plfit.fortran_plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x10b571500>)[source]¶

Tests the power-law fitter

Example (fig 3.4b in Clauset et al.): xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),’x’)

Example 2: xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),’x’)

`plfit` Module¶

example use: from plfit import plfit

MyPL = plfit(mydata) MyPL.plotpdf(log=True)

plfit.plfit.discrete_alpha_mle(data, xmin)¶

Equation B.17 of Clauset et al 2009

The Maximum Likelihood Estimator of the “scaling parameter” alpha in the discrete case is similar to that in the continuous case

plfit.plfit.discrete_best_alpha(data, alpharangemults=(0.9, 1.1), n_alpha=201, approximate=True, verbose=True)¶

Use the maximum L to determine the most likely value of alpha

alpharangemults [ 2-tuple ]: Pair of values indicating multiplicative factors above and below the approximate alpha from the MLE alpha to use when determining the “exact” alpha (by directly maximizing the likelihood function)

plfit.plfit.discrete_ksD(data, xmin, alpha)¶

given a sorted data set, a minimum, and an alpha, returns the power law ks-test D value w/data

The returned value is the “D” parameter in the ks test

(this is implemented differently from the continuous version because there are potentially multiple identical points that need comparison to the power law)

plfit.plfit.discrete_likelihood(data, xmin, alpha)¶

Equation B.8 in Clauset

Given a data set, an xmin value, and an alpha “scaling parameter”, computes the log-likelihood (the value to be maximized)

plfit.plfit.discrete_likelihood_vector(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶

Compute the likelihood for all “scaling parameters” in the range (alpharange) for a given xmin. This is only part of the discrete value likelihood maximization problem as described in Clauset et al (Equation B.8)

alpharange [ 2-tuple ]: Two floats specifying the upper and lower limits of the power law alpha to test

plfit.plfit.discrete_max_likelihood(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶: Returns the argument of the max of the likelihood of the data given an input xmin

plfit.plfit.discrete_max_likelihood_arg(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶: Returns the argument of the max of the likelihood of the data given an input xmin

plfit.plfit.most_likely_alpha(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶: Return the most likely alpha for the data given an xmin

plfit.plfit.pl_inv(P, xm, a)[source]¶: Inverse CDF for a pure power-law

plfit.plfit.plexp(x, xm=1, a=2.5)[source]¶: CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.

plfit.plfit.plexp_inv(P, xm, a)[source]¶: Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.

class plfit.plfit.plfit(x, **kwargs)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha

alpha_(x)[source]¶

alphavsks(autozoom=True, **kwargs)[source]¶: Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.

discrete_best_alpha(alpharangemults=(0.9, 1.1), n_alpha=201, approximate=True, verbose=True, finite=True)¶

Use the maximum L to determine the most likely value of alpha

alpharangemults [ 2-tuple ]: Pair of values indicating multiplicative factors above and below the approximate alpha from the MLE alpha to use when determining the “exact” alpha (by directly maximizing the likelihood function)

kstest_(x)[source]¶

lognormal(doprint=True)[source]¶: Use the maximum likelihood estimator for a lognormal distribution to produce the best-fit lognormal parameters

plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None, verbose=False, discrete=None, discrete_approx=True, discrete_n_alpha=1000)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

There is also a discrete version implemented in python - it is different from the continous version! discrete [ bool | None ]

If discrete is None, the code will try to determine whether the data set is discrete or continous based on the uniqueness of the data. If discrete is True or False, the distcrete or continuous fitter will be used, respectively.

xmin [ float / int ]: If you specify xmin, the fitter will only determine alpha assuming the given xmin; the rest of the code (and most of the complexity) is determining an estimate for xmin and alpha.
nosmall [ bool (True) ]: When on, the code rejects low s/n points
finite [ bool (False) ]: There is a ‘finite-size bias’ to the estimator. The “alpha” the code measures is “alpha-hat” s.t. ᾶ = (nα-1)/(n-1), or α = (1 + ᾶ (n-1)) / n
quiet [ bool (False) ]: If False, delivers messages about what fitter is used and the fit results
verbose [ bool (False) ]: Deliver descriptive messages about the fit parameters (only if *quiet*==False)
silent [ bool (False) ]: If True, will print NO messages

plot_lognormal_cdf(**kwargs)[source]¶: Plot the fitted lognormal distribution

plot_lognormal_pdf(**kwargs)[source]¶: Plot the fitted lognormal distribution

plotcdf(x=None, xmin=None, alpha=None, **kwargs)[source]¶: Plots CDF and powerlaw

plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, drawstyle='steps-post', histcolor='k', plcolor='r', **kwargs)[source]¶

Plots PDF and powerlaw.

kwargs is passed to pylab.hist and pylab.plot

plotppf(x=None, xmin=None, alpha=None, dolog=True, **kwargs)[source]¶: Plots the power-law-predicted value on the Y-axis against the real values along the X-axis. Can be used as a diagnostic of the fit quality.

test_pl(niter=1000.0, print_timing=False, **kwargs)[source]¶

Monte-Carlo test to determine whether distribution is consistent with a power law

Runs through niter iterations of a sample size identical to the input sample size.

Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.

Once the sample is created, it is fit using above methods, then the best fit is used to compute a Kolmogorov-Smirnov statistic. The KS stat distribution is compared to the KS value for the fit to the actual data, and p = fraction of random ks values greater than the data ks value is computed. If p<.1, the data may be inconsistent with a powerlaw. A data set of n(>xmin)>100 is required to distinguish a PL from an exponential, and n(>xmin)>~300 is required to distinguish a log-normal distribution from a PL. For more details, see figure 4.1 and section

WARNING This can take a very long time to run! Execution time scales as niter * setsize

xminvsks(**kwargs)[source]¶: Plot xmin versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.

plfit.plfit.plfit_lsq(x, y)[source]¶: Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html

plfit.plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x10a9ce578>)[source]¶

Tests the power-law fitter

Example (fig 3.4b in Clauset et al.):

xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100]
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000)
loglog(xminin,xmarr.squeeze(),'x')

Example 2:

xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100]
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000)
loglog(xminin,xmarr.mean(axis=0),'x')

Example 3:

xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000) hist(xmarr.squeeze()); # Test results: # mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20 # mean(af) = 2.51 median(af) = 2.49 std(af)=0.14 # biased distribution; far from correct value of xmin but close to correct alpha

Example 4:

xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv) print(“mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f” % (mean(xmarr),median(xmarr),std(xmarr))) print(“mean(af): %0.2f median(af): %0.2f std(af): %0.2f” % (mean(af),median(af),std(af))) # mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35 # mean(af): 2.51 median(af): 2.50 std(af): 0.07

`plfit_py` Module¶

example use: from plfit import plfit

MyPL = plfit(mydata) MyPL.plotpdf(log=True)

plfit.plfit_py.pl_inv(P, xm, a)[source]¶: Inverse CDF for a pure power-law

plfit.plfit_py.plexp(x, xm=1, a=2.5)[source]¶: CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.

plfit.plfit_py.plexp_inv(P, xm, a)[source]¶: Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.

class plfit.plfit_py.plfit(x, **kwargs)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha

alpha_(x)[source]¶

kstest_(x)[source]¶

plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None, verbose=False)[source]¶

A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/

See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062

nosmall is on by default; it rejects low s/n points can specify xmin to skip xmin estimation

plfit.plfit_py.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x10a9ced70>, quiet=True, silent=True)[source]¶

Tests the power-law fitter

Example (fig 3.4b in Clauset et al.): xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),’x’)

Example 2: xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),’x’)

agpy 0.1 documentation

plfit Package¶

`plfit` Package¶

`cython_plfit` Module¶

`fortran_plfit` Module¶

`plfit` Module¶

`plfit_py` Module¶

`setup` Module¶

Table Of Contents

Search

agpy 0.1 documentation

plfit Package¶

plfit Package¶

cython_plfit Module¶

fortran_plfit Module¶

plfit Module¶

plfit_py Module¶

setup Module¶

Table Of Contents

Search

`plfit` Package¶

`cython_plfit` Module¶

`fortran_plfit` Module¶

`plfit` Module¶

`plfit_py` Module¶

`setup` Module¶