plfit Package¶
cython_plfit Module¶
plfit.py - a python power-law fitter based on code by Aaron Clauset http://www.santafe.edu/~aaronc/powerlaws/ http://arxiv.org/abs/0706.1062 “Power-law distributions in empirical data” Requires pylab (matplotlib), which requires numpy
example use: from plfit import plfit
MyPL = plfit(mydata) MyPL.plotpdf(log=True)
- plfit.cython_plfit.plexp(x, xm=1, a=2.5)[source]¶
CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.
- plfit.cython_plfit.plexp_inv(P, xm, a)[source]¶
Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.
- class plfit.cython_plfit.plfit(x, **kwargs)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha
- alphavsks(autozoom=True, **kwargs)[source]¶
Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.
- plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
nosmall is on by default; it rejects low s/n points can specify xmin to skip xmin estimation
There are 3 implementations of xmin estimation. The fortran version is fastest, the C (cython) version is ~10% slower, and the python version is ~3x slower than the fortran version. Also, the cython code suffers ~2% numerical error relative to the fortran and python for unknown reasons.
- plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, **kwargs)[source]¶
Plots PDF and powerlaw.
- test_pl(niter=1000.0, **kwargs)[source]¶
Monte-Carlo test to determine whether distribution is consistent with a power law
Runs through niter iterations of a sample size identical to the input sample size.
Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.
Once the sample is created, is fit using above methods, then the best fit is used to compute a Kolmogorov-Smirnov statistic. The KS stat distribution is compared to the KS value for the fit to the actual data, and p = fraction of random ks values greater than the data ks value is computed. If p<.1, the data may be inconsistent with a powerlaw. A data set of n(>xmin)>100 is required to distinguish a PL from an exponential, and n(>xmin)>~300 is required to distinguish a log-normal distribution from a PL. For more details, see figure 4.1 and section
WARNING This can take a very long time to run! Execution time scales as niter * setsize
- plfit.cython_plfit.plfit_lsq(x, y)[source]¶
Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html
- plfit.cython_plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x105b7b488>)[source]¶
Tests the power-law fitter
Example (fig 3.4b in Clauset et al.): xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),’x’)
Example 2: xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),’x’)
Example 3: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000) hist(xmarr.squeeze()); # Test results: # mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20 # mean(af) = 2.51 median(af) = 2.49 std(af)=0.14 # biased distribution; far from correct value of xmin but close to correct alpha
Example 4: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv) print(“mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f” % (mean(xmarr),median(xmarr),std(xmarr))) print(“mean(af): %0.2f median(af): %0.2f std(af): %0.2f” % (mean(af),median(af),std(af))) # mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35 # mean(af): 2.51 median(af): 2.50 std(af): 0.07
fortran_plfit Module¶
plfit.py - a python power-law fitter based on code by Aaron Clauset http://www.santafe.edu/~aaronc/powerlaws/ http://arxiv.org/abs/0706.1062 “Power-law distributions in empirical data” Requires pylab (matplotlib), which requires numpy
example use: from plfit import plfit
MyPL = plfit(mydata) MyPL.plotpdf(log=True)
- plfit.fortran_plfit.plexp(x, xm=1, a=2.5)[source]¶
CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.
- plfit.fortran_plfit.plexp_inv(P, xm, a)[source]¶
Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.
- class plfit.fortran_plfit.plfit(x, **kwargs)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha
- alphavsks(autozoom=True, **kwargs)[source]¶
Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.
- plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, to appear (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
nosmall is on by default; it rejects low s/n points can specify xmin to skip xmin estimation
There are 3 implementations of xmin estimation. The fortran version is fastest, the C (cython) version is ~10% slower, and the python version is ~3x slower than the fortran version. Also, the cython code suffers ~2% numerical error relative to the fortran and python for unknown reasons.
- plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, **kwargs)[source]¶
Plots PDF and powerlaw.
- test_pl(niter=1000.0, **kwargs)[source]¶
Monte-Carlo test to determine whether distribution is consistent with a power law
Runs through niter iterations of a sample size identical to the input sample size.
Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.
Once the sample is created, is fit using above methods, then the best fit is used to compute a Kolmogorov-Smirnov statistic. The KS stat distribution is compared to the KS value for the fit to the actual data, and p = fraction of random ks values greater than the data ks value is computed. If p<.1, the data may be inconsistent with a powerlaw. A data set of n(>xmin)>100 is required to distinguish a PL from an exponential, and n(>xmin)>~300 is required to distinguish a log-normal distribution from a PL. For more details, see figure 4.1 and section
WARNING This can take a very long time to run! Execution time scales as niter * setsize
- plfit.fortran_plfit.plfit_lsq(x, y)[source]¶
Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html
- plfit.fortran_plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x10b571500>)[source]¶
Tests the power-law fitter
Example (fig 3.4b in Clauset et al.): xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),’x’)
Example 2: xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),’x’)
Example 3: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000) hist(xmarr.squeeze()); # Test results: # mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20 # mean(af) = 2.51 median(af) = 2.49 std(af)=0.14 # biased distribution; far from correct value of xmin but close to correct alpha
Example 4: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv) print(“mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f” % (mean(xmarr),median(xmarr),std(xmarr))) print(“mean(af): %0.2f median(af): %0.2f std(af): %0.2f” % (mean(af),median(af),std(af))) # mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35 # mean(af): 2.51 median(af): 2.50 std(af): 0.07
plfit Module¶
plfit.py - a python power-law fitter based on code by Aaron Clauset http://www.santafe.edu/~aaronc/powerlaws/ http://arxiv.org/abs/0706.1062 “Power-law distributions in empirical data” Requires pylab (matplotlib), which requires numpy
example use: from plfit import plfit
MyPL = plfit(mydata) MyPL.plotpdf(log=True)
- plfit.plfit.discrete_alpha_mle(data, xmin)¶
Equation B.17 of Clauset et al 2009
The Maximum Likelihood Estimator of the “scaling parameter” alpha in the discrete case is similar to that in the continuous case
- plfit.plfit.discrete_best_alpha(data, alpharangemults=(0.9, 1.1), n_alpha=201, approximate=True, verbose=True)¶
Use the maximum L to determine the most likely value of alpha
- alpharangemults [ 2-tuple ]
- Pair of values indicating multiplicative factors above and below the approximate alpha from the MLE alpha to use when determining the “exact” alpha (by directly maximizing the likelihood function)
- plfit.plfit.discrete_ksD(data, xmin, alpha)¶
given a sorted data set, a minimum, and an alpha, returns the power law ks-test D value w/data
The returned value is the “D” parameter in the ks test
(this is implemented differently from the continuous version because there are potentially multiple identical points that need comparison to the power law)
- plfit.plfit.discrete_likelihood(data, xmin, alpha)¶
Equation B.8 in Clauset
Given a data set, an xmin value, and an alpha “scaling parameter”, computes the log-likelihood (the value to be maximized)
- plfit.plfit.discrete_likelihood_vector(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶
Compute the likelihood for all “scaling parameters” in the range (alpharange) for a given xmin. This is only part of the discrete value likelihood maximization problem as described in Clauset et al (Equation B.8)
- alpharange [ 2-tuple ]
- Two floats specifying the upper and lower limits of the power law alpha to test
- plfit.plfit.discrete_max_likelihood(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶
Returns the argument of the max of the likelihood of the data given an input xmin
- plfit.plfit.discrete_max_likelihood_arg(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶
Returns the argument of the max of the likelihood of the data given an input xmin
- plfit.plfit.most_likely_alpha(data, xmin, alpharange=(1.5, 3.5), n_alpha=201)¶
Return the most likely alpha for the data given an xmin
- plfit.plfit.plexp(x, xm=1, a=2.5)[source]¶
CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.
- plfit.plfit.plexp_inv(P, xm, a)[source]¶
Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.
- class plfit.plfit.plfit(x, **kwargs)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha
- alphavsks(autozoom=True, **kwargs)[source]¶
Plot alpha versus the ks value for derived alpha. This plot can be used as a diagnostic of whether you have derived the ‘best’ fit: if there are multiple local minima, your data set may be well suited to a broken powerlaw or a different function.
- discrete_best_alpha(alpharangemults=(0.9, 1.1), n_alpha=201, approximate=True, verbose=True, finite=True)¶
Use the maximum L to determine the most likely value of alpha
- alpharangemults [ 2-tuple ]
- Pair of values indicating multiplicative factors above and below the approximate alpha from the MLE alpha to use when determining the “exact” alpha (by directly maximizing the likelihood function)
- lognormal(doprint=True)[source]¶
Use the maximum likelihood estimator for a lognormal distribution to produce the best-fit lognormal parameters
- plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None, verbose=False, discrete=None, discrete_approx=True, discrete_n_alpha=1000)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
There are 3 implementations of xmin estimation. The fortran version is fastest, the C (cython) version is ~10% slower, and the python version is ~3x slower than the fortran version. Also, the cython code suffers ~2% numerical error relative to the fortran and python for unknown reasons.
There is also a discrete version implemented in python - it is different from the continous version! discrete [ bool | None ]
If discrete is None, the code will try to determine whether the data set is discrete or continous based on the uniqueness of the data. If discrete is True or False, the distcrete or continuous fitter will be used, respectively.- xmin [ float / int ]
- If you specify xmin, the fitter will only determine alpha assuming the given xmin; the rest of the code (and most of the complexity) is determining an estimate for xmin and alpha.
- nosmall [ bool (True) ]
- When on, the code rejects low s/n points
- finite [ bool (False) ]
- There is a ‘finite-size bias’ to the estimator. The “alpha” the code measures is “alpha-hat” s.t. ᾶ = (nα-1)/(n-1), or α = (1 + ᾶ (n-1)) / n
- quiet [ bool (False) ]
- If False, delivers messages about what fitter is used and the fit results
- verbose [ bool (False) ]
- Deliver descriptive messages about the fit parameters (only if *quiet*==False)
- silent [ bool (False) ]
- If True, will print NO messages
- plotpdf(x=None, xmin=None, alpha=None, nbins=50, dolog=True, dnds=False, drawstyle='steps-post', histcolor='k', plcolor='r', **kwargs)[source]¶
Plots PDF and powerlaw.
kwargs is passed to pylab.hist and pylab.plot
- plotppf(x=None, xmin=None, alpha=None, dolog=True, **kwargs)[source]¶
Plots the power-law-predicted value on the Y-axis against the real values along the X-axis. Can be used as a diagnostic of the fit quality.
- test_pl(niter=1000.0, print_timing=False, **kwargs)[source]¶
Monte-Carlo test to determine whether distribution is consistent with a power law
Runs through niter iterations of a sample size identical to the input sample size.
Will randomly select values from the data < xmin. The number of values selected will be chosen from a uniform random distribution with p(<xmin) = n(<xmin)/n.
Once the sample is created, it is fit using above methods, then the best fit is used to compute a Kolmogorov-Smirnov statistic. The KS stat distribution is compared to the KS value for the fit to the actual data, and p = fraction of random ks values greater than the data ks value is computed. If p<.1, the data may be inconsistent with a powerlaw. A data set of n(>xmin)>100 is required to distinguish a PL from an exponential, and n(>xmin)>~300 is required to distinguish a log-normal distribution from a PL. For more details, see figure 4.1 and section
WARNING This can take a very long time to run! Execution time scales as niter * setsize
- plfit.plfit.plfit_lsq(x, y)[source]¶
Returns A and B in y=Ax^B http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html
- plfit.plfit.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x10a9ce578>)[source]¶
Tests the power-law fitter
Example (fig 3.4b in Clauset et al.):
xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),'x')
Example 2:
xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),'x')
Example 3:
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000) hist(xmarr.squeeze()); # Test results: # mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20 # mean(af) = 2.51 median(af) = 2.49 std(af)=0.14 # biased distribution; far from correct value of xmin but close to correct alphaExample 4:
xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv) print(“mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f” % (mean(xmarr),median(xmarr),std(xmarr))) print(“mean(af): %0.2f median(af): %0.2f std(af): %0.2f” % (mean(af),median(af),std(af))) # mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35 # mean(af): 2.51 median(af): 2.50 std(af): 0.07
plfit_py Module¶
plfit.py - a python power-law fitter based on code by Aaron Clauset http://www.santafe.edu/~aaronc/powerlaws/ http://arxiv.org/abs/0706.1062 “Power-law distributions in empirical data” Requires pylab (matplotlib), which requires numpy
example use: from plfit import plfit
MyPL = plfit(mydata) MyPL.plotpdf(log=True)
- plfit.plfit_py.plexp(x, xm=1, a=2.5)[source]¶
CDF(x) for the piecewise distribution exponential x<xmin, powerlaw x>=xmin This is the CDF version of the distributions drawn in fig 3.4a of Clauset et al.
- plfit.plfit_py.plexp_inv(P, xm, a)[source]¶
Inverse CDF for a piecewise PDF as defined in eqn. 3.10 of Clauset et al.
- class plfit.plfit_py.plfit(x, **kwargs)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
The output “alpha” is defined such that p(x) ~ (x/xmin)^-alpha
- plfit(nosmall=True, finite=False, quiet=False, silent=False, usefortran=False, usecy=False, xmin=None, verbose=False)[source]¶
A Python implementation of the Matlab code http://www.santafe.edu/~aaronc/powerlaws/plfit.m from http://www.santafe.edu/~aaronc/powerlaws/
See A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review, 51, 661-703 (2009). (arXiv:0706.1062) http://arxiv.org/abs/0706.1062
nosmall is on by default; it rejects low s/n points can specify xmin to skip xmin estimation
There are 3 implementations of xmin estimation. The fortran version is fastest, the C (cython) version is ~10% slower, and the python version is ~3x slower than the fortran version. Also, the cython code suffers ~2% numerical error relative to the fortran and python for unknown reasons.
- plfit.plfit_py.test_fitter(xmin=1.0, alpha=2.5, niter=500, npts=1000, invcdf=<function plexp_inv at 0x10a9ced70>, quiet=True, silent=True)[source]¶
Tests the power-law fitter
Example (fig 3.4b in Clauset et al.): xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=1,npts=50000) loglog(xminin,xmarr.squeeze(),’x’)
Example 2: xminin=[0.25,0.5,0.75,1,1.5,2,5,10,50,100] xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=xminin,niter=10,npts=1000) loglog(xminin,xmarr.mean(axis=0),’x’)
Example 3: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000) hist(xmarr.squeeze()); # Test results: # mean(xmarr) = 0.70, median(xmarr)=0.65 std(xmarr)=0.20 # mean(af) = 2.51 median(af) = 2.49 std(af)=0.14 # biased distribution; far from correct value of xmin but close to correct alpha
Example 4: xmarr,af,ksv,nxarr = plfit.test_fitter(xmin=1.0,niter=1000,npts=1000,invcdf=pl_inv) print(“mean(xmarr): %0.2f median(xmarr): %0.2f std(xmarr): %0.2f” % (mean(xmarr),median(xmarr),std(xmarr))) print(“mean(af): %0.2f median(af): %0.2f std(af): %0.2f” % (mean(af),median(af),std(af))) # mean(xmarr): 1.19 median(xmarr): 1.03 std(xmarr): 0.35 # mean(af): 2.51 median(af): 2.50 std(af): 0.07