API

LDA([n_topics, n_iter, alpha, eta, random_state]) Latent Dirichlet allocation using Gibbs sampling
class horizont.LDA(n_topics=None, n_iter=1000, alpha=0.1, eta=0.01, random_state=None)

Latent Dirichlet allocation using Gibbs sampling

Parameters:

n_topics : int

Number of topics

n_iter : int, default 1000

Number of sampling iterations

alpha : float, default 0.1

Dirichlet parameter for distribution over topics

eta : float, default 0.01

Dirichlet parameter for distribution over words

random_state : int or numpy.random.RandomState, optional

The generator used for the initial topics. Default: numpy.random
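
For illustration, a model with non-default priors can be constructed as follows; the hyperparameter values here are arbitrary choices for the sketch, not recommendations (see Wallach et al. below on why priors matter):

>>> from horizont import LDA
>>> model = LDA(n_topics=20, n_iter=500, alpha=0.5, eta=0.1, random_state=1)  # illustrative values only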

References

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (2003): 993–1022.

Griffiths, Thomas L., and Mark Steyvers. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences 101 (2004): 5228–5235. doi:10.1073/pnas.0307752101.

Wallach, Hanna, David Mimno, and Andrew McCallum. “Rethinking LDA: Why Priors Matter.” In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, 1973–1981, 2009.

Examples

>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1], [4, 1], [5, 8], [6, 1]])
>>> from horizont import LDA
>>> model = LDA(n_topics=2, random_state=0, n_iter=100)
>>> model.fit(X) 
LDA(alpha=...
>>> model.components_
array([[ 0.85714286,  0.14285714],
       [ 0.45      ,  0.55      ]])
>>> model.loglikelihood() 
-40.395...
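
Continuing the example, each row of components_ is a distribution over the vocabulary, so the rows sum to one:

>>> np.allclose(model.components_.sum(axis=1), 1.0)  # rows are normalized distributions
True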

Attributes

components_ (array, shape = [n_topics, n_features]) Point estimate of the topic-word distributions; each row sums to one (see the example above)

Methods

fit(X[, y]) Fit the model with X.
fit_transform(X[, y]) Fit the model with X and apply dimensionality reduction on X.
loglikelihood() Calculate the log likelihood of the fitted model.
score(X, R[, random_state]) Calculate marginal probability of observations in X given Phi.
transform(X[, y]) Transform the data X according to the fitted model.

fit(X, y=None)

Fit the model with X.

Parameters:

X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Returns:

self : object

Returns the instance itself.
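
Because fit returns the instance itself, construction and fitting can be chained; a minimal sketch reusing the document-term matrix from the Examples section above:

>>> import numpy as np
>>> from horizont import LDA
>>> X = np.array([[1, 1], [2, 1], [3, 1], [4, 1], [5, 8], [6, 1]])
>>> model = LDA(n_topics=2, n_iter=100, random_state=0).fit(X)  # fit returns self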

fit_transform(X, y=None)

Fit the model with X and apply dimensionality reduction on X.

Parameters:

X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Returns:

X_new : array-like, shape (n_samples, n_topics)
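
A sketch of the expected output shape, reusing the six-document matrix X from the Examples section and assuming a numpy array is returned:

>>> X_new = LDA(n_topics=2, n_iter=100, random_state=0).fit_transform(X)
>>> X_new.shape  # (n_samples, n_topics)
(6, 2)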

score(X, R, random_state=None)

Calculate marginal probability of observations in X given Phi.

Returns a list with estimates for each document separately, mimicking the behavior of scikit-learn. Uses Buntine’s left-to-right sequential sampler.

Parameters:

X : array-like, shape (n_samples, n_features)

The document-term matrix of documents for evaluation.

R : int

The number of particles to use for the estimation.

Returns:

logprobs : array of length n_samples

Estimate of marginal log probability for each row of X.
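
A sketch of evaluating documents under the fitted model with R=10 particles; larger R should lower the variance of the left-to-right estimate at proportionally higher cost:

>>> logprobs = model.score(X, 10)  # 10 particles; value chosen for illustration
>>> len(logprobs)  # one estimate per document
6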

transform(X, y=None)

Transform the data X according to the fitted model.

Parameters:

X : array-like, shape (n_samples, n_features)

New data, where n_samples is the number of samples and n_features is the number of features.

Returns:

X_new : array-like, shape (n_samples, n_topics)

Raw topic assignment counts.
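
Since transform returns raw counts, rows can be normalized to obtain document-topic proportions; a minimal sketch, assuming a numpy array is returned (variable names are hypothetical):

>>> doc_topic_counts = model.transform(X)
>>> doc_topic = doc_topic_counts / doc_topic_counts.sum(axis=1, keepdims=True)  # rows now sum to one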