API
| LDA([n_topics, n_iter, alpha, eta, random_state]) | Latent Dirichlet allocation using Gibbs sampling |
- class horizont.LDA(n_topics=None, n_iter=1000, alpha=0.1, eta=0.01, random_state=None)
Latent Dirichlet allocation using Gibbs sampling
Parameters: n_topics : int
Number of topics
n_iter : int, default 1000
Number of sampling iterations
alpha : float, default 0.1
Dirichlet parameter for distribution over topics
eta : float, default 0.01
Dirichlet parameter for distribution over words
random_state : numpy.RandomState | int, optional
The generator used for the initial topics. Default: numpy.random
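For illustration, a minimal construction sketch (the hyperparameter values below are arbitrary choices, not recommendations; random_state may be a seed or a generator):

>>> import numpy as np
>>> from horizont import LDA
>>> # smaller alpha tends to give sparser document-topic distributions
>>> model = LDA(n_topics=20, n_iter=2000, alpha=0.05, eta=0.01,
...             random_state=np.random.RandomState(42))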
References
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (2003): 993–1022.
Griffiths, Thomas L., and Mark Steyvers. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences 101 (2004): 5228–5235. doi:10.1073/pnas.0307752101.
Wallach, Hanna, David Mimno, and Andrew McCallum. “Rethinking LDA: Why Priors Matter.” In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, 1973–1981, 2009.
Examples
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1], [4, 1], [5, 8], [6, 1]])
>>> from horizont import LDA
>>> model = LDA(n_topics=2, random_state=0, n_iter=100)
>>> model.fit(X)
LDA(alpha=...
>>> model.components_
array([[ 0.85714286,  0.14285714],
       [ 0.45      ,  0.55      ]])
>>> model.loglikelihood()
-40.395...
Attributes
components_ : array, shape = [n_components, n_features]
Matrix of counts recording topic-word assignments.
Methods
| fit(X[, y]) | Fit the model with X. |
| fit_transform(X[, y]) | Apply dimensionality reduction on X. |
| loglikelihood() | |
| score(X, R[, random_state]) | Calculate marginal probability of observations in X given Phi. |
| transform(X[, y]) | Transform the data X according to the fitted model. |
- fit(X, y=None)
Fit the model with X.
Parameters: X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns: self : object
Returns the instance itself.
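A minimal sketch of a fit call (the document-term counts below are made up for illustration):

>>> import numpy as np
>>> from horizont import LDA
>>> X = np.array([[3, 0, 1], [0, 2, 4], [1, 1, 0]])  # 3 documents, 3 terms
>>> model = LDA(n_topics=2, n_iter=100, random_state=0)
>>> model = model.fit(X)  # fit returns the instance itself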
- fit_transform(X, y=None)
Apply dimensionality reduction on X.
Parameters: X : array-like, shape (n_samples, n_features)
New data, where n_samples is the number of samples and n_features is the number of features.
Returns: X_new : array-like, shape (n_samples, n_components)
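A sketch of the shape contract, reusing the toy matrix X from the fit example above:

>>> X_new = LDA(n_topics=2, n_iter=100, random_state=0).fit_transform(X)
>>> X_new.shape  # (n_samples, n_components)
(3, 2)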
- score(X, R, random_state=None)
Calculate marginal probability of observations in X given Phi.
Returns a list with an estimate for each document, mimicking the behavior of scikit-learn. Uses Buntine’s left-to-right sequential sampler.
Parameters: X : array, [n_samples, n_features]
The document-term matrix of documents for evaluation.
R : int
The number of particles to use for the estimation.
Returns: logprobs : array of length n_samples
Estimate of marginal log probability for each row of X.
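A hedged usage sketch (X_train and X_test stand for hypothetical document-term matrices; R=10 is an arbitrary particle count, with larger R giving lower-variance estimates):

>>> model = LDA(n_topics=2, n_iter=100, random_state=0).fit(X_train)
>>> logprobs = model.score(X_test, R=10)  # one estimate per held-out document
>>> total = np.sum(logprobs)  # log probability of the whole held-out set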
- transform(X, y=None)
Transform the data X according to the fitted model.
Parameters: X : array-like, shape (n_samples, n_features)
New data, where n_samples is the number of samples and n_features is the number of features.
Returns: X_new : array-like, shape (n_samples, n_topics)
Raw topic assignment counts
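Because transform returns raw assignment counts, a common follow-up is to normalize each row into topic proportions (a sketch; X_unseen stands for a hypothetical document-term matrix and model for a fitted LDA instance):

>>> counts = model.transform(X_unseen)  # raw topic assignment counts per document
>>> doc_topic = counts / counts.sum(axis=1, keepdims=True)  # rows now sum to 1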