[ALPHA]
Creates a Latent Dirichlet Allocation model.
POST /v1/commands/
GET /v1/commands/:id
Request
Route
Body
name: model:lda/train
arguments:
model : Model
frame : Frame
document_column_name : unicode
Column name for documents.
Column should contain a str value.
word_column_name : unicode
Column name for words.
Column should contain a str value.
word_count_column_name : unicode
Column name for word count.
Column should contain an int32 or int64 value.
max_iterations : int32 (default=20)
The maximum number of iterations that the algorithm will execute.
Valid values are positive integers.
Default is 20.
alpha : list (default=None)
The hyperparameter for document-specific distribution over topics.
Mainly used as a smoothing parameter in Bayesian inference.
If set to the singleton list List(-1d), docConcentration is set automatically.
If set to a singleton list List(t) where t != -1, t is replicated to a vector of length k (the number of topics, num_topics) during LDAOptimizer.initialize().
Otherwise, alpha must have length k.
Currently, the EM optimizer supports only symmetric distributions, so all values in the vector should be the same.
Values should be greater than 1.0. The default value is -1.0, which indicates automatic setting.
beta : float32 (default=1.10000002384)
The hyperparameter for word-specific distribution over topics.
Mainly used as a smoothing parameter in Bayesian inference.
A larger value implies that topics contain all words more uniformly, and
a smaller value implies that topics are more concentrated on a small
subset of words.
Valid values are floats greater than or equal to 1.
Default is 1.1.
num_topics : int32 (default=10)
The number of topics to identify in the LDA model.
Using fewer topics will speed up the computation, but the extracted topics
might be more abstract or less specific; using more topics will
result in more computation but lead to more specific topics.
Valid values are positive integers.
Default is 10.
random_seed : int64 (default=None)
An optional random seed.
The random seed is used to initialize the pseudorandom number generator
used in the LDA model. Setting the random seed to the same value every
time the model is trained allows LDA to generate the same topic distribution,
provided the corpus and LDA parameters are unchanged.
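The rules for alpha described above can be illustrated with a small client-side check. This is only a sketch of the documented behavior, not the server's implementation: the helper name resolve_alpha is hypothetical, and treating None (the default in the signature) the same as the singleton [-1.0] is an assumption.

```python
from typing import List, Optional


def resolve_alpha(alpha: Optional[List[float]], num_topics: int) -> Optional[List[float]]:
    """Expand alpha to one value per topic, or return None for automatic setting.

    Hypothetical helper that mirrors the documented rules for the EM optimizer.
    """
    if num_topics < 1:
        raise ValueError("num_topics must be a positive integer")
    # Assumption: None and the singleton [-1.0] both request automatic setting.
    if alpha is None or (len(alpha) == 1 and alpha[0] == -1.0):
        return None
    # A singleton [t] is replicated to a vector of length k = num_topics.
    values = list(alpha) * num_topics if len(alpha) == 1 else list(alpha)
    if len(values) != num_topics:
        raise ValueError("alpha must be a singleton or have length num_topics")
    # The EM optimizer only supports symmetric distributions.
    if any(v != values[0] for v in values):
        raise ValueError("all alpha values must be the same")
    if values[0] <= 1.0:
        raise ValueError("alpha values should be greater than 1.0")
    return values
```

Running a check like this before submitting the command surfaces an invalid alpha locally instead of waiting for the command to fail.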
Headers
Authorization: test_api_key_1
Content-type: application/json
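As an illustration of the request described above, the following sketch assembles the POST with Python's standard library. Several details are assumptions not stated in this page: the server address BASE_URL, the helper name build_lda_train_request, and the way the model and frame are referenced in the arguments (placeholder strings are used here).

```python
import json
import urllib.request

BASE_URL = "http://localhost:9099"  # hypothetical server address


def build_lda_train_request(model, frame, document_column_name,
                            word_column_name, word_count_column_name,
                            api_key="test_api_key_1", **optional):
    """Build (but do not send) the POST /v1/commands/ request for model:lda/train."""
    arguments = {
        "model": model,  # assumption: how a Model is referenced is server-specific
        "frame": frame,  # assumption: how a Frame is referenced is server-specific
        "document_column_name": document_column_name,
        "word_column_name": word_column_name,
        "word_count_column_name": word_count_column_name,
    }
    # Optional arguments such as max_iterations, alpha, beta, num_topics, random_seed.
    arguments.update(optional)
    body = {"name": "model:lda/train", "arguments": arguments}
    return urllib.request.Request(
        url=BASE_URL + "/v1/commands/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the function stops short of that so the body and headers can be inspected first.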
Description
See the discussion about Latent Dirichlet Allocation at Wikipedia.
Response
Status
Body
Returns information about the command. The body is the same as the Response Body for GET /v1/commands/:id, described below.
GET /v1/commands/:id
Request
Route
Body
(None)
Headers
Authorization: test_api_key_1
Content-type: application/json
Response
Status
Body
dict
The data returned is composed of multiple components:
Frame : topics_given_doc
Conditional probabilities of topic given document.
Frame : word_given_topics
Conditional probabilities of word given topic.
Frame : topics_given_word
Conditional probabilities of topic given word.
str : report
The configuration and learning curve report for Latent Dirichlet
Allocation, as a multi-line str.
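A client might build the GET route and separate the returned components roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the function expects the result dict itself, since this page does not specify where that dict sits inside the full command response.

```python
EXPECTED_COMPONENTS = (
    "topics_given_doc",
    "word_given_topics",
    "topics_given_word",
    "report",
)


def command_url(base_url, command_id):
    """Return the GET /v1/commands/:id route for a command id."""
    return "{}/v1/commands/{}".format(base_url.rstrip("/"), command_id)


def split_lda_result(result):
    """Split the result dict into its three frame entries and the report str."""
    missing = [name for name in EXPECTED_COMPONENTS if name not in result]
    if missing:
        raise KeyError("missing components: " + ", ".join(missing))
    frames = {name: result[name] for name in EXPECTED_COMPONENTS if name != "report"}
    return frames, result["report"]
```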