
LdaModel __init__


__init__(self, name=None)

Creates a Latent Dirichlet Allocation (LDA) model.

Parameters:

name : unicode (default=None)

User-supplied name for the model.

Returns:

Model

A new instance of LdaModel

LDA is a commonly used algorithm for topic modeling; more broadly, it is a dimensionality reduction technique. For more detail, see LDA.

Examples

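The following example trains a model on the sample data set in frame ‘frame’ and then uses it for prediction.

The frame contains three columns. Only ten rows are displayed here; the training output that follows reports three documents and eight distinct words, so the full frame also contains rows for a third document, harrypotter.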

>>> frame.inspect()
[#]  doc_id     word_id     word_count
======================================
[0]  nytimes    harry                3
[1]  nytimes    economy             35
[2]  nytimes    jobs                40
[3]  nytimes    magic                1
[4]  nytimes    realestate          15
[5]  nytimes    movies               6
[6]  economist  economy             50
[7]  economist  jobs                35
[8]  economist  realestate          20
[9]  economist  movies               1
>>> model = ta.LdaModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'doc_id', 'word_id', 'word_count', max_iterations = 3, num_topics = 2)
[===Job Progress===]
>>> train_output
{'topics_given_word': Frame  <unnamed>
row_count = 8
schema = [word_id:unicode, topic_probabilities:vector(2)]
status = ACTIVE  (last_read_date = 2015-10-23T11:07:46.556000-07:00), 'topics_given_doc': Frame  <unnamed>
row_count = 3
schema = [doc_id:unicode, topic_probabilities:vector(2)]
status = ACTIVE  (last_read_date = 2015-10-23T11:07:46.369000-07:00), 'report': u'======Graph Statistics======\nNumber of vertices: 11} (doc: 3, word: 8})\nNumber of edges: 16\n\n======LDA Configuration======\nnumTopics: 2\nalpha: 1.100000023841858\nbeta: 1.100000023841858\nmaxIterations: 3\n', 'word_given_topics': Frame  <unnamed>
row_count = 8
schema = [word_id:unicode, topic_probabilities:vector(2)]
status = ACTIVE  (last_read_date = 2015-10-23T11:07:46.465000-07:00)}
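
The graph statistics in the report appear to count one vertex per distinct doc_id and word_id and one edge per input row (3 + 8 = 11 vertices, 16 edges). That reading is an assumption drawn from the numbers, not a statement about the library's internals. A minimal plain-Python sketch over the ten rows displayed by frame.inspect() above (the variable names are illustrative):

```python
# The ten rows displayed by frame.inspect(); the full frame has more rows.
rows = [
    ("nytimes", "harry", 3),
    ("nytimes", "economy", 35),
    ("nytimes", "jobs", 40),
    ("nytimes", "magic", 1),
    ("nytimes", "realestate", 15),
    ("nytimes", "movies", 6),
    ("economist", "economy", 50),
    ("economist", "jobs", 35),
    ("economist", "realestate", 20),
    ("economist", "movies", 1),
]

# Assumption: one vertex per distinct document and per distinct word,
# and one edge per (doc_id, word_id) row.
doc_vertices = set(doc_id for doc_id, _, _ in rows)
word_vertices = set(word_id for _, word_id, _ in rows)
num_edges = len(rows)

print("docs: {0}, words: {1}, edges: {2}".format(
    len(doc_vertices), len(word_vertices), num_edges))
```

For the ten displayed rows this gives 2 documents, 6 words, and 10 edges; the larger totals in the report cover the whole frame.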
>>> topics_given_doc = train_output['topics_given_doc']
[===Job Progress===]
>>> topics_given_doc.inspect()
[#]  doc_id       topic_probabilities
===========================================================
[0]  harrypotter  [0.06417509902256538, 0.9358249009774346]
[1]  economist    [0.8065841283073141, 0.19341587169268581]
[2]  nytimes      [0.855316939742769, 0.14468306025723088]
>>> topics_given_doc.column_names
[u'doc_id', u'topic_probabilities']
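
Each topic_probabilities vector holds one entry per topic, so the most likely topic for a document is the index of the largest entry. A minimal plain-Python sketch using the values printed above; dominant_topic is a hypothetical helper, not part of the ta API:

```python
# Rows copied from the topics_given_doc.inspect() output above.
topics_given_doc_rows = [
    ("harrypotter", [0.06417509902256538, 0.9358249009774346]),
    ("economist", [0.8065841283073141, 0.19341587169268581]),
    ("nytimes", [0.855316939742769, 0.14468306025723088]),
]


def dominant_topic(probabilities):
    """Return the index of the largest entry in a probability vector."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Map each document to the index of its most probable topic.
dominant = {doc_id: dominant_topic(p) for doc_id, p in topics_given_doc_rows}
for doc_id, topic in sorted(dominant.items()):
    print("{0}: topic {1}".format(doc_id, topic))
```

With these values, economist and nytimes fall under topic 0 and harrypotter under topic 1.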
>>> word_given_topics = train_output['word_given_topics']
[===Job Progress===]
>>> word_given_topics.inspect()
[#]  word_id     topic_probabilities
=============================================================
[0]  harry       [0.005015572372943657, 0.2916109787103347]
[1]  realestate  [0.167941871746252, 0.032187084858186256]
[2]  secrets     [0.026543839878055035, 0.17103864163730945]
[3]  movies      [0.03704750433384287, 0.003294403360133419]
[4]  magic       [0.016497495727347045, 0.19676900962555072]
[5]  economy     [0.3805836266747442, 0.10952481503975171]
[6]  chamber     [0.0035944004256137523, 0.13168123398523954]
[7]  jobs        [0.36277568884120137, 0.06389383278349432]
>>> word_given_topics.column_names
[u'word_id', u'topic_probabilities']
>>> topics_given_word = train_output['topics_given_word']
[===Job Progress===]
>>> topics_given_word.inspect()
[#]  word_id     topic_probabilities
===========================================================
[0]  harry       [0.018375903962878668, 0.9816240960371213]
[1]  realestate  [0.8663322126823493, 0.13366778731765067]
[2]  secrets     [0.15694172611285945, 0.8430582738871405]
[3]  movies      [0.9444179131148587, 0.055582086885141324]
[4]  magic       [0.09026309091077593, 0.9097369090892241]
[5]  economy     [0.8098866029287505, 0.19011339707124958]
[6]  chamber     [0.0275551649439219, 0.9724448350560781]
[7]  jobs        [0.8748608515169193, 0.12513914848308066]
>>> topics_given_word.column_names
[u'word_id', u'topic_probabilities']
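
The two word-level frames are normalized differently in this output: in word_given_topics each topic's column sums to 1 across the words, while in topics_given_word each word's row sums to 1 across the topics. The sketch below checks this on the printed values; it is plain Python, and it says nothing about how the library computes either frame.

```python
# Values copied from the word_given_topics.inspect() output above.
word_given_topics = {
    "harry": [0.005015572372943657, 0.2916109787103347],
    "realestate": [0.167941871746252, 0.032187084858186256],
    "secrets": [0.026543839878055035, 0.17103864163730945],
    "movies": [0.03704750433384287, 0.003294403360133419],
    "magic": [0.016497495727347045, 0.19676900962555072],
    "economy": [0.3805836266747442, 0.10952481503975171],
    "chamber": [0.0035944004256137523, 0.13168123398523954],
    "jobs": [0.36277568884120137, 0.06389383278349432],
}

# Values copied from the topics_given_word.inspect() output above.
topics_given_word = {
    "harry": [0.018375903962878668, 0.9816240960371213],
    "realestate": [0.8663322126823493, 0.13366778731765067],
    "secrets": [0.15694172611285945, 0.8430582738871405],
    "movies": [0.9444179131148587, 0.055582086885141324],
    "magic": [0.09026309091077593, 0.9097369090892241],
    "economy": [0.8098866029287505, 0.19011339707124958],
    "chamber": [0.0275551649439219, 0.9724448350560781],
    "jobs": [0.8748608515169193, 0.12513914848308066],
}

num_topics = 2

# word_given_topics: sum over all words for each topic (column sums).
column_sums = [
    sum(vector[t] for vector in word_given_topics.values())
    for t in range(num_topics)
]

# topics_given_word: sum over all topics for each word (row sums).
row_sums = {word: sum(vector) for word, vector in topics_given_word.items()}

print(column_sums)
print(row_sums)
```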
>>> prediction = model.predict(['harry', 'secrets', 'magic', 'harry', 'chambertest'])
[===Job Progress===]
>>> prediction
{u'topics_given_doc': [0.3149285399451628, 0.48507146005483726], u'new_words_percentage': 20.0, u'new_words_count': 1}
>>> prediction['topics_given_doc']
[0.3149285399451628, 0.48507146005483726]
>>> prediction['new_words_percentage']
20.0
>>> prediction['new_words_count']
1
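
In this output the two topic probabilities sum to about 0.8 rather than 1, which matches the 80% of the document's words that the model saw during training. Whether predict is designed to leave out the share of unseen words is an assumption based on this one result. Under that assumption, the vector can be rescaled over the known words as sketched below (plain Python; the variable names are illustrative):

```python
# Values copied from the prediction output above.
prediction = {
    "topics_given_doc": [0.3149285399451628, 0.48507146005483726],
    "new_words_percentage": 20.0,
    "new_words_count": 1,
}

# Total probability mass assigned to the topics.
known_mass = sum(prediction["topics_given_doc"])

# Fraction of the document's words the model saw during training.
known_fraction = 1.0 - prediction["new_words_percentage"] / 100.0

# Assumption: rescaling by the known mass gives a distribution over
# topics for the known words only.
normalized = [p / known_mass for p in prediction["topics_given_doc"]]

print(known_mass, known_fraction)
print(normalized)
```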
>>> 'topics_given_doc' in prediction
True
>>> 'new_words_percentage' in prediction
True
>>> 'new_words_count' in prediction
True
>>> model.publish()
[===Job Progress===]