LogisticRegressionModel __init__


__init__(self, name=None)

Create a new instance of the logistic regression model.

Parameters:

name : unicode (default=None)

User-supplied name.

Returns:

: Model

A new instance of LogisticRegressionModel.

Logistic regression [R31] is a widely used supervised algorithm for binary and multi-class classification. The logistic regression model is initialized, trained on columns of a frame, predicts the labels of observations, and tests the predicted labels against the true labels. This model runs the MLlib implementation of logistic regression [R32] with enhanced features: trained-model summary statistics, covariance and Hessian matrices, and the ability to specify the frequency of the train and test observations. Testing performance can be viewed via the built-in binary and multi-class classification metrics. The user can also select the optimizer to be used: L-BFGS [R33] or SGD [R34].

Footnotes

[R31] https://en.wikipedia.org/wiki/Logistic_regression
[R32] https://spark.apache.org/docs/1.5.0/mllib-linear-methods.html#logistic-regression
[R33] https://en.wikipedia.org/wiki/Limited-memory_BFGS
[R34] https://en.wikipedia.org/wiki/Stochastic_gradient_descent
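
As a hedged illustration of switching optimizers: the only difference from the Examples section below is the optimizer argument of train(). The literal value 'SGD' is assumed by analogy with the 'LBFGS' value used there, and whether the SGD optimizer supports more than two classes is not confirmed here; the train() reference should be consulted.

>>> # Hypothetical sketch: train the same kind of model with stochastic gradient descent.
>>> # Frame, label column and observation columns match the Examples section below.
>>> sgd_model = ta.LogisticRegressionModel()
>>> sgd_output = sgd_model.train(frame, 'Class', ['Sepal_Length', 'Petal_Length'],
...                              optimizer='SGD')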

Examples

Consider the following model trained and tested on the sample data set in frame 'frame'.

The frame contains three columns:

>>> frame.inspect()
[#]  Sepal_Length  Petal_Length  Class
======================================
[0]           4.9           1.4      0
[1]           4.7           1.3      0
[2]           4.6           1.5      0
[3]           6.3           4.9      1
[4]           6.1           4.7      1
[5]           6.4           4.3      1
[6]           6.6           4.4      1
[7]           7.2           6.0      2
[8]           7.2           5.8      2
[9]           7.4           6.1      2
>>> model = ta.LogisticRegressionModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Sepal_Length', 'Petal_Length'],
...                                 num_classes=3, optimizer='LBFGS', compute_covariance=True)
[===Job Progress===]
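
The returned train_output exposes the per-coefficient summary statistics and, because compute_covariance=True was passed, the covariance matrix inspected further below.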
>>> train_output.summary_table
                coefficients  degrees_freedom  standard_errors  \
intercept_0        -0.780153                1              NaN
Sepal_Length_1   -120.442165                1  28497036.888425
Sepal_Length_0    -63.683819                1  28504715.870243
intercept_1       -90.484405                1              NaN
Petal_Length_0    117.979824                1  36178481.415888
Petal_Length_1    206.339649                1  36172481.900910

                wald_statistic   p_value
intercept_0                NaN       NaN
Sepal_Length_1       -0.000004  1.000000
Sepal_Length_0       -0.000002  1.000000
intercept_1                NaN       NaN
Petal_Length_0        0.000003  0.998559
Petal_Length_1        0.000006  0.998094

>>> train_output.covariance_matrix.inspect()
[#]  Sepal_Length_0      Petal_Length_0      intercept_0
===============================================================
[0]   8.12518826843e+14   -1050552809704907   5.66008788624e+14
[1]  -1.05055305606e+15   1.30888251756e+15   -3.5175956714e+14
[2]   5.66010683868e+14  -3.51761845892e+14  -2.52746479908e+15
[3]   8.12299962335e+14  -1.05039425964e+15   5.66614798332e+14
[4]  -1.05027789037e+15    1308665462990595    -352436215869081
[5]     566011198950063  -3.51665950639e+14   -2527929411221601
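
The overview above also notes that predicted labels can be tested against the true labels via built-in classification metrics. The sketch below is a minimal, hedged continuation of the example: it assumes the model exposes predict() and test() methods that take the frame and column names in the same way as train(), and that the test result carries standard multi-class metrics such as accuracy and a confusion matrix. The exact signatures and attributes should be verified against the predict and test reference pages.

>>> # Hypothetical continuation: score the frame and evaluate the predicted labels.
>>> # predict() is assumed to return a frame with an added predicted-label column.
>>> predicted_frame = model.predict(frame, ['Sepal_Length', 'Petal_Length'])
>>> # test() is assumed to compare predictions against the 'Class' column and to
>>> # return multi-class classification metrics (accuracy, confusion matrix, etc.).
>>> metrics = model.test(frame, 'Class', ['Sepal_Length', 'Petal_Length'])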