LogisticRegressionModel __init__

__init__(self, name=None)

Create a new instance of a logistic regression model.
Parameters:
    name : unicode (default=None)
        User supplied name.
Returns:
    Model
        A new instance of LogisticRegressionModel.
Logistic Regression [R31] is a widely used supervised algorithm for binary and multi-class classification. A Logistic Regression model is initialized, trained on columns of a frame, used to predict the labels of observations, and tested by comparing the predicted labels against the true labels. This model runs the MLlib implementation of Logistic Regression [R32] with the following enhancements: summary statistics for the trained model, covariance and Hessian matrices, and the ability to specify the frequency of the train and test observations. Test performance can be viewed through the built-in binary and multi-class classification metrics. The user can also select the optimizer: L-BFGS [R33] or SGD [R34].
Footnotes
[R31] https://en.wikipedia.org/wiki/Logistic_regression
[R32] https://spark.apache.org/docs/1.5.0/mllib-linear-methods.html#logistic-regression
[R33] https://en.wikipedia.org/wiki/Limited-memory_BFGS
[R34] https://en.wikipedia.org/wiki/Stochastic_gradient_descent

Examples
Consider the following model, trained and tested on the sample data set in the frame 'frame'. The frame contains three columns.
>>> frame.inspect()
[#]  Sepal_Length  Petal_Length  Class
======================================
[0]           4.9           1.4      0
[1]           4.7           1.3      0
[2]           4.6           1.5      0
[3]           6.3           4.9      1
[4]           6.1           4.7      1
[5]           6.4           4.3      1
[6]           6.6           4.4      1
[7]           7.2           6.0      2
[8]           7.2           5.8      2
[9]           7.4           6.1      2
>>> model = ta.LogisticRegressionModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Sepal_Length', 'Petal_Length'],
...                            num_classes=3, optimizer='LBFGS', compute_covariance=True)
[===Job Progress===]
>>> train_output.summary_table
                coefficients  degrees_freedom  standard_errors  \
intercept_0        -0.780153                1              NaN
Sepal_Length_1   -120.442165                1  28497036.888425
Sepal_Length_0    -63.683819                1  28504715.870243
intercept_1       -90.484405                1              NaN
Petal_Length_0    117.979824                1  36178481.415888
Petal_Length_1    206.339649                1  36172481.900910

                wald_statistic   p_value
intercept_0                NaN       NaN
Sepal_Length_1       -0.000004  1.000000
Sepal_Length_0       -0.000002  1.000000
intercept_1                NaN       NaN
Petal_Length_0        0.000003  0.998559
Petal_Length_1        0.000006  0.998094
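The wald_statistic values printed in the summary table appear consistent with dividing each coefficient by its standard error. That is a standard definition of the Wald statistic, not a confirmed description of how the library computes the column. The following minimal sketch recomputes the column from the printed values; the names `summary` and `wald_statistic` are illustrative only and are not part of the API.

```python
import math

# (coefficient, standard_error) pairs copied from the summary table above.
summary = {
    "intercept_0": (-0.780153, float("nan")),
    "Sepal_Length_1": (-120.442165, 28497036.888425),
    "Sepal_Length_0": (-63.683819, 28504715.870243),
    "intercept_1": (-90.484405, float("nan")),
    "Petal_Length_0": (117.979824, 36178481.415888),
    "Petal_Length_1": (206.339649, 36172481.900910),
}


def wald_statistic(coefficient, standard_error):
    """Hypothetical helper: coefficient divided by its standard error.

    A NaN standard error propagates, so the result is NaN as well.
    """
    return coefficient / standard_error


wald = {name: wald_statistic(c, se) for name, (c, se) in summary.items()}

for name, value in wald.items():
    # Six decimal places, matching the precision shown in the table.
    print("%-15s %.6f" % (name, value))
```

With standard errors this large relative to the coefficients, every statistic is close to zero, which suggests that the coefficient estimates on this ten-row sample are not statistically distinguishable from zero.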
>>> train_output.covariance_matrix.inspect()
[#]      Sepal_Length_0      Petal_Length_0         intercept_0
===============================================================
[0]   8.12518826843e+14   -1050552809704907   5.66008788624e+14
[1]  -1.05055305606e+15   1.30888251756e+15   -3.5175956714e+14
[2]   5.66010683868e+14  -3.51761845892e+14  -2.52746479908e+15
[3]   8.12299962335e+14  -1.05039425964e+15   5.66614798332e+14
[4]  -1.05027789037e+15    1308665462990595    -352436215869081
[5]     566011198950063  -3.51665950639e+14   -2527929411221601
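The diagonal of the covariance matrix can be related to the standard_errors column of the summary table: the square root of a coefficient's variance gives its standard error. The sketch below assumes that rows [0] to [2] follow the same order as the three columns shown, so that the diagonal entries are the variances of Sepal_Length_0, Petal_Length_0, and intercept_0. Under that assumption, the negative entry for intercept_0 has no real square root, which would explain the NaN reported for it. The names `variances` and `standard_error` are hypothetical and not part of the API.

```python
import math

# Diagonal entries read from rows [0]-[2] of the covariance matrix above.
# Assumption: row order matches the column order shown.
variances = {
    "Sepal_Length_0": 8.12518826843e+14,
    "Petal_Length_0": 1.30888251756e+15,
    "intercept_0": -2.52746479908e+15,
}


def standard_error(variance):
    """Hypothetical helper: square root of a variance, NaN if it is negative."""
    return math.sqrt(variance) if variance >= 0 else float("nan")


standard_errors = {name: standard_error(v) for name, v in variances.items()}

for name, value in standard_errors.items():
    print("%-15s %f" % (name, value))
```

A negative variance estimate is not meaningful statistically; on a sample this small it most likely indicates a numerically ill-conditioned fit rather than a property of the data.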