LogisticRegressionModel train


train(self, frame, label_column, observation_columns, frequency_column=None, num_classes=2, optimizer='LBFGS', compute_covariance=True, intercept=True, feature_scaling=False, threshold=0.5, reg_type='L2', reg_param=0.0, num_iterations=100, convergence_tolerance=0.0001, num_corrections=10, mini_batch_fraction=1.0, step_size=1.0)

[ALPHA] Build a logistic regression model.

Parameters:

frame : Frame

A frame to train the model on.

label_column : unicode

Column name containing the label for each observation.

observation_columns : list

Column(s) containing the observations.

frequency_column : unicode (default=None)

Optional column containing the frequency of observations.

num_classes : int32 (default=2)

Number of classes

optimizer : unicode (default=LBFGS)

Set type of optimizer.
LBFGS - Limited-memory BFGS. Supports multinomial logistic regression.
SGD - Stochastic Gradient Descent. Supports only binary logistic regression.
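The constraint between optimizer and num_classes can be checked before calling train. The following is a minimal sketch; check_optimizer is a hypothetical helper, not part of the documented API, and the lower bound of 2 on num_classes is an assumption.

```python
def check_optimizer(optimizer="LBFGS", num_classes=2):
    """Hypothetical pre-flight check; not part of the documented API."""
    supported = ("LBFGS", "SGD")
    if optimizer not in supported:
        raise ValueError("optimizer must be one of %s" % (supported,))
    # Assumption: a classifier needs at least two classes.
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    # Per the parameter description, SGD handles only the binary case.
    if optimizer == "SGD" and num_classes != 2:
        raise ValueError("SGD only supports binary logistic regression")
    return {"optimizer": optimizer, "num_classes": num_classes}


# Keyword arguments that could then be passed on to train(...).
kwargs = check_optimizer("LBFGS", num_classes=3)
```

Failing early on the client side avoids submitting a training job that the optimizer cannot handle.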

compute_covariance : bool (default=True)

Compute covariance matrix for the model.

intercept : bool (default=True)

Add intercept column to training data.

feature_scaling : bool (default=False)

Perform feature scaling before training model.

threshold : float64 (default=0.5)

Threshold for separating positive predictions from negative predictions.
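As an illustration of how a threshold separates the two classes, the sketch below maps predicted probabilities of the positive class to labels. The function name classify is hypothetical, and treating a probability exactly equal to the threshold as negative is an assumption.

```python
def classify(probabilities, threshold=0.5):
    """Map predicted probabilities of the positive class to 0/1 labels."""
    # Assumption: only a probability strictly above the threshold is positive.
    return [1 if p > threshold else 0 for p in probabilities]


labels = classify([0.1, 0.5, 0.7, 0.95])
strict_labels = classify([0.1, 0.5, 0.7, 0.95], threshold=0.9)
```

Raising the threshold trades recall for precision: fewer observations are labeled positive, but with higher predicted probability.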

reg_type : unicode (default=L2)

Set type of regularization.
L1 - L1 regularization with the sum of absolute values of coefficients.
L2 - L2 regularization with the sum of squares of coefficients.
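The two penalty terms can be sketched as follows. The function names are hypothetical, and any constant factor an implementation may apply (some libraries scale the L2 term by 1/2) is omitted here.

```python
def l1_penalty(coefficients, reg_param):
    """reg_param times the sum of absolute values of the coefficients."""
    return reg_param * sum(abs(w) for w in coefficients)


def l2_penalty(coefficients, reg_param):
    """reg_param times the sum of squares of the coefficients."""
    return reg_param * sum(w * w for w in coefficients)


weights = [1.0, -2.0, 0.5]
l1 = l1_penalty(weights, reg_param=0.1)
l2 = l2_penalty(weights, reg_param=0.1)
```

With the default reg_param of 0.0, both penalties vanish and the model is trained without regularization.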

reg_param : float64 (default=0.0)

Regularization parameter

num_iterations : int32 (default=100)

Maximum number of iterations

convergence_tolerance : float64 (default=0.0001)

Convergence tolerance of iterations for L-BFGS. A smaller value yields higher accuracy at the cost of more iterations.

num_corrections : int32 (default=10)

Number of corrections used in the LBFGS update. Default is 10. Values less than 3 are not recommended; large values result in excessive computing time.

mini_batch_fraction : float64 (default=1.0)

Fraction of data to be used for each SGD iteration
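One way to picture mini_batch_fraction is as the probability with which each row is kept for a given iteration. The sketch below assumes independent per-row sampling; sample_mini_batch and its seed argument are hypothetical and say nothing about how the library itself draws its batches.

```python
import random


def sample_mini_batch(rows, mini_batch_fraction=1.0, seed=None):
    """Return a random subset of rows, each kept with the given probability."""
    if not 0.0 < mini_batch_fraction <= 1.0:
        raise ValueError("mini_batch_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # random() is in [0, 1), so a fraction of 1.0 keeps every row.
    return [row for row in rows if rng.random() < mini_batch_fraction]


full_batch = sample_mini_batch(list(range(10)))
small_batch = sample_mini_batch(list(range(1000)), 0.1, seed=0)
```

With the default of 1.0, every iteration sees the full data set, which makes SGD behave like plain gradient descent.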

step_size : float64 (default=1.0)

Initial step size for SGD. In subsequent iterations, the step size decays as step_size / sqrt(t), where t is the iteration number.
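The decay schedule can be written out directly. This is a small sketch, assuming iterations are numbered from 1; step_size_at is a hypothetical helper name.

```python
import math


def step_size_at(iteration, step_size=1.0):
    """Step size for a 1-based iteration number t: step_size / sqrt(t)."""
    if iteration < 1:
        raise ValueError("iteration must be >= 1")
    return step_size / math.sqrt(iteration)


# Step sizes at iterations 1, 4, 16 and 100 with the default initial value.
schedule = [step_size_at(t) for t in (1, 4, 16, 100)]
```

The step shrinks slowly: it takes 100 iterations for it to fall to a tenth of its initial value.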

Returns:

: dict

An object with a summary of the trained model. The data returned is composed of multiple components:

numFeatures : int

Number of features in the training data.

numClasses : int

Number of classes in the training data.

summaryTable : table

A summary table composed of:

CovarianceMatrix : Frame (optional)

Covariance matrix of the trained model.

The covariance matrix is the inverse of the Hessian matrix for the trained model. The Hessian matrix contains the second-order partial derivatives of the model’s log-likelihood function.
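To make the relationship concrete, the sketch below computes the covariance for a model with an intercept and a single feature. It assumes the usual sign convention, inverting the negative of the log-likelihood Hessian (the observed information); the function names and sample values are hypothetical and not taken from the library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def covariance_2x2(xs, intercept, slope):
    """Covariance of (intercept, slope) for a one-feature logistic model."""
    # Negative Hessian of the log-likelihood:
    # sum over observations of p * (1 - p) * [1, x]^T [1, x].
    a = b = d = 0.0
    for x in xs:
        p = sigmoid(intercept + slope * x)
        w = p * (1.0 - p)
        a += w
        b += w * x
        d += w * x * x
    det = a * d - b * b
    if det == 0.0:
        raise ValueError("Hessian is singular; covariance is undefined")
    # Inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return [[d / det, -b / det], [-b / det, a / det]]


cov = covariance_2x2([-2.0, -1.0, 0.0, 1.0, 2.0], intercept=0.0, slope=1.0)
# Square roots of the diagonal give the standard errors of the coefficients.
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
```

The diagonal entries are the variances of the coefficient estimates, which is why the covariance matrix is useful for computing standard errors and confidence intervals.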

Creates a logistic regression model using the observation columns and label column of the training frame.

Examples

See here for examples.