LogisticRegressionModel train

train(self, frame, label_column, observation_columns, frequency_column=None, num_classes=2, optimizer='LBFGS', compute_covariance=True, intercept=True, feature_scaling=False, threshold=0.5, reg_type='L2', reg_param=0.0, num_iterations=100, convergence_tolerance=0.0001, num_corrections=10, mini_batch_fraction=1.0, step_size=1.0)

[ALPHA] Build a logistic regression model.

Parameters:


frame : Frame

A frame to train the model on.

label_column : unicode

Column name containing the label for each observation.

observation_columns : list

Column(s) containing the observations.

frequency_column : unicode (default=None)

Optional column containing the frequency of observations.

num_classes : int32 (default=2)

Number of classes

optimizer : unicode (default=LBFGS)

Set type of optimizer.
LBFGS - Limited-memory BFGS. LBFGS supports multinomial logistic regression.
SGD - Stochastic Gradient Descent. SGD only supports binary logistic regression.

compute_covariance : bool (default=True)

Compute covariance matrix for the model.

intercept : bool (default=True)

Add intercept column to training data.

feature_scaling : bool (default=False)

Perform feature scaling before training model.

threshold : float64 (default=0.5)

Threshold for separating positive predictions from negative predictions.

reg_type : unicode (default=L2)

Set type of regularization.
L1 - L1 regularization with sum of absolute values of coefficients.
L2 - L2 regularization with sum of squares of coefficients.

reg_param : float64 (default=0.0)

Regularization parameter

num_iterations : int32 (default=100)

Maximum number of iterations

convergence_tolerance : float64 (default=0.0001)

Convergence tolerance of iterations for LBFGS. A smaller value leads to higher accuracy at the cost of more iterations.

num_corrections : int32 (default=10)

Number of corrections used in the LBFGS update. Values less than 3 are not recommended; large values result in excessive computing time.

mini_batch_fraction : float64 (default=1.0)

Fraction of data to be used for each SGD iteration

step_size : float64 (default=1.0)

Initial step size for SGD. In subsequent steps, the step size decreases: at iteration t it is step_size/sqrt(t).
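The step size decay described for step_size can be illustrated with a few lines of plain Python. This is only a sketch of the documented schedule, not the library's code; the helper name sgd_step_sizes is hypothetical, and iterations are assumed to be numbered from t = 1.

```python
import math


def sgd_step_sizes(step_size=1.0, num_iterations=5):
    """Return the step size for iterations t = 1..num_iterations.

    Hypothetical helper: it only evaluates step_size / sqrt(t),
    assuming the first iteration is t = 1.
    """
    return [step_size / math.sqrt(t) for t in range(1, num_iterations + 1)]


# With the default step_size of 1.0, the first step is 1.0 and
# every later step is smaller than the one before it.
sizes = sgd_step_sizes(step_size=1.0, num_iterations=4)
print(sizes)
```

A larger step_size scales every step by the same factor, so the shape of the decay does not change.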

Returns:

: dict

An object with a summary of the trained model. The data returned is composed of multiple components:

int : numFeatures

Number of features in the training data

int : numClasses

Number of classes in the training data

table : summaryTable

A summary table composed of:

Frame : CovarianceMatrix (optional)

Covariance matrix of the trained model.

The covariance matrix is the inverse of the Hessian matrix for the trained model. The Hessian matrix is the matrix of second-order partial derivatives of the model’s log-likelihood function.
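The relationship between the Hessian and the covariance matrix can be sketched for a model with an intercept and a single feature. The code below is an illustration under stated assumptions, not the library's implementation: the function covariance_2x2, the sample values, and the coefficients b0 and b1 are all hypothetical, and the Hessian is taken of the negative log-likelihood so that the matrix being inverted is positive definite.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def covariance_2x2(xs, b0, b1):
    """Invert the 2x2 Hessian of the negative log-likelihood at (b0, b1).

    Hypothetical helper for a model p = sigmoid(b0 + b1 * x).
    Each observation contributes p * (1 - p) * [[1, x], [x, x * x]].
    """
    h00 = h01 = h11 = 0.0
    for x in xs:
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    if det == 0.0:
        raise ValueError("Hessian is singular; cannot compute covariance")
    # Closed-form inverse of a symmetric 2x2 matrix.
    return [[h11 / det, -h01 / det],
            [-h01 / det, h00 / det]]


# Assumed feature values and coefficients, chosen only for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
cov = covariance_2x2(xs, b0=0.0, b1=1.0)

# Square roots of the diagonal are the usual coefficient standard errors.
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print(cov)
print(std_errors)
```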

Creates a logistic regression model using the observation columns and label column of the training frame.
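The optimizer and regularization options listed under Parameters can be checked before calling train. The sketch below is one way to do that in plain Python; check_train_args is a hypothetical helper, not part of the library, and it only encodes the constraints stated in this page.

```python
def check_train_args(optimizer="LBFGS", num_classes=2, reg_type="L2"):
    """Hypothetical pre-flight check mirroring the documented options."""
    if optimizer not in ("LBFGS", "SGD"):
        raise ValueError("optimizer must be 'LBFGS' or 'SGD'")
    if reg_type not in ("L1", "L2"):
        raise ValueError("reg_type must be 'L1' or 'L2'")
    # Documented constraint: SGD only supports binary logistic regression.
    if optimizer == "SGD" and num_classes != 2:
        raise ValueError("SGD only supports num_classes=2; use LBFGS instead")
    return {"optimizer": optimizer,
            "num_classes": num_classes,
            "reg_type": reg_type}


# A three-class problem has to use LBFGS.
kwargs = check_train_args(optimizer="LBFGS", num_classes=3)
print(kwargs)

# Assuming `model` and `frame` already exist, the checked values could be
# passed on as keyword arguments, for example:
#   model.train(frame, "label", ["obs1", "obs2"], **kwargs)
# The column names here are placeholders.
```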

Examples

See here for examples.
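As a small standalone illustration of the reg_type and reg_param parameters, the sketch below computes the two penalty terms as they are described above: the sum of absolute values for L1 and the sum of squares for L2, each multiplied by reg_param. The function regularization_penalty and the coefficient values are hypothetical, and an actual implementation may scale the penalty differently (for example, by a factor of 1/2 for L2).

```python
def regularization_penalty(coefficients, reg_type="L2", reg_param=0.0):
    """Hypothetical helper: penalty added to the training objective."""
    if reg_type == "L1":
        magnitude = sum(abs(c) for c in coefficients)
    elif reg_type == "L2":
        magnitude = sum(c * c for c in coefficients)
    else:
        raise ValueError("reg_type must be 'L1' or 'L2'")
    return reg_param * magnitude


# Assumed coefficient values, for illustration only.
coefficients = [0.5, -2.0, 1.0]
l1 = regularization_penalty(coefficients, reg_type="L1", reg_param=0.1)
l2 = regularization_penalty(coefficients, reg_type="L2", reg_param=0.1)
print(l1, l2)
```

With the default reg_param of 0.0 both penalties are zero, so the model is trained without regularization.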
