LogisticRegressionModel train
train(self, frame, label_column, observation_columns, frequency_column=None, num_classes=2, optimizer='LBFGS', compute_covariance=True, intercept=True, feature_scaling=False, threshold=0.5, reg_type='L2', reg_param=0.0, num_iterations=100, convergence_tolerance=0.0001, num_corrections=10, mini_batch_fraction=1.0, step_size=1.0)
[ALPHA] Build a logistic regression model.
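For reference, the binary model being fit can be written as the standard logistic model below. Here τ stands for `threshold`, λ for `reg_param`, and the intercept b is present when `intercept=True`. The 1/n averaging, the 1/2 factor on the L2 penalty, and labeling ties (p = τ) as positive are assumed conventions, not taken from this library. The multinomial case (`num_classes` > 2, LBFGS only) uses a softmax over classes and is not shown.

```latex
p(\mathbf{x}) = P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } p(\mathbf{x}) \ge \tau \\
0 & \text{otherwise}
\end{cases}

\min_{\mathbf{w},\, b}\;
-\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log p(\mathbf{x}_i) + (1 - y_i) \log\big(1 - p(\mathbf{x}_i)\big) \Big]
+ \lambda\, R(\mathbf{w}),
\qquad
R(\mathbf{w}) =
\begin{cases}
\lVert \mathbf{w} \rVert_1 & \text{(L1)} \\
\tfrac{1}{2} \lVert \mathbf{w} \rVert_2^2 & \text{(L2)}
\end{cases}
```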
Parameters:
frame : Frame
A frame to train the model on.
label_column : unicode
Column name containing the label for each observation.
observation_columns : list
Column(s) containing the observations.
frequency_column : unicode (default=None)
Optional column containing the frequency of observations.
num_classes : int32 (default=2)
Number of classes in the label column.
optimizer : unicode (default=LBFGS)
Type of optimizer:
LBFGS - Limited-memory BFGS. Supports multinomial logistic regression.
SGD - Stochastic Gradient Descent. Supports only binary logistic regression.
compute_covariance : bool (default=True)
Compute covariance matrix for the model.
intercept : bool (default=True)
Add intercept column to training data.
feature_scaling : bool (default=False)
Perform feature scaling before training the model.
threshold : float64 (default=0.5)
Threshold for separating positive predictions from negative predictions.
reg_type : unicode (default=L2)
Type of regularization:
L1 - L1 regularization, which penalizes the sum of the absolute values of the coefficients.
L2 - L2 regularization, which penalizes the sum of the squares of the coefficients.
reg_param : float64 (default=0.0)
Regularization parameter. Larger values apply stronger regularization; 0.0 disables it.
num_iterations : int32 (default=100)
Maximum number of iterations.
convergence_tolerance : float64 (default=0.0001)
Convergence tolerance for L-BFGS iterations. A smaller value leads to higher accuracy at the cost of more iterations.
num_corrections : int32 (default=10)
Number of corrections used in the L-BFGS update. Values less than 3 are not recommended; large values result in excessive computing time.
mini_batch_fraction : float64 (default=1.0)
Fraction of the data to use in each SGD iteration.
step_size : float64 (default=1.0)
Initial step size for SGD. In subsequent steps, the step size decreases as step_size/sqrt(t), where t is the iteration number.
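How `optimizer='SGD'`, `reg_type`, `reg_param`, `mini_batch_fraction`, `step_size`, `intercept`, and `threshold` interact can be illustrated with a small NumPy sketch of mini-batch SGD for binary logistic regression. This is not this library's implementation: `train_sgd` and `predict` are hypothetical names, the L1 branch uses soft-thresholding (a proximal step) as an assumed update rule, and the intercept is assumed to be left unregularized.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def train_sgd(X, y, reg_type="L2", reg_param=0.0, num_iterations=100,
              mini_batch_fraction=1.0, step_size=1.0, intercept=True, seed=0):
    """Hypothetical mini-batch SGD for binary logistic regression (labels 0/1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    batch = max(1, int(round(mini_batch_fraction * n)))
    for t in range(1, num_iterations + 1):
        # Sample mini_batch_fraction of the rows (without replacement) for this step.
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        err = sigmoid(Xb @ w + b) - yb    # derivative of the log loss w.r.t. the score
        grad_w = Xb.T @ err / batch
        grad_b = err.mean()
        lr = step_size / np.sqrt(t)       # step size at iteration t
        if reg_type == "L2":
            # Penalty (reg_param / 2) * ||w||^2 adds reg_param * w to the gradient.
            w = w - lr * (grad_w + reg_param * w)
        elif reg_type == "L1":
            # Gradient step, then soft-thresholding (proximal step for reg_param * ||w||_1).
            w = w - lr * grad_w
            w = np.sign(w) * np.maximum(np.abs(w) - lr * reg_param, 0.0)
        else:
            raise ValueError(f"unknown reg_type: {reg_type}")
        if intercept:
            b -= lr * grad_b              # the intercept is left unregularized
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a row 1 when its predicted probability is at least `threshold`."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Four 1-D points that are separable at x = 0.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w, b = train_sgd(X, y, num_iterations=200)
print(predict(X, w, b))  # prints [0 0 1 1]
```

With a large enough `reg_param`, the L1 branch drives weights to exactly zero, which is the usual reason to prefer L1 when sparse coefficients are wanted.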
Returns: dict
A dictionary summarizing the trained model, with the following components:
int : numFeatures
Number of features in the training data.
int : numClasses
Number of classes in the training data.
table : summaryTable
A summary table composed of:
Frame : CovarianceMatrix (optional)
Covariance matrix of the trained model. The covariance matrix is the inverse of the Hessian matrix of the model's negative log-likelihood function, where the Hessian is the matrix of second-order partial derivatives.
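For binary logistic regression, the Hessian of the negative log-likelihood has the closed form XᵀWX with W = diag(pᵢ(1 − pᵢ)), so a covariance matrix of this kind can be sketched in NumPy as below. This is an illustration, not this library's code: `neg_log_likelihood_hessian` and `covariance_matrix` are hypothetical helpers, the design matrix is assumed to carry a leading column of ones for the intercept, and the coefficients are arbitrary rather than fitted (in practice the Hessian is evaluated at the trained coefficients). The multinomial case is not covered.

```python
import numpy as np


def neg_log_likelihood_hessian(X, beta):
    """Hessian X^T W X of the binary logistic negative log-likelihood at `beta`."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
    w = p * (1.0 - p)                        # diagonal entries of W
    return X.T @ (w[:, None] * X)


def covariance_matrix(X, beta):
    """Hypothetical helper: covariance estimate as the inverse of that Hessian."""
    return np.linalg.inv(neg_log_likelihood_hessian(X, beta))


# Design matrix whose first column of ones plays the role of the intercept.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
beta = np.array([-1.0, 0.6])          # illustrative coefficients, not a fitted result
cov = covariance_matrix(X, beta)
std_errors = np.sqrt(np.diag(cov))    # standard errors of the coefficients
print(cov)
print(std_errors)
```

The square roots of the diagonal give the coefficient standard errors, which is the usual reason to request the covariance matrix.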
Creates a logistic regression model from the observation columns and label column of the training frame.
Examples
See here for examples.