Frame correlation_matrix
correlation_matrix(self, data_column_names, matrix_name=None)
Calculate correlation matrix for two or more columns.
Parameters:
data_column_names : list
    The names of the columns from which to compute the matrix.
matrix_name : unicode (default=None)
    The name for the returned matrix Frame.
Returns: Frame
    A Frame with the matrix of the correlation values for the columns.
This method applies only to columns containing numerical data.
Examples
Consider Frame my_frame, which contains the data:
>>> my_frame.inspect()
[#]  idnum  x1   x2   x3   x4
===============================
[0]      0  1.0  4.0  0.0  -1.0
[1]      1  2.0  3.0  0.0  -1.0
[2]      2  3.0  2.0  1.0  -1.0
[3]      3  4.0  1.0  2.0  -1.0
[4]      4  5.0  0.0  2.0  -1.0
my_frame.correlation_matrix computes the common correlation coefficient (Pearson’s) on each pair of columns in the user-provided list. In this example, idnum and most of the other columns have trivial correlations: -1, 0, or +1. Column x3 provides a contrasting coefficient of 3 / sqrt(10) = 0.948683298051.
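As a check on the x3 value, the coefficient between x1 and x3 can be worked out by hand from the sample data. The same arithmetic applies to idnum and x3, since idnum is x1 shifted by a constant. The subscripted symbols below are notation introduced only for this sketch:

```latex
\begin{aligned}
\bar{x}_1 &= 3, \qquad \bar{x}_3 = 1 \\
\sum_i (x_{1,i}-\bar{x}_1)(x_{3,i}-\bar{x}_3) &= (-2)(-1) + (-1)(-1) + (0)(0) + (1)(1) + (2)(1) = 6 \\
\sum_i (x_{1,i}-\bar{x}_1)^2 &= 4 + 1 + 0 + 1 + 4 = 10 \\
\sum_i (x_{3,i}-\bar{x}_3)^2 &= 1 + 1 + 0 + 1 + 1 = 4 \\
r_{x_1 x_3} &= \frac{6}{\sqrt{10 \cdot 4}} = \frac{3}{\sqrt{10}} \approx 0.948683298051
\end{aligned}
```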
>>> corr_matrix = my_frame.correlation_matrix(my_frame.column_names)
[===Job Progress===]
The resulting table (specifying all columns) is:
>>> corr_matrix.inspect()
[#]  idnum           x1              x2               x3               x4
==========================================================================
[0]             1.0             1.0             -1.0   0.948683298051  0.0
[1]             1.0             1.0             -1.0   0.948683298051  0.0
[2]            -1.0            -1.0              1.0  -0.948683298051  0.0
[3]  0.948683298051  0.948683298051  -0.948683298051              1.0  0.0
[4]             0.0             0.0              0.0              0.0  1.0
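The values in the table above can be cross-checked without the platform. The following is a minimal pure-Python sketch of a Pearson correlation matrix over the same sample data; it is not the Frame implementation. The helper names pearson and pearson_matrix are hypothetical. Returning 0.0 for a pair that involves a constant column (such as x4, where the coefficient is mathematically undefined) and 1.0 on the diagonal are assumptions chosen to mirror the table shown here.

```python
import math

# Sample data copied from the my_frame.inspect() output in this example.
columns = {
    "idnum": [0.0, 1.0, 2.0, 3.0, 4.0],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [4.0, 3.0, 2.0, 1.0, 0.0],
    "x3": [0.0, 0.0, 1.0, 2.0, 2.0],
    "x4": [-1.0, -1.0, -1.0, -1.0, -1.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0.0 or ss_y == 0.0:
        # Assumption: a constant column has zero variance, so the coefficient
        # is undefined; return 0.0 to mirror the table in this example.
        return 0.0
    return cov / math.sqrt(ss_x * ss_y)


def pearson_matrix(data, names):
    """Square matrix of pairwise coefficients, in the order given by names."""
    # Assumption: the diagonal is set to 1.0 directly, as in the table,
    # even for a constant column.
    return [
        [1.0 if a == b else pearson(data[a], data[b]) for b in names]
        for a in names
    ]


names = ["idnum", "x1", "x2", "x3", "x4"]
matrix = pearson_matrix(columns, names)
for name, row in zip(names, matrix):
    print(name, [round(value, 12) for value in row])
```

Each printed row corresponds to one row of corr_matrix.inspect(), with columns in the same order as names.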