Commands frame/column_summary_statistics

Calculate multiple statistics for a column.

POST /v1/commands/

GET /v1/commands/:id

Request

Route

POST /v1/commands/

Body

name:

frame/column_summary_statistics

arguments:

frame : Frame

<Missing Description>

data_column : unicode

The column to be statistically summarized. Must contain numerical data; all NaNs and infinite values are excluded from the calculation.

weights_column : unicode (default=None)

Name of column holding weights of column values.

use_population_variance : bool (default=None)

If true, the variance is calculated as the population variance. If false, the variance is calculated as the sample variance. Because this option affects the variance, it also affects the standard deviation and the confidence intervals. Default is false.


Headers

Authorization: test_api_key_1
Content-type: application/json
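
As an illustration of the route, body, and headers above, the following Python sketch builds (but does not send) the POST request. The server address in BASE_URL, the helper name build_summary_request, and passing the frame as a plain value are assumptions made for the example; they are not taken from this reference.

```python
import json
import urllib.request

# Hypothetical server address; replace with the address of your own server.
BASE_URL = "http://localhost:9099"


def build_summary_request(frame, data_column, weights_column=None,
                          use_population_variance=False,
                          api_key="test_api_key_1"):
    """Build (but do not send) the POST request for this command.

    How a Frame is referenced in the body is an assumption of this sketch.
    """
    body = {
        "name": "frame/column_summary_statistics",
        "arguments": {
            "frame": frame,
            "data_column": data_column,
            "weights_column": weights_column,
            "use_population_variance": use_population_variance,
        },
    }
    return urllib.request.Request(
        url=BASE_URL + "/v1/commands/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-type": "application/json",
        },
        method="POST",
    )


# Example: summarize a hypothetical "age" column of a hypothetical frame 1.
request = build_summary_request(frame=1, data_column="age")
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a server.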

Description

Notes

Sample Variance

Sample Variance is computed by the following formula:

\left( \frac{1}{W - 1} \right) \sum_{i} w_{i} \left( x_{i} - M \right)^{2}

where w_{i} is the weight of element x_{i}, W is the sum of weights over valid elements of positive weight, and M is the weighted mean.

Population Variance

Population Variance is computed by the following formula:

\left( \frac{1}{W} \right) \sum_{i} w_{i} \left( x_{i} - M \right)^{2}

where w_{i}, W, and M are defined as for the sample variance.
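
The two variance formulas can be sketched in Python as follows. This is a minimal illustration, not the server's implementation. It reads the "weighted sum of the squared distance" wording of the response description as multiplying each squared distance by its weight, and the function name weighted_variance is hypothetical.

```python
import math


def weighted_variance(values, weights=None, use_population_variance=False):
    """Weighted variance over finite elements of positive weight.

    Returns None when the denominator (W, or W - 1 for the sample
    variance) is not positive.
    """
    if weights is None:
        # No weights column: every element has weight 1.0.
        weights = [1.0] * len(values)

    # Keep only finite data/weight pairs whose weight is positive.
    pairs = [(x, w) for x, w in zip(values, weights)
             if math.isfinite(x) and math.isfinite(w) and w > 0]

    total_weight = sum(w for _, w in pairs)            # W
    denominator = total_weight if use_population_variance else total_weight - 1
    if not pairs or denominator <= 0:
        return None

    mean = sum(w * x for x, w in pairs) / total_weight  # M
    return sum(w * (x - mean) ** 2 for x, w in pairs) / denominator
```

For example, weighted_variance([1.0, 3.0], [3.0, 1.0]) uses W = 4 and M = 1.5; taking math.sqrt of a non-None result gives the corresponding standard deviation.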

Standard Deviation

The square root of the variance.

Logging Invalid Data

A row is bad when it contains a NaN or infinite value in either its data or weights column. In this case, it contributes to bad_row_count; otherwise it contributes to good_row_count.

A good row can be skipped because the value in its weights column is less than or equal to 0. In this case, it contributes to non_positive_weight_count; otherwise (when the weight is greater than 0) it contributes to positive_weight_count.

Equations

bad_row_count + good_row_count = # rows in the frame
positive_weight_count + non_positive_weight_count = good_row_count

In particular, when no weights column is provided, all weights are taken to be 1.0, so:

non_positive_weight_count = 0 and
positive_weight_count = good_row_count
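
The classification rules and equations above can be checked with a short Python sketch. The function name count_rows and the use of plain lists for the two columns are assumptions made for illustration; the counter names are the ones used in this section.

```python
import math


def count_rows(values, weights=None):
    """Classify rows as described under Logging Invalid Data."""
    if weights is None:
        # No weights column: every weight is 1.0.
        weights = [1.0] * len(values)

    counts = {
        "bad_row_count": 0,
        "good_row_count": 0,
        "positive_weight_count": 0,
        "non_positive_weight_count": 0,
    }
    for x, w in zip(values, weights):
        # A NaN or infinite value in either column makes the row bad.
        if not (math.isfinite(x) and math.isfinite(w)):
            counts["bad_row_count"] += 1
            continue
        counts["good_row_count"] += 1
        if w > 0:
            counts["positive_weight_count"] += 1
        else:
            counts["non_positive_weight_count"] += 1
    return counts


counts = count_rows(
    values=[1.0, float("nan"), 3.0, float("inf"), 5.0],
    weights=[1.0, 1.0, 0.0, 2.0, -1.0],
)
```

In this example two rows are bad (the NaN and the infinite value), and of the three good rows only the first has a positive weight.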

Response

Status

200 OK

Body

Returns information about the command. The body is the same as the response body of GET /v1/commands/:id, described below.

GET /v1/commands/:id

Request

Route

GET /v1/commands/18

Body

(None)

Headers

Authorization: test_api_key_1
Content-type: application/json
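
A minimal Python sketch of this request, again built without being sent. The server address in BASE_URL and the helper name build_status_request are assumptions made for the example; the command id 18 is the one used in the route above.

```python
import urllib.request

# Hypothetical server address; replace with the address of your own server.
BASE_URL = "http://localhost:9099"


def build_status_request(command_id, api_key="test_api_key_1"):
    """Build (but do not send) the GET request for one command id."""
    return urllib.request.Request(
        url="{}/v1/commands/{}".format(BASE_URL, command_id),
        headers={
            "Authorization": api_key,
            "Content-type": "application/json",
        },
        method="GET",  # no request body is attached
    )


status_request = build_status_request(18)
```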

Response

Status

200 OK

Body

dict

Dictionary containing summary statistics. The data returned is composed of multiple components:

mean : [ double | None ]
Arithmetic mean of the data.
geometric_mean : [ double | None ]
Geometric mean of the data. None when there is a data element <= 0, 1.0 when there are no data elements.
variance : [ double | None ]
Sample variance is the weighted sum of the squared distance of each data element from the weighted mean, divided by the total weight minus 1; it is None when the sum of the weights is <= 1. Population variance is the same weighted sum divided by the total weight. None when there is at most one data element.
standard_deviation : [ double | None ]
The square root of the variance. None when sample variance is being used and the sum of weights is <= 1.
total_weight : long
The count of all data elements that are finite numbers, that is, after excluding NaNs and infinite values.
minimum : [ double | None ]
Minimum value in the data. None when there are no data elements.
maximum : [ double | None ]
Maximum value in the data. None when there are no data elements.
mean_confidence_lower : [ double | None ]
Lower limit of the 95% confidence interval about the mean. Assumes a Gaussian distribution. None when there are no elements of positive weight.
mean_confidence_upper : [ double | None ]
Upper limit of the 95% confidence interval about the mean. Assumes a Gaussian distribution. None when there are no elements of positive weight.
bad_row_count : [ double | None ]
The number of rows containing a NaN or infinite value in either the data or weights column.
good_row_count : [ double | None ]
The number of rows not containing a NaN or infinite value in either the data or weights column.
positive_weight_count : [ double | None ]
The number of valid data elements with weight > 0. This is the number of entries used in the statistical calculation.
non_positive_weight_count : [ double | None ]
The number of valid data elements with finite weight <= 0.
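
Several fields in this list can be None, so client code should check before using them. As one illustration, the rules stated for geometric_mean can be sketched in Python as below. This is a simplified, unweighted sketch under that assumption, not the server's implementation, and the function name is hypothetical.

```python
import math


def geometric_mean(values):
    """Unweighted geometric mean following the None rules listed above."""
    # NaNs and infinite values are excluded from the calculation.
    data = [x for x in values if math.isfinite(x)]
    if not data:
        return 1.0   # no data elements
    if any(x <= 0 for x in data):
        return None  # a data element <= 0
    # exp of the mean of the logs, which avoids overflow in a long product.
    return math.exp(sum(math.log(x) for x in data) / len(data))
```

A caller would treat a None result as "not defined for this column" instead of passing it on to numeric code.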