Commands frame/column_summary_statistics
Calculate multiple statistics for a column.
POST /v1/commands/
GET /v1/commands/:id
Request
Route
POST /v1/commands/
Body
name: frame/column_summary_statistics
arguments:
    frame : Frame
    data_column : unicode
    weights_column : unicode (default=None)
    use_population_variance : bool (default=None)
Headers
Authorization: test_api_key_1
Content-type: application/json
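As a rough illustration of the request described above, the following Python sketch builds the POST request without sending it. The server address (`BASE_URL`), the frame reference (`FRAME_REF`), and the helper name `build_summary_statistics_request` are assumptions made for this example; this page does not specify how a frame is identified in the JSON body.

```python
import json
import urllib.request

# Hypothetical values: the server address and the way a frame is referenced
# are assumptions for illustration, not taken from this page.
BASE_URL = "http://localhost:9099"
FRAME_REF = "<frame reference>"


def build_summary_statistics_request(frame, data_column, weights_column=None,
                                     use_population_variance=None,
                                     api_key="test_api_key_1"):
    """Build (but do not send) the POST request for column_summary_statistics."""
    payload = {
        "name": "frame/column_summary_statistics",
        "arguments": {
            "frame": frame,
            "data_column": data_column,
            "weights_column": weights_column,
            "use_population_variance": use_population_variance,
        },
    }
    return urllib.request.Request(
        url=BASE_URL + "/v1/commands/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-type": "application/json",
        },
        method="POST",
    )


request = build_summary_statistics_request(FRAME_REF, "a")
# urllib.request.urlopen(request) would submit the command to a running server.
```

Optional arguments left as `None` are serialized as JSON `null`, which mirrors the `default=None` entries in the argument list.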
Description
Notes
- Sample Variance
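Sample Variance is computed by the following formula:
$$s^2 = \frac{1}{W - 1} \sum_{i} w_i \left( x_i - M \right)^2$$
where $W$ is the sum of weights over valid elements of positive weight, $w_i$ is the weight of element $x_i$, and $M$ is the weighted mean.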
- Population Variance
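Population Variance is computed by the following formula:
$$\sigma^2 = \frac{1}{W} \sum_{i} w_i \left( x_i - M \right)^2$$
where $W$ is the sum of weights over valid elements of positive weight, $w_i$ is the weight of element $x_i$, and $M$ is the weighted mean.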
- Standard Deviation
Standard Deviation is the square root of the variance.
- Logging Invalid Data
A row is bad when it contains a NaN or infinite value in either its data or weights column. In this case, it contributes to bad_row_count; otherwise it contributes to good_row_count.
A good row can be skipped because the value in its weights column is less than or equal to 0. In this case, it contributes to non_positive_weight_count; otherwise (when the weight is greater than 0) it contributes to positive_weight_count.
Equations
bad_row_count + good_row_count = # rows in the frame
positive_weight_count + non_positive_weight_count = good_row_count
In particular, when no weights column is provided, all weights are 1.0, so:
non_positive_weight_count = 0 and positive_weight_count = good_row_count
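The row bookkeeping described above can be sketched in Python as follows. This is an illustrative reading of the rules on this page, not the server's implementation; the function name `count_rows` and the list-based inputs are assumptions made for the example.

```python
import math


def count_rows(data, weights=None):
    """Classify rows by the rules in the notes and return the four counts."""
    if weights is None:
        # No weights column: every row is treated as having weight 1.0.
        weights = [1.0] * len(data)
    counts = {
        "bad_row_count": 0,
        "good_row_count": 0,
        "positive_weight_count": 0,
        "non_positive_weight_count": 0,
    }
    for x, w in zip(data, weights):
        # A row is bad if either its data value or its weight is NaN or infinite.
        if not (math.isfinite(x) and math.isfinite(w)):
            counts["bad_row_count"] += 1
            continue
        counts["good_row_count"] += 1
        if w > 0:
            counts["positive_weight_count"] += 1
        else:
            counts["non_positive_weight_count"] += 1
    return counts


counts = count_rows([1.0, float("nan"), 3.0, 4.0],
                    [2.0, 1.0, 0.0, float("inf")])
```

In this example the second and fourth rows are bad (NaN data, infinite weight), the third is a good row with a non-positive weight, and only the first row would take part in the statistics, so both equations hold.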
Response
Status
200 OK
Body
Returns information about the command. The body is the same as the response body of GET /v1/commands/:id, described below.
GET /v1/commands/:id
Request
Route
GET /v1/commands/18
Body
(None)
Headers
Authorization: test_api_key_1
Content-type: application/json
Response
Status
200 OK
Body
dict
Dictionary containing summary statistics. The data returned is composed of multiple components:
mean : [ double | None ]
    Arithmetic mean of the data.
geometric_mean : [ double | None ]
    Geometric mean of the data. None when there is a data element <= 0, 1.0 when there are no data elements.
variance : [ double | None ]
    None when there is at most one data element. Sample variance is the weighted sum of the squared distance of each data element from the weighted mean, divided by the total weight minus 1. None when the sum of the weights is <= 1. Population variance is the weighted sum of the squared distance of each data element from the weighted mean, divided by the total weight.
standard_deviation : [ double | None ]
    The square root of the variance. None when sample variance is being used and the sum of weights is <= 1.
total_weight : long
    The count of all data elements that are finite numbers. In other words, after excluding NaNs and infinite values.
minimum : [ double | None ]
    Minimum value in the data. None when there are no data elements.
maximum : [ double | None ]
    Maximum value in the data. None when there are no data elements.
mean_confidence_lower : [ double | None ]
    Lower limit of the 95% confidence interval about the mean. Assumes a Gaussian distribution. None when there are no elements of positive weight.
mean_confidence_upper : [ double | None ]
    Upper limit of the 95% confidence interval about the mean. Assumes a Gaussian distribution. None when there are no elements of positive weight.
bad_row_count : [ double | None ]
    The number of rows containing a NaN or infinite value in either the data or weights column.
good_row_count : [ double | None ]
    The number of rows not containing a NaN or infinite value in either the data or weights column.
positive_weight_count : [ double | None ]
    The number of valid data elements with weight > 0. This is the number of entries used in the statistical calculation.
non_positive_weight_count : [ double | None ]
    The number of valid data elements with finite weight <= 0.
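To make the variance and standard_deviation descriptions concrete, here is a minimal Python sketch of the weighted calculation, including the cases that yield None. It follows the wording on this page only; the function name `weighted_variance`, its return shape, and the `False` default for `use_population_variance` are assumptions for the example and may differ from what the server does.

```python
import math


def weighted_variance(data, weights, use_population_variance=False):
    """Return (mean, variance, standard_deviation); entries may be None."""
    # Keep only valid pairs: finite data, finite weight, weight > 0.
    pairs = [(x, w) for x, w in zip(data, weights)
             if math.isfinite(x) and math.isfinite(w) and w > 0]
    total_weight = sum(w for _, w in pairs)
    if total_weight <= 0:
        return None, None, None

    mean = sum(w * x for x, w in pairs) / total_weight
    # Weighted sum of squared distances from the weighted mean.
    squared = sum(w * (x - mean) ** 2 for x, w in pairs)

    if use_population_variance:
        variance = squared / total_weight
    elif total_weight > 1:
        variance = squared / (total_weight - 1)
    else:
        # Sample variance is undefined when the sum of weights is <= 1.
        variance = None

    std = None if variance is None else math.sqrt(variance)
    return mean, variance, std


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, pop_var, pop_std = weighted_variance(values, [1.0] * 8,
                                           use_population_variance=True)
```

With unit weights the weighted mean reduces to the ordinary mean, so the only difference between the two modes is the divisor: the total weight for population variance and the total weight minus 1 for sample variance.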