

class Frame

Large table of data.

Acts as a proxy object to a frame of data on the server, with properties and functions to operate on that frame.

Attributes

column_names Column identifications in the current frame.
last_read_date Last time this frame’s data was accessed.
name Set or get the name of the frame object.
row_count Number of rows in the current frame.
schema Current frame column names and types.
status Current frame life cycle status.

Methods

__init__(self[, source, name, _info]) Create a Frame/frame.
add_columns(self, func, schema[, columns_accessed]) Add columns to current frame.
append(self, data) Adds more data to the current frame.
assign_sample(self, sample_percentages[, sample_labels, ...]) Randomly group rows into user-defined classes.
bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...]) Classify data into user-defined groups.
bin_column_equal_depth(self, column_name[, num_bins, ...]) Classify column into groups with the same frequency.
bin_column_equal_width(self, column_name[, num_bins, ...]) Classify column into same-width groups.
categorical_summary(self, *column_inputs) [ALPHA] Compute a summary of the data in one or more columns for categorical or numerical data types.
classification_metrics(self, label_column, pred_column[, ...]) Model statistics of accuracy, precision, and others.
column_median(self, data_column[, weights_column]) Calculate the (weighted) median of a column.
column_mode(self, data_column[, weights_column, max_modes_returned]) Calculate the (weighted) modes of a column.
column_summary_statistics(self, data_column[, ...]) Calculate multiple statistics for a column.
copy(self[, columns, where, name]) Create new frame from current frame.
correlation(self, data_column_names) Calculate correlation for two columns of current frame.
correlation_matrix(self, data_column_names[, matrix_name]) Calculate correlation matrix for two or more columns.
count(self, where) Counts the number of rows which meet given criteria.
covariance(self, data_column_names) Calculate covariance for exactly two columns.
covariance_matrix(self, data_column_names[, matrix_name]) Calculate covariance matrix for two or more columns.
cumulative_percent(self, sample_col) [BETA] Add column to frame with cumulative percent sum.
cumulative_sum(self, sample_col) [BETA] Add column to frame with cumulative sum.
daal_pca(self, column_names[, method]) [ALPHA] <Missing Doc>
dot_product(self, left_column_names, right_column_names, ...[, ...]) [ALPHA] Calculate dot product for each row in current frame.
download(self[, n, offset, columns]) Download frame data from the server into client workspace as a pandas dataframe.
drop_columns(self, columns) Remove columns from the frame.
drop_duplicates(self[, unique_columns]) Modify the current frame, removing duplicate rows.
drop_rows(self, predicate) Erase any row in the current frame which qualifies.
ecdf(self, column[, result_frame_name]) Builds new frame with columns for data and distribution.
entropy(self, data_column[, weights_column]) Calculate the Shannon entropy of a column.
export_to_csv(self, folder_name[, separator, count, offset]) Write current frame to HDFS in csv format.
export_to_hbase(self, table_name[, key_column_name, family_name]) Write current frame to HBase table.
export_to_hive(self, table_name) Write current frame to Hive table.
export_to_jdbc(self, table_name[, connector_type, url, driver_name, ...]) Write current frame to JDBC table.
export_to_json(self, folder_name[, count, offset]) Write current frame to HDFS in JSON format.
filter(self, predicate) Select all rows which satisfy a predicate.
flatten_column(self, column[, delimiter]) [DEPRECATED] Note that flatten_column() has been deprecated. Use flatten_columns() instead.
flatten_columns(self, columns[, delimiters]) Spread data to multiple rows based on cell data.
get_error_frame(self) Get a frame with error recordings.
group_by(self, group_by_columns, *aggregation_arguments) [BETA] Create summarized frame.
helloworld(self[, msg]) This is a Hello World Plugin for Frame.
histogram(self, column_name[, num_bins, weight_column_name, bin_type]) [BETA] Compute the histogram for a column in a frame.
inspect(self[, n, offset, columns, wrap, truncate, round, width, margin, ...]) Pretty-print the frame data.
join(self, right, left_on[, right_on, how, name]) [BETA] Join operation on one or two frames, creating a new frame.
mapreducewordcount(self, input_dir, output_dir) Counts and reports the top 10 words across all columns with string data in a frame.
quantiles(self, column_name, quantiles) New frame with Quantiles and their values.
rename_columns(self, names) Rename columns.
sort(self, columns[, ascending]) [BETA] Sort the data in a frame.
sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth]) [ALPHA] Get a sorted subset of the data.
take(self, n[, offset, columns]) Get data subset.
tally(self, sample_col, count_val) [BETA] Count number of times a value is seen.
tally_percent(self, sample_col, count_val) [BETA] Compute a cumulative percent count.
top_k(self, column_name, k[, weights_column]) Most or least frequent column values.
unflatten_column(self, columns[, delimiter]) [DEPRECATED] Note that unflatten_column() has been deprecated. Use unflatten_columns() instead.
unflatten_columns(self, columns[, delimiter]) Compacts data from multiple rows based on cell data.
wordcount(self) Counts and reports the top 10 words across all columns with string data in a frame.
__init__(self, source=None, name=None)

Create a Frame/frame.

Parameters:

source : CsvFile | Frame (default=None)

A source of initial data.

name : str (default=None)

The name of the newly created frame. Default is None.

Notes

A frame with no name is subject to garbage collection.

If a string in the CSV file starts and ends with a double-quote (") character, both quote characters are stripped from the data before it is put into the field. Anything between the double-quote characters, including delimiters, is considered part of the str. If the first character after the delimiter is anything other than a double-quote character, the string is composed of all the characters between the delimiters, including any double-quotes. If the first field type is str, leading spaces on each row are considered part of the str. If the last field type is str, trailing spaces on each row are considered part of the str.
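
The quoting rules in the note above can be illustrated with a small standalone sketch. This is not the library's parser: split_csv_line is a hypothetical helper written only for this illustration. It assumes a single-character delimiter and treats a closing double-quote as one that is immediately followed by the delimiter or the end of the line.

```python
def split_csv_line(line, delimiter=","):
    """Split one CSV line using the quoting rules described in the note.

    Hypothetical helper for illustration only; assumes a
    single-character delimiter.
    """
    fields = []
    i = 0
    n = len(line)
    while i <= n:
        if i < n and line[i] == '"':
            # Field starts with a double-quote: look for a closing quote
            # that is followed by the delimiter or the end of the line.
            j = i + 1
            while True:
                j = line.find('"', j)
                if j == -1 or j + 1 == n or line[j + 1] == delimiter:
                    break
                j += 1
            if j != -1:
                # Strip both quotes; delimiters inside stay in the field.
                fields.append(line[i + 1:j])
                i = j + 2
                continue
        # Field does not start with a quote (or has no closing quote):
        # keep every character up to the next delimiter, quotes included.
        j = line.find(delimiter, i)
        if j == -1:
            j = n
        fields.append(line[i:j])
        i = j + 1
    return fields


print(split_csv_line('"a,b",c'))     # ['a,b', 'c']
print(split_csv_line('say "hi",2'))  # ['say "hi"', '2']
print(split_csv_line('  a,b  '))     # ['  a', 'b  ']
```

The last call shows that leading spaces on the first field and trailing spaces on the last field are kept as part of the string.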

Examples

Create a new frame based upon the data described in the CsvFile object my_csv_schema. Name the frame “myframe”. Create a Frame my_frame to access the data:

>>> my_frame = ta.Frame(my_csv_schema, "myframe")

A Frame object has been created and my_frame is its proxy. It brought in the data described by my_csv_schema. It is named myframe.

Create an empty frame; name it “yourframe”:

>>> your_frame = ta.Frame(name='yourframe')

A frame has been created and Frame your_frame is its proxy. It has no data yet, but it does have the name yourframe.