VertexFrame
class VertexFrame
A list of vertices owned by a Graph.
A VertexFrame is similar to a Frame but with a few important differences:
- VertexFrames are not instantiated directly by the user; instead, they are created by defining a vertex type in a graph
- Each row of a VertexFrame represents a vertex in a graph
- VertexFrames have many of the same methods as Frames but not all (for example, flatten_column())
- VertexFrames have extra methods not found on Frames (for example, add_vertices())
- Removing a vertex (or row) from a VertexFrame also removes edges connected to that vertex from the graph
- VertexFrames have special system columns (_vid, _label) that are maintained automatically by the system and cannot be modified by the user
- VertexFrames have a special user-defined id column whose value uniquely identifies the vertex
- “Columns” on a VertexFrame can also be thought of as “properties” on vertices
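The rules above can be illustrated with a small, self-contained Python model. It is a hypothetical toy, not the library's implementation: ToyGraph, ToyVertexFrame, next_vid, add_edge and set_property are names invented for this sketch, and keeping the first row when an id value repeats is an assumption. It shows _vid and _label being assigned automatically, system columns being protected from user changes, the user-defined id column staying unique, and edge removal cascading when a vertex row is dropped.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical toy model of the VertexFrame rules listed above.
# None of these classes or methods belong to the real library.


class ToyGraph:
    """Holds vertex frames by label plus a flat list of edges."""

    def __init__(self) -> None:
        self.vertices: Dict[str, "ToyVertexFrame"] = {}
        self.edges: List[Dict[str, int]] = []
        self._next_vid = 1

    def define_vertex_type(self, label: str) -> "ToyVertexFrame":
        frame = ToyVertexFrame(self, label)
        self.vertices[label] = frame
        return frame

    def next_vid(self) -> int:
        # _vid values are handed out by the graph, never by the user.
        vid = self._next_vid
        self._next_vid += 1
        return vid

    def add_edge(self, src_vid: int, dest_vid: int) -> None:
        self.edges.append({"_src_vid": src_vid, "_dest_vid": dest_vid})


class ToyVertexFrame:
    SYSTEM_COLUMNS = ("_vid", "_label")

    def __init__(self, graph: ToyGraph, label: str) -> None:
        self.graph = graph
        self.label = label
        self.id_column: Optional[str] = None
        self.rows: List[Dict[str, Any]] = []

    def add_vertices(self, source_rows: Iterable[Dict[str, Any]],
                     id_column_name: str, column_names: Iterable[str] = ()) -> None:
        self.id_column = self.id_column or id_column_name
        columns = list(column_names)
        if any(c in self.SYSTEM_COLUMNS for c in columns):
            raise ValueError("system columns are maintained automatically")
        seen = {row[self.id_column] for row in self.rows}
        for src in source_rows:
            key = src[id_column_name]
            if key in seen:
                continue  # assumption: keep the first row for a repeated id
            seen.add(key)
            row = {"_vid": self.graph.next_vid(), "_label": self.label,
                   self.id_column: key}
            row.update({c: src[c] for c in columns})  # "columns" act as vertex properties
            self.rows.append(row)

    def set_property(self, vid: int, column: str, value: Any) -> None:
        if column in self.SYSTEM_COLUMNS:
            raise ValueError(f"{column} cannot be modified by the user")
        for row in self.rows:
            if row["_vid"] == vid:
                row[column] = value

    def drop_rows(self, predicate: Callable[[Dict[str, Any]], bool]) -> None:
        dropped = {row["_vid"] for row in self.rows if predicate(row)}
        self.rows = [row for row in self.rows if row["_vid"] not in dropped]
        # Removing a vertex also removes every edge that touches it.
        self.graph.edges = [e for e in self.graph.edges
                            if e["_src_vid"] not in dropped
                            and e["_dest_vid"] not in dropped]


graph = ToyGraph()
users = graph.define_vertex_type("users")
users.add_vertices(
    [{"user_id": 1, "user_name": "ann"},
     {"user_id": 2, "user_name": "bo"},
     {"user_id": 1, "user_name": "ann again"}],  # repeated id is skipped
    "user_id", ["user_name"])
graph.add_edge(1, 2)                              # edge between the two vertices
users.drop_rows(lambda row: row["user_id"] == 2)  # drops vertex 2 and its edge
print(users.rows)   # [{'_vid': 1, '_label': 'users', 'user_id': 1, 'user_name': 'ann'}]
print(graph.edges)  # []
```

In this sketch the cascade lives in drop_rows because the vertex frame is the only place that knows which _vid values disappeared; a real system may enforce it elsewhere.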
Attributes
- column_names: Column names in the current frame.
- last_read_date: Last time this frame’s data was accessed.
- name: Set or get the name of the frame object.
- row_count: Number of rows in the current frame.
- schema: Current frame column names and types.
- status: Current frame life cycle status.
Methods
- __init__(self[, source, graph, label, _info]): Initialize a VertexFrame (see the examples below).
- add_columns(self, func, schema[, columns_accessed]): Add columns to current frame.
- add_vertices(self, source_frame, id_column_name[, column_names]): Add vertices to a graph.
- assign_sample(self, sample_percentages[, sample_labels, ...]): Randomly group rows into user-defined classes.
- bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...]): Classify data into user-defined groups.
- bin_column_equal_depth(self, column_name[, num_bins, ...]): Classify column into groups with the same frequency.
- bin_column_equal_width(self, column_name[, num_bins, ...]): Classify column into same-width groups.
- categorical_summary(self, *column_inputs): [ALPHA] Compute a summary of the data in one or more columns for categorical or numerical data types.
- classification_metrics(self, label_column, pred_column[, ...]): Model statistics of accuracy, precision, and others.
- column_median(self, data_column[, weights_column]): Calculate the (weighted) median of a column.
- column_mode(self, data_column[, weights_column, max_modes_returned]): Calculate the (weighted) mode of a column.
- column_summary_statistics(self, data_column[, ...]): Calculate multiple statistics for a column.
- copy(self[, columns, where, name]): Create new frame from current frame.
- correlation(self, data_column_names): Calculate correlation for two columns of current frame.
- correlation_matrix(self, data_column_names[, matrix_name]): Calculate correlation matrix for two or more columns.
- count(self, where): Count the number of rows that meet the given criteria.
- covariance(self, data_column_names): Calculate covariance for exactly two columns.
- covariance_matrix(self, data_column_names[, matrix_name]): Calculate covariance matrix for two or more columns.
- cumulative_percent(self, sample_col): [BETA] Add column to frame with cumulative percent sum.
- cumulative_sum(self, sample_col): [BETA] Add column to frame with cumulative sum.
- daal_pca(self, column_names[, method]): [ALPHA] <Missing Doc>
- dot_product(self, left_column_names, right_column_names, ...[, ...]): [ALPHA] Calculate dot product for each row in current frame.
- download(self[, n, offset, columns]): Download frame data from the server into client workspace as a pandas dataframe.
- drop_columns(self, columns): Remove columns from the frame.
- drop_duplicates(self[, unique_columns]): Remove duplicate vertex rows.
- drop_rows(self, predicate): Delete rows in this vertex frame that qualify.
- drop_vertices(self, predicate): [DEPRECATED] drop_vertices has been deprecated. Use drop_rows instead.
- ecdf(self, column[, result_frame_name]): Build new frame with columns for data and distribution.
- entropy(self, data_column[, weights_column]): Calculate the Shannon entropy of a column.
- export_to_csv(self, folder_name[, separator, count, offset]): Write current frame to HDFS in csv format.
- export_to_hbase(self, table_name[, key_column_name, family_name]): Write current frame to HBase table.
- export_to_hive(self, table_name): Write current frame to Hive table.
- export_to_jdbc(self, table_name[, connector_type, url, driver_name, ...]): Write current frame to JDBC table.
- export_to_json(self, folder_name[, count, offset]): Write current frame to HDFS in JSON format.
- filter(self, predicate): <Missing Doc>
- flatten_column(self, column[, delimiter]): [DEPRECATED] Note that flatten_column() has been deprecated. Use flatten_columns() instead.
- flatten_columns(self, columns[, delimiters]): Spread data to multiple rows based on cell data.
- get_error_frame(self): Get a frame with error recordings.
- group_by(self, group_by_columns, *aggregation_arguments): [BETA] Create summarized frame.
- histogram(self, column_name[, num_bins, weight_column_name, bin_type]): [BETA] Compute the histogram for a column in a frame.
- inspect(self[, n, offset, columns, wrap, truncate, round, width, margin, ...]): Pretty-print of the frame data.
- join(self, right, left_on[, right_on, how, name]): [BETA] Join operation on one or two frames, creating a new frame.
- quantiles(self, column_name, quantiles): New frame with quantiles and their values.
- rename_columns(self, names): Rename columns for vertex frame.
- sort(self, columns[, ascending]): [BETA] Sort the data in a frame.
- sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth]): [ALPHA] Get a sorted subset of the data.
- take(self, n[, offset, columns]): Get data subset.
- tally(self, sample_col, count_val): [BETA] Count number of times a value is seen.
- tally_percent(self, sample_col, count_val): [BETA] Compute a cumulative percent count.
- top_k(self, column_name, k[, weights_column]): Most or least frequent column values.
- unflatten_column(self, columns[, delimiter]): [DEPRECATED] Note that unflatten_column() has been deprecated. Use unflatten_columns() instead.
- unflatten_columns(self, columns[, delimiter]): Compact data from multiple rows based on cell data.
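The difference between cumulative_sum and cumulative_percent can be sketched in plain Python on a list of column values. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, rows are taken in their current order, and the percent is expressed as a fraction of the column total rather than on a 0 to 100 scale. The real methods add a new column to the frame instead of returning a list.

```python
from itertools import accumulate
from typing import List


def cumulative_sum(values: List[float]) -> List[float]:
    """Hypothetical helper: running total of the column, row by row."""
    return list(accumulate(values))


def cumulative_percent(values: List[float]) -> List[float]:
    """Hypothetical helper: running total as a fraction of the column total."""
    total = sum(values)
    if total == 0:
        raise ValueError("column total is zero; percentages are undefined")
    return [running / total for running in accumulate(values)]


values = [0, 1, 2, 0, 1, 2]
print(cumulative_sum(values))      # [0, 1, 3, 3, 4, 6]
print(cumulative_percent(values))  # runs from 0.0 up to 1.0 at the last row
```

Both helpers share one pass of itertools.accumulate; cumulative_percent only adds the division by the column total, which is why the two methods differ in a single word of their summaries.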
__init__(self, source=None, graph=None, label=None)
Parameters:
- source: (default=None)
- graph: (default=None)
- label: (default=None)
Examples
Given a data file, create a frame from it, create a graph, define a vertex type to obtain a new VertexFrame, and add vertices from the frame to it:
>>> import trustedanalytics as ta
>>> csv = ta.CsvFile("/movie.csv", schema=[('user_id', ta.int32),
...                                        ('user_name', str),
...                                        ('movie_id', ta.int32),
...                                        ('movie_title', str),
...                                        ('rating', str)])
>>> my_frame = ta.Frame(csv)
>>> my_graph = ta.Graph()
>>> my_graph.define_vertex_type('users')
>>> my_vertex_frame = my_graph.vertices['users']
>>> my_vertex_frame.add_vertices(my_frame, 'user_id', ['user_name'])  # user_id is the id column
Retrieve a previously defined graph and get a VertexFrame from it:
>>> my_graph = ta.get_graph("your_graph")
>>> my_vertex_frame = my_graph.vertices["your_label"]
Calling methods on a VertexFrame:
>>> my_vertex_frame.inspect(20)
Convert a VertexFrame to a frame:
>>> new_frame = my_vertex_frame.copy()