EdgeFrame
class EdgeFrame
A list of Edges owned by a Graph.
An EdgeFrame is similar to a Frame but with a few important differences:
- EdgeFrames are not instantiated directly by the user; instead, they are created by defining an edge type in a graph
- Each row of an EdgeFrame represents an edge in a graph
- EdgeFrames have many of the same methods as Frames but not all
- EdgeFrames have extra methods not found on Frames (e.g. add_edges())
- EdgeFrames depend on one or two VertexFrames (adding an edge to an EdgeFrame requires either that its vertices already exist or that the user specifies create_missing_vertices=True)
- EdgeFrames have special system columns (_eid, _label, _src_vid, _dest_vid) that are maintained automatically by the system and cannot be modified by the user
- “Columns” on an EdgeFrame can also be thought of as “properties” on Edges
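The points above about system columns and the vertex dependency can be illustrated with a small in-memory toy model. This is only a sketch and not the trustedanalytics implementation: the ToyGraph class, its methods, and the choice to store the user-supplied vertex id directly in _src_vid and _dest_vid are all assumptions made for illustration. Only the system column names and the create_missing_vertices flag come from the description above.

```python
# Toy model only: NOT the trustedanalytics implementation.
from itertools import count

# System column names taken from the EdgeFrame description.
SYSTEM_COLUMNS = {"_eid", "_label", "_src_vid", "_dest_vid"}


class ToyGraph:
    """Hypothetical in-memory stand-in for a graph with vertices and edge rows."""

    def __init__(self):
        self.vertex_ids = set()      # vertex ids already present in the graph
        self.edges = []              # each edge row is a dict of column -> value
        self._next_eid = count(1)    # edge ids are assigned by the "system"

    def add_vertex(self, vid):
        self.vertex_ids.add(vid)

    def add_edge(self, label, src, dest, properties=None,
                 create_missing_vertices=False):
        properties = dict(properties or {})
        # Users may not set or overwrite system columns.
        clash = SYSTEM_COLUMNS & set(properties)
        if clash:
            raise ValueError("system columns cannot be set: %s" % sorted(clash))
        # An edge needs both endpoints unless missing vertices may be created.
        missing = [v for v in (src, dest) if v not in self.vertex_ids]
        if missing and not create_missing_vertices:
            raise ValueError("unknown vertices: %r" % missing)
        self.vertex_ids.update(missing)
        row = {"_eid": next(self._next_eid), "_label": label,
               "_src_vid": src, "_dest_vid": dest}
        row.update(properties)       # user columns are the edge "properties"
        self.edges.append(row)
        return row


if __name__ == "__main__":
    g = ToyGraph()
    g.add_vertex(1)     # a user
    g.add_vertex(10)    # a movie
    print(g.add_edge("ratings", 1, 10, {"rating": "5"}))
    print(g.add_edge("ratings", 2, 10, create_missing_vertices=True))
```

In this toy, each edge row carries the four system columns alongside its user-defined properties, and an edge whose endpoints are unknown is rejected unless create_missing_vertices=True is passed.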
Attributes
column_names : Column identifications in the current frame.
last_read_date : Last time this frame’s data was accessed.
name : Set or get the name of the frame object.
row_count : Number of rows in the current frame.
schema : Current frame column names and types.
status : Current frame life cycle status.
Methods
__init__(self[, graph, label, src_vertex_label, ...]) : See Examples.
add_columns(self, func, schema[, columns_accessed]) : Add columns to current frame.
add_edges(self, source_frame, column_name_for_source_vertex_id, ...[, ...]) : Add edges to a graph.
assign_sample(self, sample_percentages[, sample_labels, ...]) : Randomly group rows into user-defined classes.
bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...]) : Classify data into user-defined groups.
bin_column_equal_depth(self, column_name[, num_bins, ...]) : Classify column into groups with the same frequency.
bin_column_equal_width(self, column_name[, num_bins, ...]) : Classify column into same-width groups.
categorical_summary(self, *column_inputs) : [ALPHA] Compute a summary of the data in one or more columns for categorical or numerical data types.
classification_metrics(self, label_column, pred_column[, ...]) : Model statistics of accuracy, precision, and others.
column_median(self, data_column[, weights_column]) : Calculate the (weighted) median of a column.
column_mode(self, data_column[, weights_column, max_modes_returned]) : Evaluate the weights assigned to rows.
column_summary_statistics(self, data_column[, ...]) : Calculate multiple statistics for a column.
copy(self[, columns, where, name]) : Create new frame from current frame.
correlation(self, data_column_names) : Calculate correlation for two columns of current frame.
correlation_matrix(self, data_column_names[, matrix_name]) : Calculate correlation matrix for two or more columns.
count(self, where) : Counts the number of rows which meet given criteria.
covariance(self, data_column_names) : Calculate covariance for exactly two columns.
covariance_matrix(self, data_column_names[, matrix_name]) : Calculate covariance matrix for two or more columns.
cumulative_percent(self, sample_col) : [BETA] Add column to frame with cumulative percent sum.
cumulative_sum(self, sample_col) : [BETA] Add column to frame with cumulative sum.
daal_pca(self, column_names[, method]) : [ALPHA] <Missing Doc>
dot_product(self, left_column_names, right_column_names, ...[, ...]) : [ALPHA] Calculate dot product for each row in current frame.
download(self[, n, offset, columns]) : Download frame data from the server into client workspace as a pandas dataframe.
drop_columns(self, columns) : Remove columns from the frame.
drop_duplicates(self[, unique_columns]) : Modify the current frame, removing duplicate rows.
drop_rows(self, predicate) : Erase any row in the current frame which qualifies.
ecdf(self, column[, result_frame_name]) : Builds new frame with columns for data and distribution.
entropy(self, data_column[, weights_column]) : Calculate the Shannon entropy of a column.
export_to_csv(self, folder_name[, separator, count, offset]) : Write current frame to HDFS in csv format.
export_to_hbase(self, table_name[, key_column_name, family_name]) : Write current frame to HBase table.
export_to_hive(self, table_name) : Write current frame to Hive table.
export_to_jdbc(self, table_name[, connector_type, url, driver_name, ...]) : Write current frame to JDBC table.
export_to_json(self, folder_name[, count, offset]) : Write current frame to HDFS in JSON format.
filter(self, predicate) : Select all rows which satisfy a predicate.
flatten_column(self, column[, delimiter]) : [DEPRECATED] Note that flatten_column() has been deprecated. Use flatten_columns() instead.
flatten_columns(self, columns[, delimiters]) : Spread data to multiple rows based on cell data.
get_error_frame(self) : Get a frame with error recordings.
group_by(self, group_by_columns, *aggregation_arguments) : [BETA] Create summarized frame.
histogram(self, column_name[, num_bins, weight_column_name, bin_type]) : [BETA] Compute the histogram for a column in a frame.
inspect(self[, n, offset, columns, wrap, truncate, round, width, margin, ...]) : Pretty-print of the frame data.
join(self, right, left_on[, right_on, how, name]) : [BETA] Join operation on one or two frames, creating a new frame.
quantiles(self, column_name, quantiles) : New frame with Quantiles and their values.
rename_columns(self, names) : Rename columns for edge frame.
sort(self, columns[, ascending]) : [BETA] Sort the data in a frame.
sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth]) : [ALPHA] Get a sorted subset of the data.
take(self, n[, offset, columns]) : Get data subset.
tally(self, sample_col, count_val) : [BETA] Count number of times a value is seen.
tally_percent(self, sample_col, count_val) : [BETA] Compute a cumulative percent count.
top_k(self, column_name, k[, weights_column]) : Most or least frequent column values.
unflatten_column(self, columns[, delimiter]) : [DEPRECATED] Note that unflatten_column() has been deprecated. Use unflatten_columns() instead.
unflatten_columns(self, columns[, delimiter]) : Compacts data from multiple rows based on cell data.
__init__(self, graph=None, label=None, src_vertex_label=None, dest_vertex_label=None, directed=None)
Parameters:
graph : (default=None)
label : (default=None)
src_vertex_label : (default=None)
dest_vertex_label : (default=None)
directed : (default=None)
Examples
Given a data file /movie.csv, create a frame to match this data and move the data to the frame. Create an empty graph and define some vertex and edge types.
>>> import trustedanalytics as ta
>>> # Describe the CSV file and its schema, then load it into a frame.
>>> my_csv = ta.CsvFile("/movie.csv", schema=[('user_id', ta.int32),
...                                           ('user_name', str),
...                                           ('movie_id', ta.int32),
...                                           ('movie_title', str),
...                                           ('rating', str)])
>>> my_frame = ta.Frame(my_csv)
>>> # Create an empty graph and declare its vertex and edge types.
>>> my_graph = ta.Graph()
>>> my_graph.define_vertex_type('users')
>>> my_graph.define_vertex_type('movies')
>>> my_graph.define_edge_type('ratings', 'users', 'movies', directed=True)
Add data to the graph from the frame:
>>> my_graph.vertices['users'].add_vertices(my_frame, 'user_id', ['user_name'])
>>> my_graph.vertices['movies'].add_vertices(my_frame, 'movie_id', ['movie_title'])
Get the edge frame from the graph, and add edge data from the frame:
>>> my_edge_frame = my_graph.edges['ratings']
>>> my_edge_frame.add_edges(my_frame, 'user_id', 'movie_id', ['rating'])
Retrieve a previously defined graph and retrieve an EdgeFrame from it:
>>> my_old_graph = ta.get_graph("your_graph")
>>> my_new_edge_frame = my_old_graph.edges["your_label"]
Calling methods on an EdgeFrame:
>>> my_new_edge_frame.inspect(20)
Copy an EdgeFrame to a frame using the copy method:
>>> my_new_frame = my_new_edge_frame.copy()