class VertexFrame

A list of Vertices owned by a Graph.

A VertexFrame is similar to a Frame but with a few important differences:

  • VertexFrames are not instantiated directly by the user; they are created by defining a vertex type in a graph
  • Each row of a VertexFrame represents a vertex in a graph
  • VertexFrames have many of the same methods as Frames but not all (for example, flatten_column())
  • VertexFrames have extra methods not found on Frames (for example, add_vertices())
  • Removing a vertex (or row) from a VertexFrame also removes edges connected to that vertex from the graph
  • VertexFrames have special system columns (_vid, _label) that are maintained automatically by the system and cannot be modified by the user
  • VertexFrames have a special user defined id column whose value uniquely identifies the vertex
  • “Columns” on a VertexFrame can also be thought of as “properties” on vertices
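
The ownership rules above can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the trustedanalytics implementation: the ToyGraph class and all of its methods are hypothetical names made up for illustration. It shows system columns (_vid, _label) being assigned by the graph rather than the caller, and edges being removed together with the vertex they touch.

```python
from itertools import count


class ToyGraph:
    """Hypothetical in-memory stand-in for a graph that owns vertex rows."""

    SYSTEM_COLUMNS = ("_vid", "_label")

    def __init__(self):
        self._vid_source = count(1)  # source of system-assigned _vid values
        self.vertices = {}           # label -> list of vertex rows (dicts)
        self.edges = []              # list of (src_vid, dest_vid) pairs

    def define_vertex_type(self, label):
        # Defining a vertex type is what brings a "vertex frame" into existence.
        self.vertices[label] = []

    def add_vertex(self, label, properties):
        # System columns are maintained by the graph; callers may not set them.
        if any(name in properties for name in self.SYSTEM_COLUMNS):
            raise ValueError("system columns are set by the graph, not the caller")
        row = {"_vid": next(self._vid_source), "_label": label}
        row.update(properties)  # user columns double as vertex properties
        self.vertices[label].append(row)
        return row["_vid"]

    def add_edge(self, src_vid, dest_vid):
        self.edges.append((src_vid, dest_vid))

    def drop_rows(self, label, predicate):
        dropped = {row["_vid"] for row in self.vertices[label] if predicate(row)}
        self.vertices[label] = [
            row for row in self.vertices[label] if row["_vid"] not in dropped
        ]
        # Removing a vertex also removes every edge connected to it.
        self.edges = [
            (src, dest) for (src, dest) in self.edges
            if src not in dropped and dest not in dropped
        ]
        return len(dropped)


graph = ToyGraph()
graph.define_vertex_type("users")
alice_vid = graph.add_vertex("users", {"user_id": 1, "user_name": "alice"})
bob_vid = graph.add_vertex("users", {"user_id": 2, "user_name": "bob"})
graph.add_edge(alice_vid, bob_vid)

removed = graph.drop_rows("users", lambda row: row["user_name"] == "bob")
print(removed, graph.edges)  # prints: 1 []
```

Dropping the "bob" row leaves no edges behind, because the only edge in the toy graph touched the removed vertex.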

Attributes

column_names Column identifications in the current frame.
last_read_date Last time this frame’s data was accessed.
name Set or get the name of the frame object.
row_count Number of rows in the current frame.
schema Current frame column names and types.
status Current frame life cycle status.

Methods

__init__(self[, source, graph, label, _info]) See the Examples section below.
add_columns(self, func, schema[, columns_accessed]) Add columns to current frame.
add_vertices(self, source_frame, id_column_name[, column_names]) Add vertices to a graph.
assign_sample(self, sample_percentages[, sample_labels, ...]) Randomly group rows into user-defined classes.
bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...]) Classify data into user-defined groups.
bin_column_equal_depth(self, column_name[, num_bins, ...]) Classify column into groups with the same frequency.
bin_column_equal_width(self, column_name[, num_bins, ...]) Classify column into same-width groups.
categorical_summary(self, *column_inputs) [ALPHA] Compute a summary of the data in one or more columns for categorical or numerical data types.
classification_metrics(self, label_column, pred_column[, ...]) Model statistics of accuracy, precision, and others.
column_median(self, data_column[, weights_column]) Calculate the (weighted) median of a column.
column_mode(self, data_column[, weights_column, max_modes_returned]) Calculate the (weighted) mode(s) of a column.
column_summary_statistics(self, data_column[, ...]) Calculate multiple statistics for a column.
copy(self[, columns, where, name]) Create new frame from current frame.
correlation(self, data_column_names) Calculate correlation for two columns of current frame.
correlation_matrix(self, data_column_names[, matrix_name]) Calculate correlation matrix for two or more columns.
count(self, where) Counts the number of rows which meet given criteria.
covariance(self, data_column_names) Calculate covariance for exactly two columns.
covariance_matrix(self, data_column_names[, matrix_name]) Calculate covariance matrix for two or more columns.
cumulative_percent(self, sample_col) [BETA] Add column to frame with cumulative percent sum.
cumulative_sum(self, sample_col) [BETA] Add column to frame with cumulative sum.
daal_pca(self, column_names[, method]) [ALPHA] <Missing Doc>
dot_product(self, left_column_names, right_column_names, ...[, ...]) [ALPHA] Calculate dot product for each row in current frame.
download(self[, n, offset, columns]) Download frame data from the server into client workspace as a pandas dataframe
drop_columns(self, columns) Remove columns from the frame.
drop_duplicates(self[, unique_columns]) Remove duplicate vertex rows.
drop_rows(self, predicate) Delete rows in this vertex frame that qualify.
drop_vertices(self, predicate) [DEPRECATED] drop_vertices has been deprecated. Use drop_rows instead.
ecdf(self, column[, result_frame_name]) Builds new frame with columns for data and distribution.
entropy(self, data_column[, weights_column]) Calculate the Shannon entropy of a column.
export_to_csv(self, folder_name[, separator, count, offset]) Write current frame to HDFS in csv format.
export_to_hbase(self, table_name[, key_column_name, family_name]) Write current frame to HBase table.
export_to_hive(self, table_name) Write current frame to Hive table.
export_to_jdbc(self, table_name[, connector_type, url, driver_name, ...]) Write current frame to JDBC table.
export_to_json(self, folder_name[, count, offset]) Write current frame to HDFS in JSON format.
filter(self, predicate) <Missing Doc>
flatten_column(self, column[, delimiter]) [DEPRECATED] Note that flatten_column() has been deprecated. Use flatten_columns() instead.
flatten_columns(self, columns[, delimiters]) Spread data to multiple rows based on cell data.
get_error_frame(self) Get a frame with error recordings.
group_by(self, group_by_columns, *aggregation_arguments) [BETA] Create summarized frame.
histogram(self, column_name[, num_bins, weight_column_name, bin_type]) [BETA] Compute the histogram for a column in a frame.
inspect(self[, n, offset, columns, wrap, truncate, round, width, margin, ...]) Pretty-print of the frame data
join(self, right, left_on[, right_on, how, name]) [BETA] Join operation on one or two frames, creating a new frame.
quantiles(self, column_name, quantiles) New frame with Quantiles and their values.
rename_columns(self, names) Rename columns for vertex frame.
sort(self, columns[, ascending]) [BETA] Sort the data in a frame.
sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth]) [ALPHA] Get a sorted subset of the data.
take(self, n[, offset, columns]) Get data subset.
tally(self, sample_col, count_val) [BETA] Count number of times a value is seen.
tally_percent(self, sample_col, count_val) [BETA] Compute a cumulative percent count.
top_k(self, column_name, k[, weights_column]) Most or least frequent column values.
unflatten_column(self, columns[, delimiter]) [DEPRECATED] Note that unflatten_column() has been deprecated. Use unflatten_columns() instead.
unflatten_columns(self, columns[, delimiter]) Compacts data from multiple rows based on cell data.
__init__(self, source=None, graph=None, label=None)

Parameters:

source : (default=None)

graph : (default=None)

label : (default=None)

Examples

Given a data file, create a frame, create a graph, define a new vertex type, and add the frame’s data to the resulting VertexFrame:

>>> csv = ta.CsvFile("/movie.csv", schema=[('user_id', ta.int32),
...                                        ('user_name', str),
...                                        ('movie_id', ta.int32),
...                                        ('movie_title', str),
...                                        ('rating', str)])
>>> my_frame = ta.Frame(csv)
>>> my_graph = ta.Graph()
>>> my_graph.define_vertex_type('users')
>>> my_vertex_frame = my_graph.vertices['users']
>>> my_vertex_frame.add_vertices(my_frame, 'user_id',
... ['user_name'])
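
To make the roles of the add_vertices arguments concrete, here is a rough plain-Python sketch of projecting source rows onto an id column plus selected property columns. It is not the library's code: toy_add_vertices and the sample rows are made up for illustration, and collapsing rows that share an id into a single vertex (last row wins) is an assumption rather than documented behavior.

```python
def toy_add_vertices(source_rows, id_column_name, column_names=()):
    """Hypothetical helper: build one vertex dict per distinct id value."""
    vertices = {}
    for row in source_rows:
        vertex_id = row[id_column_name]
        # Keep only the id column and the requested property columns.
        vertex = {id_column_name: vertex_id}
        for name in column_names:
            vertex[name] = row[name]
        # Assumption: rows sharing an id collapse into one vertex (last wins).
        vertices[vertex_id] = vertex
    return list(vertices.values())


# Made-up rows shaped like the movie.csv schema used above.
ratings = [
    {"user_id": 1, "user_name": "alice", "movie_id": 10, "movie_title": "A", "rating": "5"},
    {"user_id": 1, "user_name": "alice", "movie_id": 11, "movie_title": "B", "rating": "3"},
    {"user_id": 2, "user_name": "bob", "movie_id": 10, "movie_title": "A", "rating": "4"},
]
users = toy_add_vertices(ratings, "user_id", ["user_name"])
```

Here three rating rows yield two user vertices, each carrying only user_id and user_name; the movie columns are left behind because they were not listed in column_names.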

Retrieve a previously defined graph and retrieve a VertexFrame from it:

>>> my_graph = ta.get_graph("your_graph")
>>> my_vertex_frame = my_graph.vertices["your_label"]

Calling methods on a VertexFrame:

>>> my_vertex_frame.inspect(20)

Convert a VertexFrame to a frame:

>>> new_frame = my_vertex_frame.copy()