Frames VertexFrame

class VertexFrame

A list of Vertices owned by a Graph.

A VertexFrame is similar to a Frame but with a few important differences:

VertexFrames are not instantiated directly by the user; instead, they are created by defining a vertex type in a graph
Each row of a VertexFrame represents a vertex in a graph
VertexFrames have many of the same methods as Frames but not all (for example, flatten_column())
VertexFrames have extra methods not found on Frames (for example, add_vertices())
Removing a vertex (or row) from a VertexFrame also removes edges connected to that vertex from the graph
VertexFrames have special system columns (_vid, _label) that are maintained automatically by the system and cannot be modified by the user
VertexFrames have a special user defined id column whose value uniquely identifies the vertex
“Columns” on a VertexFrame can also be thought of as “properties” on vertices
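
The rules above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ToyGraph and its methods are hypothetical names written for this illustration and are not the trustedanalytics API, and skipping rows whose id already exists is an assumed way of keeping the id column unique, not documented library behavior.

```python
import itertools


class ToyGraph(object):
    """Hypothetical stand-in for a graph that owns vertex rows and edges."""

    def __init__(self):
        self.vertices = {}   # label -> list of row dicts (one toy "vertex frame" per label)
        self.edges = []      # list of (source _vid, destination _vid) pairs
        self._vid_counter = itertools.count(1)

    def define_vertex_type(self, label):
        # Defining a vertex type is what creates the (initially empty) vertex frame.
        self.vertices[label] = []

    def add_vertices(self, label, rows, id_column_name):
        seen = set(v[id_column_name] for v in self.vertices[label])
        for row in rows:
            if row[id_column_name] in seen:
                continue  # assumption: rows repeating an existing id are skipped
            seen.add(row[id_column_name])
            vertex = dict(row)                        # user columns become properties
            vertex["_vid"] = next(self._vid_counter)  # system column, assigned here
            vertex["_label"] = label                  # system column, assigned here
            self.vertices[label].append(vertex)

    def add_edge(self, src_vid, dest_vid):
        self.edges.append((src_vid, dest_vid))

    def drop_rows(self, label, predicate):
        # Removing a vertex row also removes every edge connected to it.
        dropped = set(v["_vid"] for v in self.vertices[label] if predicate(v))
        self.vertices[label] = [v for v in self.vertices[label]
                                if v["_vid"] not in dropped]
        self.edges = [(s, d) for (s, d) in self.edges
                      if s not in dropped and d not in dropped]


graph = ToyGraph()
graph.define_vertex_type("users")
graph.add_vertices("users",
                   [{"user_id": 1, "user_name": "alice"},
                    {"user_id": 2, "user_name": "bob"}],
                   "user_id")
alice, bob = graph.vertices["users"]
graph.add_edge(alice["_vid"], bob["_vid"])
graph.drop_rows("users", lambda v: v["user_name"] == "bob")
```

After the final call, the toy graph holds one vertex row and no edges, which mirrors the cascade described in the list above.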

Attributes

column_names	Column identifications in the current frame.
last_read_date	Last time this frame’s data was accessed.
name	Set or get the name of the frame object.
row_count	Number of rows in the current frame.
schema	Current frame column names and types.
status	Current frame life cycle status.

Methods

__init__(self[, source, graph, label, _info])	Not called directly; VertexFrames are created by defining a vertex type in a graph.
add_columns(self, func, schema[, columns_accessed])	Add columns to current frame.
add_vertices(self, source_frame, id_column_name[, column_names])	Add vertices to a graph.
assign_sample(self, sample_percentages[, sample_labels, ...])	Randomly group rows into user-defined classes.
bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...])	Classify data into user-defined groups.
bin_column_equal_depth(self, column_name[, num_bins, ...])	Classify column into groups with the same frequency.
bin_column_equal_width(self, column_name[, num_bins, ...])	Classify column into same-width groups.
categorical_summary(self, *column_inputs)	[ALPHA] Compute a summary of the data in one or more columns for categorical or numerical data types.
classification_metrics(self, label_column, pred_column[, ...])	Model statistics of accuracy, precision, and others.
column_median(self, data_column[, weights_column])	Calculate the (weighted) median of a column.
column_mode(self, data_column[, weights_column, max_modes_returned])	Evaluate the weights assigned to rows.
column_summary_statistics(self, data_column[, ...])	Calculate multiple statistics for a column.
copy(self[, columns, where, name])	Create new frame from current frame.
correlation(self, data_column_names)	Calculate correlation for two columns of current frame.
correlation_matrix(self, data_column_names[, matrix_name])	Calculate correlation matrix for two or more columns.
count(self, where)	Counts the number of rows which meet given criteria.
covariance(self, data_column_names)	Calculate covariance for exactly two columns.
covariance_matrix(self, data_column_names[, matrix_name])	Calculate covariance matrix for two or more columns.
cumulative_percent(self, sample_col)	[BETA] Add column to frame with cumulative percent sum.
cumulative_sum(self, sample_col)	[BETA] Add column to frame with cumulative sum.
daal_pca(self, column_names[, method])	[ALPHA] <Missing Doc>
dot_product(self, left_column_names, right_column_names, ...[, ...])	[ALPHA] Calculate dot product for each row in current frame.
download(self[, n, offset, columns])	Download frame data from the server into the client workspace as a pandas DataFrame.
drop_columns(self, columns)	Remove columns from the frame.
drop_duplicates(self[, unique_columns])	Remove duplicate vertex rows.
drop_rows(self, predicate)	Delete rows in this vertex frame that qualify.
drop_vertices(self, predicate)	[DEPRECATED] drop_vertices has been deprecated. Use drop_rows instead.
ecdf(self, column[, result_frame_name])	Builds new frame with columns for data and distribution.
entropy(self, data_column[, weights_column])	Calculate the Shannon entropy of a column.
export_to_csv(self, folder_name[, separator, count, offset])	Write current frame to HDFS in csv format.
export_to_hbase(self, table_name[, key_column_name, family_name])	Write current frame to HBase table.
export_to_hive(self, table_name)	Write current frame to Hive table.
export_to_jdbc(self, table_name[, connector_type, url, driver_name, ...])	Write current frame to JDBC table.
export_to_json(self, folder_name[, count, offset])	Write current frame to HDFS in JSON format.
filter(self, predicate)	<Missing Doc>
flatten_column(self, column[, delimiter])	[DEPRECATED] Note that flatten_column() has been deprecated. Use flatten_columns() instead.
flatten_columns(self, columns[, delimiters])	Spread data to multiple rows based on cell data.
get_error_frame(self)	Get a frame with error recordings.
group_by(self, group_by_columns, *aggregation_arguments)	[BETA] Create summarized frame.
histogram(self, column_name[, num_bins, weight_column_name, bin_type])	[BETA] Compute the histogram for a column in a frame.
inspect(self[, n, offset, columns, wrap, truncate, round, width, margin, ...])	Pretty-print the frame data.
join(self, right, left_on[, right_on, how, name])	[BETA] Join operation on one or two frames, creating a new frame.
quantiles(self, column_name, quantiles)	New frame with Quantiles and their values.
rename_columns(self, names)	Rename columns for vertex frame.
sort(self, columns[, ascending])	[BETA] Sort the data in a frame.
sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth])	[ALPHA] Get a sorted subset of the data.
take(self, n[, offset, columns])	Get data subset.
tally(self, sample_col, count_val)	[BETA] Count number of times a value is seen.
tally_percent(self, sample_col, count_val)	[BETA] Compute a cumulative percent count.
top_k(self, column_name, k[, weights_column])	Most or least frequent column values.
unflatten_column(self, columns[, delimiter])	[DEPRECATED] Note that unflatten_column() has been deprecated. Use unflatten_columns() instead.
unflatten_columns(self, columns[, delimiter])	Compacts data from multiple rows based on cell data.

__init__(self, source=None, graph=None, label=None)

Parameters:

source : (default=None)

graph : (default=None)

label : (default=None)

Examples

Given a data file, create a frame, move the data to graph and then define a new VertexFrame and add data to it:

>>> csv = ta.CsvFile("/movie.csv", schema=[('user_id', ta.int32),
...                                        ('user_name', str),
...                                        ('movie_id', ta.int32),
...                                        ('movie_title', str),
...                                        ('rating', str)])
>>> my_frame = ta.Frame(csv)
>>> my_graph = ta.Graph()
>>> my_graph.define_vertex_type('users')
>>> my_vertex_frame = my_graph.vertices['users']
>>> # column_names must name columns that exist in my_frame
>>> my_vertex_frame.add_vertices(my_frame, 'user_id', ['user_name'])

Retrieve a previously defined graph and retrieve a VertexFrame from it:

>>> my_graph = ta.get_graph("your_graph")
>>> my_vertex_frame = my_graph.vertices["your_label"]

Calling methods on a VertexFrame:

>>> my_vertex_frame.inspect(20)

Convert a VertexFrame to a frame:

>>> new_frame = my_vertex_frame.copy()
