Graphs
class Graph
Creates a seamless property graph.
A seamless graph is a collection of vertex and edge lists stored as frames. This allows frame-like operations against graph data. Many frame methods are available to work with vertices and edges. Vertex and edge properties are stored as columns.
A seamless graph is better suited for bulk OLAP-type operations.
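To make the storage model concrete, here is a minimal sketch in plain Python. It is not the library's implementation: lists of dicts stand in for frames, and the helpers `column` and `where` are hypothetical names invented for this illustration. The `_vid`, `_eid`, `_src_vid`, `_dest_vid` and `_label` column names follow the inspect output shown in the examples on this page.

```python
# Illustrative model only: lists of dicts stand in for frames.
# `column` and `where` are hypothetical helpers, not part of the ta API.

vertex_frames = {
    "viewer": [
        {"_vid": 1, "_label": "viewer", "viewer": "fred", "profile": 0},
        {"_vid": 8, "_label": "viewer", "viewer": "pebbles", "profile": 1},
    ],
    "film": [
        {"_vid": 11, "_label": "film", "movie": "Croods"},
    ],
}
edge_frames = {
    "rating": [
        {"_eid": 21, "_src_vid": 1, "_dest_vid": 11, "_label": "rating", "rating": 5},
        {"_eid": 28, "_src_vid": 8, "_dest_vid": 11, "_label": "rating", "rating": 4},
    ],
}


def column(frame, name):
    """Return one property as a column of values."""
    return [row[name] for row in frame]


def where(frame, predicate):
    """Frame-like filter: keep the rows for which predicate is true."""
    return [row for row in frame if predicate(row)]


ratings = column(edge_frames["rating"], "rating")
high = where(edge_frames["rating"], lambda e: e["rating"] >= 5)
print(ratings)                     # [5, 4]
print([e["_eid"] for e in high])   # [21]
```

Because each property is just a column of the vertex or edge list, the same column-wise and row-wise operations apply to graph data as to any other tabular data.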
Attributes
edge_count
    Get the total number of edges in the graph.
edges
    Edge frame collection.
last_read_date
    Read-only property - Last time this frame’s data was accessed.
name
    Set or get the name of the graph object.
status
    Read-only property - Current graph life cycle status.
vertex_count
    Get the total number of vertices in the graph.
vertices
    Vertex frame collection.
Methods
__init__(self[, name, _info])
    Initialize the graph.
annotate_degrees(self, output_property_name[, degree_option, ...])
    Make new graph with degrees.
annotate_weighted_degrees(self, output_property_name[, ...])
    Calculates the weighted degree of each vertex with respect to an (optional) set of labels.
clustering_coefficient(self[, output_property_name, ...])
    Coefficient of graph with respect to labels.
copy(self[, name])
    Make a copy of the current graph.
define_edge_type(self, label, src_vertex_label, dest_vertex_label)
    Define an edge type.
define_vertex_type(self, label)
    Define a vertex type by label.
graphx_connected_components(self[, ...])
    Implements the connected components computation on a graph by invoking graphx api.
graphx_label_propagation(self[, max_steps, ...])
    [ALPHA] Implements the label propagation computation on a graph by invoking graphx api.
graphx_pagerank(self, output_property[, input_edge_labels, ...])
    Determine which vertices are the most important.
graphx_triangle_count(self, output_property[, input_edge_labels])
    Number of triangles among vertices of current graph.
kclique_percolation(self, clique_size, community_property_label)
    [ALPHA] Find groups of vertices with similar attributes.
label_propagation(self, prior_property, posterior_property, ...)
    Classification on sparse data using Belief Propagation.
loopy_belief_propagation(self, prior_property, ...[, ...])
    Classification on sparse data using Belief Propagation.
vertex_outdegree(self)
    Counts the out-degree of vertices in a graph.
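As a point of reference for the degree-related methods, the out-degree of a vertex is the number of edges that start at it. The sketch below computes it from a plain list of (source, destination) pairs; it only illustrates the definition, uses sample ids, and makes no claim about how vertex_outdegree is implemented or what it returns.

```python
from collections import Counter

# (source vertex id, destination vertex id) pairs; sample values only.
edges = [(1, 11), (1, 12), (1, 13), (8, 11), (5, 12)]

# Out-degree of a vertex = number of edges that start at that vertex.
out_degree = Counter(src for src, _dest in edges)

print(out_degree[1])    # 3
print(out_degree[11])   # 0, vertex 11 only receives edges
```

A Counter returns 0 for keys it has never seen, so vertices that appear only as destinations report an out-degree of 0 without special handling.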
__init__(self, name=None)
Initialize the graph.
Parameters: name : str (default=None)
Name for the new graph. Default is None.
Examples
This example uses a single source data frame and creates a graph of ‘viewer’ and ‘film’ vertices connected by ‘rating’ edges.
The first step is to bring in some data to create a frame as the source for a graph:
>>> schema = [('viewer', str), ('profile', ta.int32), ('movie', str), ('rating', ta.int32)]
>>> data1 = [['fred',0,'Croods',5],
...          ['fred',0,'Jurassic Park',5],
...          ['fred',0,'2001',2],
...          ['fred',0,'Ice Age',4],
...          ['wilma',0,'Jurassic Park',3],
...          ['wilma',0,'2001',5],
...          ['wilma',0,'Ice Age',4],
...          ['pebbles',1,'Croods',4],
...          ['pebbles',1,'Land Before Time',3],
...          ['pebbles',1,'Ice Age',5]]
>>> data2 = [['betty',0,'Croods',5],
...          ['betty',0,'Jurassic Park',3],
...          ['betty',0,'Land Before Time',4],
...          ['betty',0,'Ice Age',3],
...          ['barney',0,'Croods',5],
...          ['barney',0,'Jurassic Park',5],
...          ['barney',0,'Land Before Time',3],
...          ['barney',0,'Ice Age',5],
...          ['bamm bamm',1,'Croods',5],
...          ['bamm bamm',1,'Land Before Time',3]]
>>> frame = ta.Frame(ta.UploadRows(data1, schema))
[===Job Progress===]
>>> frame2 = ta.Frame(ta.UploadRows(data2, schema))
[===Job Progress===]
>>> frame.inspect()
[#]  viewer   profile  movie             rating
===============================================
[0]  fred           0  Croods                 5
[1]  fred           0  Jurassic Park          5
[2]  fred           0  2001                   2
[3]  fred           0  Ice Age                4
[4]  wilma          0  Jurassic Park          3
[5]  wilma          0  2001                   5
[6]  wilma          0  Ice Age                4
[7]  pebbles        1  Croods                 4
[8]  pebbles        1  Land Before Time       3
[9]  pebbles        1  Ice Age                5
Now, make an empty graph object:
>>> graph = ta.Graph()
Then, define the types of vertices and edges this graph will be made of:
>>> graph.define_vertex_type('viewer')
[===Job Progress===]
>>> graph.define_vertex_type('film')
[===Job Progress===]
>>> graph.define_edge_type('rating', 'viewer', 'film')
[===Job Progress===]
And finally, add the data to the graph:
>>> graph.vertices['viewer'].add_vertices(frame, 'viewer', ['profile'])
[===Job Progress===]
>>> graph.vertices['viewer'].inspect()
[#]  _vid  _label  viewer   profile
===================================
[0]     1  viewer  fred           0
[1]     8  viewer  pebbles        1
[2]     5  viewer  wilma          0
>>> graph.vertices['film'].add_vertices(frame, 'movie')
[===Job Progress===]
>>> graph.vertices['film'].inspect()
[#]  _vid  _label  movie
===================================
[0]    19  film    Land Before Time
[1]    14  film    Ice Age
[2]    12  film    Jurassic Park
[3]    11  film    Croods
[4]    13  film    2001
>>> graph.edges['rating'].add_edges(frame, 'viewer', 'movie', ['rating'])
[===Job Progress===]
>>> graph.edges['rating'].inspect()
[#]  _eid  _src_vid  _dest_vid  _label  rating
==============================================
[0]    24         1         14  rating       4
[1]    22         1         12  rating       5
[2]    21         1         11  rating       5
[3]    23         1         13  rating       2
[4]    29         8         19  rating       3
[5]    30         8         14  rating       5
[6]    28         8         11  rating       4
[7]    27         5         14  rating       4
[8]    25         5         12  rating       3
[9]    26         5         13  rating       5
Explore basic graph properties:
>>> graph.vertex_count
[===Job Progress===]
8
>>> graph.vertices
viewer : [viewer, profile], count = 3
film : [movie], count = 5
>>> graph.edge_count
[===Job Progress===]
10
>>> graph.edges
rating : [rating], count = 10
>>> graph.status
u'ACTIVE'
>>> graph.last_read_date
datetime.datetime(2016, 1, 8, 20, 24, 9, 650866)
>>> graph
Graph <unnamed>
status = ACTIVE  (last_read_date = -etc-)
vertices =
  viewer : [viewer, profile], count = 3
  film : [movie], count = 5
edges =
  rating : [rating], count = 10
Data from other frames can be added to the graph by making more calls to add_vertices and add_edges.
>>> frame2 = ta.Frame(ta.CsvFile("/datasets/extra-movie-data.csv", frame.schema))
[===Job Progress===]
>>> graph.vertices['viewer'].add_vertices(frame2, 'viewer', ['profile'])
[===Job Progress===]
>>> graph.vertices['viewer'].inspect()
[#]  _vid  _label  viewer     profile
=====================================
[0]     5  viewer  wilma            0
[1]     1  viewer  fred             0
[2]    31  viewer  betty            0
[3]    35  viewer  barney           0
[4]     8  viewer  pebbles          1
[5]    39  viewer  bamm bamm        1
>>> graph.vertices['film'].add_vertices(frame2, 'movie')
[===Job Progress===]
>>> graph.vertices['film'].inspect()
[#]  _vid  _label  movie
===================================
[0]    13  film    2001
[1]    14  film    Ice Age
[2]    11  film    Croods
[3]    19  film    Land Before Time
[4]    12  film    Jurassic Park
>>> graph.vertex_count
[===Job Progress===]
11
>>> graph.edges['rating'].add_edges(frame2, 'viewer', 'movie', ['rating'])
[===Job Progress===]
>>> graph.edges['rating'].inspect(20)
[##]  _eid  _src_vid  _dest_vid  _label  rating
===============================================
[0]     24         1         14  rating       4
[1]     22         1         12  rating       5
[2]     21         1         11  rating       5
[3]     23         1         13  rating       2
[4]     29         8         19  rating       3
[5]     30         8         14  rating       5
[6]     28         8         11  rating       4
[7]     27         5         14  rating       4
[8]     25         5         12  rating       3
[9]     26         5         13  rating       5
[10]    60        39         19  rating       3
[11]    59        39         11  rating       5
[12]    53        31         19  rating       4
[13]    54        31         14  rating       3
[14]    52        31         12  rating       3
[15]    51        31         11  rating       5
[16]    57        35         19  rating       3
[17]    58        35         14  rating       5
[18]    56        35         12  rating       5
[19]    55        35         11  rating       5
>>> graph.edge_count
[===Job Progress===]
20
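The counts above show a pattern worth spelling out: the second load added three new viewers and ten new edges, but the film count stayed at 5 because every movie in the second frame was already a vertex. The sketch below reproduces that pattern in plain Python on a smaller sample. It is an assumption-laden illustration, not the library's code: `add_vertices`, `add_edges` and `load` here are hypothetical stand-alone functions, and the sequential ids are invented.

```python
from itertools import count

# Hypothetical sequential id generator; real ids are assigned by the library.
_next_id = count(1)


def add_vertices(store, values):
    """Register each distinct value once; repeated values keep their id."""
    for value in values:
        if value not in store:
            store[value] = next(_next_id)


def add_edges(edges, rows, viewers, films):
    """Append one edge per source row, pointing at existing vertex ids."""
    for viewer, _profile, movie, rating in rows:
        edges.append({"_src_vid": viewers[viewer],
                      "_dest_vid": films[movie],
                      "rating": rating})


def load(rows, viewers, films, edges):
    """Add the vertices first, then the edges that connect them."""
    add_vertices(viewers, (r[0] for r in rows))
    add_vertices(films, (r[2] for r in rows))
    add_edges(edges, rows, viewers, films)


batch1 = [("fred", 0, "Croods", 5), ("fred", 0, "Ice Age", 4),
          ("wilma", 0, "Ice Age", 4)]
batch2 = [("betty", 0, "Croods", 5), ("betty", 0, "Ice Age", 3)]

viewers, films, edges = {}, {}, []
load(batch1, viewers, films, edges)
load(batch2, viewers, films, edges)

print(len(viewers), len(films), len(edges))   # 3 2 5
```

In this model vertices are keyed by the value of their id column, so re-adding a known movie is a no-op, while every source row still yields its own edge.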
Now we’ll copy the graph and then change it.
>>> graph2 = graph.copy()
[===Job Progress===]
>>> graph2
Graph <unnamed>
status = ACTIVE  (last_read_date = -etc-)
vertices =
  viewer : [viewer, profile], count = 6
  film : [movie], count = 5
edges =
  rating : [rating], count = 20
We can rename the columns in the frames representing the vertices and edges, similar to regular frame operations.
>>> graph2.vertices['viewer'].rename_columns({'viewer': 'person'})
[===Job Progress===]
>>> graph2.vertices
viewer : [person, profile], count = 6
film : [movie], count = 5
>>> graph2.edges['rating'].rename_columns({'rating': 'score'})
[===Job Progress===]
>>> graph2.edges
rating : [score], count = 20
We can apply filter and drop functions to the vertex and edge frames.
>>> graph2.vertices['viewer'].filter(lambda v: v.person.startswith("b"))
[===Job Progress===]
>>> graph2.vertices['viewer'].inspect()
[#]  _vid  _label  person     profile
=====================================
[0]    31  viewer  betty            0
[1]    35  viewer  barney           0
[2]    39  viewer  bamm bamm        1
>>> graph2.vertices['viewer'].drop_duplicates("profile")
[===Job Progress===]
>>> graph2.vertices['viewer'].inspect()
[#]  _vid  _label  person     profile
=====================================
[0]    31  viewer  betty            0
[1]    39  viewer  bamm bamm        1
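The effect of dropping duplicates on one column can be sketched as follows. This is a plain-Python illustration, not the library's implementation: the standalone `drop_duplicates` function is hypothetical, and keeping the first row seen for each value is an assumption chosen because it matches the output above. Which row the library keeps is not stated on this page.

```python
def drop_duplicates(rows, key):
    """Keep one row per distinct value of `key` (here: the first one seen)."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


viewers = [
    {"_vid": 31, "person": "betty", "profile": 0},
    {"_vid": 35, "person": "barney", "profile": 0},
    {"_vid": 39, "person": "bamm bamm", "profile": 1},
]

unique = drop_duplicates(viewers, "profile")
print([v["person"] for v in unique])   # ['betty', 'bamm bamm']
```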
Now check the edges to see that they have also been filtered.
>>> graph2.edges['rating'].inspect()
[#]  _eid  _src_vid  _dest_vid  _label  score
=============================================
[0]    60        39         19  rating      3
[1]    59        39         11  rating      5
[2]    53        31         19  rating      4
[3]    54        31         14  rating      3
[4]    52        31         12  rating      3
[5]    51        31         11  rating      5
Only source vertices 31 and 39 remain.
Drop the row for the movie ‘Croods’ (vid 11) from the film VertexFrame.
>>> graph2.vertices['film'].inspect()
[#]  _vid  _label  movie
===================================
[0]    13  film    2001
[1]    14  film    Ice Age
[2]    11  film    Croods
[3]    19  film    Land Before Time
[4]    12  film    Jurassic Park
>>> graph2.vertices['film'].drop_rows(lambda row: row.movie=='Croods')
[===Job Progress===]
>>> graph2.vertices['film'].inspect()
[#]  _vid  _label  movie
===================================
[0]    13  film    2001
[1]    14  film    Ice Age
[2]    19  film    Land Before Time
[3]    12  film    Jurassic Park
Dangling edges (edges that correspond to the movie ‘Croods’, vid 11) were also removed:
>>> graph2.edges['rating'].inspect()
[#]  _eid  _src_vid  _dest_vid  _label  score
=============================================
[0]    52        31         12  rating      3
[1]    54        31         14  rating      3
[2]    60        39         19  rating      3
[3]    53        31         19  rating      4
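The rule behind both clean-ups (after the viewer filter and after the film drop) can be sketched in a few lines: an edge survives only if both of its endpoints still exist. The code below is a plain-Python illustration using the ids from the tables above; `drop_dangling_edges` is a hypothetical helper, not a library method, and the library performs this step itself.

```python
def drop_dangling_edges(edges, live_vids):
    """Keep only edges whose source and destination vertices both exist."""
    return [e for e in edges
            if e["_src_vid"] in live_vids and e["_dest_vid"] in live_vids]


# Edges that remained after the viewer filter.
edges = [
    {"_eid": 60, "_src_vid": 39, "_dest_vid": 19},
    {"_eid": 59, "_src_vid": 39, "_dest_vid": 11},
    {"_eid": 53, "_src_vid": 31, "_dest_vid": 19},
    {"_eid": 54, "_src_vid": 31, "_dest_vid": 14},
    {"_eid": 52, "_src_vid": 31, "_dest_vid": 12},
    {"_eid": 51, "_src_vid": 31, "_dest_vid": 11},
]

# Vertex ids still present once the 'Croods' row (vid 11) is dropped.
live_vids = {31, 39, 13, 14, 19, 12}

remaining = drop_dangling_edges(edges, live_vids)
print(sorted(e["_eid"] for e in remaining))   # [52, 53, 54, 60]
```

The surviving edge ids match the four rows in the final inspect output.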