SparseGraph

The SparseGraph object represents a graph with an underlying sparse weight matrix representation from scipy.sparse. This has the advantage of being efficient in memory usage for graphs with few edges. Graphs of a 1,000,000 vertices or more can be created with minimal memory overheads. The following is a very simple example of how to use SparseGraph:

from apgl.graph import SparseGraph
import numpy

numVertices = 10

graph = SparseGraph(numVertices)
graph.addEdge(0, 1)
#Note can also use the notation e.g. graph[0,1] = 1 to create an edge
graph[0, 2] = 1
graph[0, 3] = 1
graph[2, 1] = 1
graph[2, 5] = 1
graph[2, 6] = 1
graph[6, 9] = 1

subgraph = graph.subgraph([0,1,2,3])

graph.vlist[0] = "abc"
graph.vlist[1] =  123

The code creates a new SparseGraph with 10 vertices, after which edges are added and a subgraph is extracted using vertices 0, 1, 2, and 3. Notice that numpy.array vertices can be added to a SparseGraph using the VertexList class in the constructor. Finally, the first and second vertices are initialised with “abc” and 123 respectively

In order to speed up certain operations on the graph, SparseGraph can be intialised with an empty sparse matrix of several types. For example, the csr_matrix allows fast out-degree computations wheareas csc_matrix is faster for finding in-degrees of directed graphs.

from apgl.graph import SparseGraph
import numpy
import scipy.sparse

numVertices = 10

weightMatrix = scipy.sparse.lil_matrix((numVertices, numVertices))
graph = SparseGraph(numVertices, W=weightMatrix)
graph[0, 1] = 1
graph[0, 2] = 1


#Output the number of vertices
print(graph.size)

Here, the SparseGraph is initialised with 10 vertices and the sparse matrix weightMatrix passed to the constructor is used to store edge weights.

Methods

class apgl.graph.SparseGraph.SparseGraph(vertices, undirected=True, W=None, dtype=<type 'float'>, frmt='csr')

Bases: apgl.graph.AbstractMatrixGraph.AbstractMatrixGraph

Create a SparseGraph with a given AbstractVertexList or number of vertices, and specify whether it is directed. One can optionally pass in a sparse matrix W which is used as the weight matrix of the graph. Different kinds of sparse matrix can impact the speed of various operations. The currently supported sparse matrix types are: lil_matrix, csr_matrix, csc_matrix and dok_matrix. The default sparse matrix is csr_matrix.

Parameters:
  • vertices – the initial set of vertices as a AbstractVertexList object, or an int to specify the number of vertices in which case vertices are stored in a GeneralVertexList.
  • undirected (boolean) – a boolean variable to indicate if the graph is undirected.
  • W – a square sparse matrix of the same size as the number of vertices, or None to create the default one.
  • dtype – the data type of the sparse matrix if W is not specified.
  • frmt – the format of the sparse matrix: lil, csr or csc if W is not specified
W = None
add(graph)

Add the edge weights of the input graph to the current one. Results in a union of the edges.

Parameters:graph (apgl.graph.SparseGraph) – the input graph.
Returns:A new graph with same vertex list and addition of edge weights
addEdge(vertexIndex1, vertexIndex2, edge=1)

Add a non-zero edge between two vertices.

Parameters:
  • vertexIndex1 (int) – The index of the first vertex.
  • vertexIndex2 (int) – The index of the second vertex.
  • edge (float) – The value of the edge.
addEdges(edgeIndexArray, edgeValues=[])

Takes a numpy array of edge index pairs, and edge values and adds them to this graph. The array is 2 dimensional such that each row is a pair of edge indices.

Parameters:
  • edgeIndexArray (numpy.ndarray) – The array of edge indices with each being a pair of indices.
  • edgeValues (list) – The list of edge values
adjacencyList(useWeights=True)

Returns an adjacency list representation L of the graph, in which L[i] is the list of all neighbours of vertex i. Furthermore, the method returns W in which W[i] which is the corresponding set of weights.

Parameters:useWeights (bool) – If true uses weights of edges as opposed to just adjacencies.
Returns L:A list whose ith element is a list of neighbours for vertex i.
Returns W:A list whose ith element is a list of neighbour weights/adjacencies for vertex i.
adjacencyMatrix()

Return the adjacency matrix in numpy.ndarray format. Warning: should not be used unless sufficient memory is available to store the dense matrix.

Returns:The adjacency matrix in dense format
betweenness()

Return the betweenness of each vertex in the graph. The betweenness is defined as the number of shortest paths passing through each vertex.

Returns:A vector of betweenness values of the same length as the number of vertices in the graph.
breadthFirstSearch(root)

Breadth first search starting from a particular vertex. Returns a list of connected vertices in the order they were found.

Parameters:root (int) – The index of the root vertex.
Returns:A list of vertices connected to the input one via a path in the graph.
clusteringCoefficient()

Find the global clustering coefficient of this graph as defined here http://en.wikipedia.org/wiki/Clustering_coefficient

Returns:The clustering coefficient of this graph.
complement()

Returns a graph with identical vertices (same reference) to the current one, but with the complement of the set of edges. Edges that do not exist have weight 1. This makes a sparse graph dense.

Returns:A new graph with edges complmenting the current one.
concat(graph)

Take a new graph and concatenate it to the current one. Returns a new graph of the concatenated graphs with this graphs vertices first in the new list of vertices.

Parameters:graph (apgl.graph.SparseGraph) – the input graph.
copy()

Returns a copy of this object, which also has a copy of the AbstractVertexList.

degreeDistribution()

Return a vector of (out)degree distributions. The ith element of the vector corresponds to the frequency of degree i.

Returns:A vector of (out)degree distributions.
degreeSequence()
Returns:a vector of the degrees (including self edges) for each vertex for an undirected graph.
density()

The density of the graph is the number of edges/number of possible edges, which does not include self loops. The density of a graph with no vertices is zero.

Returns:The density of the graph.
depthFirstSearch(root)

Depth first search starting from a particular vertex. Returns a list of connected vertices in the order they were found.

Parameters:root (int) – The index of the root vertex.
Returns:A list of vertices connected to the input one via a path in the graph.
diameter(useWeights=False, P=None)

Finds the diameter of a graph i.e. the longest shortest path. If useWeights is True then the weights in the adjacency matrix are used if P is not provided.

Parameters:
  • useWeights (bool) – Whether to use edge weights to compute a diameter.
  • P (ndarray) – An optional nxn matrix whose ijth entry is the shortest path from i to j.
Returns:

The diameter of this graph.

diameter2()

Use Dijkstras Algorithm to compute the diameter of the graph.

Returns:The diameter of the graph.
dijkstrasAlgorithm(vertexIndex, neighbourLists=None)

Run Dijkstras Algorithm on the graph for a given source vertex. The parameter neighbourLists is a tuple containing two lists. The first of this lists contains at the ith position all the neighbours of vertex i. The second list contains the corresponding weight on the edge. If neighbourLists=None, then it is computed automatically and all edge weights are set to 1. Returns an array with the distance to all vertices (including itself).

Parameters:
  • vertexIndex (int) – the index of the source vertex.
  • neighbourLists (list) – A tuple of two lists containing vertex adjacencies and edge weights respectively.
Returns:

An array whose ith element is the distance to vertex i.

effectiveDiameter(q, P=None)

The effective diameter is the minimum d such that for a fraction q of reachable node pairs, the path length is at most d. This is more rubust than the standard diameter method. One can optionally pass in a matrix P whose ijth entry is the shortest path from i to j.

Parameters:
  • q (float) – The fraction of node pairs to consider.
  • P (ndarray) – An optional nxn matrix whose ijth entry is the shortest path from i to j.
Returns:

The effective diameter of this graph.

egoGraph(vertexIndex)

Returns the subgraph composed of the given vertex and its immediate neighbours. In the new graph, the ego is index 0 and neighbours are indexed in order after 0.

Parameters:vertexIndex (int) – the index of the source vertex.
Returns:A subgraph of the current one consisting of only immediate neighbours.
findAllDistances(useWeights=True)

Use the repeated calls to Dijkstra’ algorithm to find the shortest path between all pairs of vertices. If useWeights is true, then the weights are used to compute the path, otherwise adjacencies are used. Note that the shortest path of a vertex to itself is always zero. Returns a matrix whose ij th entry is the shortest path between vertices i and j.

Parameters:useWeights (bool) – Whether to use the edge weight to compute path cost.
Returns:A matrix of shortest paths between all vertices.
findConnectedComponents()

Finds a list of all connected components of the graph, in order of size with the smallest first.

Returns:A list of lists of component indices.
findTrees()

Returns a list of trees for a directed graph. The reason for only allowing directed graphs is that the root of a tree in an undirected graph is ambiguous. Each tree is represented by an list of indices of vertices in the graph.

Returns:A list of trees (vertex indices) in the current graph sorted in descending order by size.
fitPowerLaw()

Fits the out-degree probabilities of this graph using the power law p_d ~ d^-gamma. The value of xmin is the point to start taking examples.

Returns alpha:The power law exponent.
Returns ks:A fit of the power law curve to the data using KS.
Returns xmin:The minimum value of x.
floydWarshall(useWeights=True)

Use the Floyd-Warshall algorithm to find the shortest path between all pairs of vertices. If useWeights is true, then the weights are used to compute the path, otherwise adjacencies are used. Note that the shortest path of a vertex to itself is always zero. Returns a matrix whose ij th entry is the shortest path between vertices i and j. This algorithm scales as O(n^3) with the number of vertices n, and is not recommended for very large graphs.

Parameters:useWeights (bool) – Whether to use the edge weight to compute path cost.
Returns:A matrix of shortest paths between all vertices.
classmethod fromNetworkXGraph(networkXGraph)

Take a networkx Graph or DiGraph object, and return a subclass of AbstractMatrixGraph. Notice that networkx must be installed to use this function. The networkXGraph graph dict must have an attribute VListType which is the type of the VertexList used to construct the SparseGraph. Furthermore, only node attributes index by “label” are stored in the VertexList, and edge values are currently ignored.

Returns:A networkx Graph or DiGraph object.
geodesicDistance(P=None, vertexInds=None)

Compute the mean geodesic distance for a graph. This is denoted for an undirected graph by 1/(1/2 n(n+1)) sum_{i<=j} d_ij where d_ij is the shortest path length between i and j. Note that if i and j are not connected we assume a path length of 0. If the graph is directed then the geodesic distance is 1/(n^2) sum_{i, j} d_ij.

Parameters:
  • P (ndarray) – An optional nxn matrix whose ijth entry is the shortest path from i to j.
  • vertexInds (list) – An optional list of vertices used to compute the mean geodesic distance. If this list is none, then all vertices are used.
Returns:

The mean geodesic distance of this graph.

getAllDirEdges()

Returns the set of directed edges of the current graph as a matrix in which each row corresponds to an edge. For an undirected graph, there is an edge from v1 to v2 and from v2 to v1 if v2!=v1.

Returns:A matrix with 2 columns, and each row corresponding to an edge.
getAllEdges()

Returns the set of edges of the current graph as a matrix in which each row corresponds to an edge. For an undirected graph, v1>=v2.

Returns:A matrix with 2 columns, and each row corresponding to an edge.
getAllVertexIds()

Returns a list of all the vertex IDs of this graph.

getEdge(vertexIndex1, vertexIndex2)

Get the value of an edge, or None if no edge exists.

Parameters:
  • vertexIndex1 (int) – The index of the first vertex.
  • vertexIndex2 (int) – The index of the second vertex.
Returns:

The value of the edge between the given vertex indices.

getEdgeValues(edgeArray)

Take an array of n x 2 of vertex indices and return the corresponding edge values.

Parameters:edgeArray (numpy.ndarray) – An array with an edge on each row
Returns:A vector of n values corresponding to the edge weights of edgeArray
getNumDirEdges()
Returns:the number of edges, taking this graph as a directed graph.
getNumEdges()
Returns:the total number of edges in this graph.
getNumVertices()
Returns:the number of vertices in this graph.
getSparseWeightMatrix()

Returns the original sparse weight matrix.

Returns:A scipy.sparse weight matrix.
getVertex(vertexIndex)

Returns the vertex associated with the given vertex index.

Parameters:vertexIndex (int) – the index of the vertex.
Returns:The value of the vertex at the given index.
getVertexList()
Returns:the AbstractVertexList object of this graph.
getVertices(vertexIndices)

Takes a list of vertex indices and returns the corresponding vertex values.

Parameters:vertexIndices (list) – A list of vertex indices
Returns:A list of vertices corresponding to the indices
getWeightMatrix()

Return the weight matrix in dense format. Warning: should not be used unless sufficient memory is available to store the dense matrix.

Returns:A numpy.ndarray weight matrix.
harmonicGeodesicDistance(P=None, vertexInds=None)

Compute the “harmonic mean” geodesic distance for a graph. This is denoted by the inverse of 1/(1/2 n(n+1)) sum_{i<=j} d_ij^-1 where d_ij is the shortest path length between i and j for an undirected graph. The distance from a node to itself is infinite. For a directed graph, the inverse distance is 1/n^2 sum_{i,j} d_ij^-1.

Parameters:
  • P (ndarray) – An optional nxn matrix whose ijth entry is the shortest path from i to j.
  • vertexInds (list) – An optional list of vertices used to compute the mean geodesic distance. If this list is none, then all vertices are used.
Returns:

The mean harmonic geodesic distance of this graph.

hopCount(P=None)

Returns an array such that the ith element is the number of pairs of vertices reachable within i hops. This includes self pairs, and all other pairs are counted twice in the undirected case otherwise once.

Parameters:P (ndarray) – An optional nxn matrix whose ijth entry is the shortest unweighted path from i to j.
Returns:An array of hop counts.
inDegreeDistribution()

Returns a vector of in-degree distributions. The ith element of the vector corresponds to the frequency of degree i.

Returns:A vector of (in)degree distributions.
inDegreeSequence()
Returns:a vector of the (in)degree sequence for each vertex.
incidenceMatrix()

Return the incidence matrix of this graph as a scipy sparse matrix. The incidence matrix X is of size numVertices x numEdges, and has a 1 in element Xij = -1 of edge j leaves vertex i, and Xij = 1 if edge j enters vertex i. Notice that for an undirected graph XX^T is the laplacian matrix.

intersect(graph)

Take the intersection of the edges of this graph and the input graph. Resulting edge weights are ignored and only adjacencies are stored.

Parameters:graph (apgl.graph.SparseGraph) – the input graph.
Returns:A new graph with the intersection of edges of the current plus graph.
isTree()

Returns true if this graph is a tree. Every vertex must have an in-degree of 1 (i.e. one parent), except the root which has an in-degree of zero and non-zero out-degree.

Returns:A boolean indicating whether the current graph is a tree.
isUndirected()

Returns true if this graph is undirected otherwise false.

laplacianMatrix(outDegree=True, sparse=False)

Return the Laplacian matrix of this graph, which is defined as L_{ii} = deg(i) L_{ij} = -1 if an edge between i and j, otherwise L_{ij} = 0 . For a directed graph one can specify whether to use the out-degree or in-degree.

Parameters:
  • outDegree (bool) – whether to use the out-degree for the computation of the degree matrix
  • sparse (bool) – whether to return a sparse matrix or numpy array
Returns:

A laplacian adjacency matrix.

laplacianWeightMatrix(outDegree=True)

Return the Laplacian matrix of this graph, L = D - W, where D is the degree matrix and W is the weight matrix. For a directed graph one can specify whether to use the out-degree or in-degree.

Parameters:outDegree (bool) – whether to use the out-degree for the computation of the degree matrix
Returns:A laplacian weight matrix.
classmethod load(filename)

Load the graph object from the corresponding file. Data is loaded in a zip format as created using save().

Parameters:filename (str) – The name of the file to load.
Returns:A graph corresponding to the one saved in filename.
static loadMatrix(filename)
maxEigenvector()

Returns the eigenvector of maximum eigenvalue of the adjacency matrix. The eigenvector is of unit length, and measures the centrality of the corresponding vertex. It is based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.

Returns:The maximum eigenvector of the adjacency matrix.
maxProductPaths()

Find the maximum product paths between all pairs of vertices using a modified version of the Floyd-Warshall algorithm.

Returns:A matrix P whose ijth entry corresponds to the maximal product of edge weights between them.
maybeIsomorphicWith(graph)

Returns false if graph is definitely not isomorphic with the current graph, however a True may mean the graphs are not isomorphic. Makes a comparison with the eigenvalues of the Laplacian matrices.

Returns:True if the current graph is maybe isomorphic with the input one.
multiply(graph)

Multiply the edge weights of the input graph to the current one. Results in an intersection of the edges.

Parameters:graph (apgl.graph.SparseGraph) – the input graph.
Returns:A new graph with edge weights which are multiples of the current and graph
nativeAdjacencyMatrix()
Returns:the adjacency matrix in the native sparse format.
neighbourOf(vertexIndex)

Return an array of the indices of vertices than have an edge going to the input vertex.

Parameters:vertexIndex (int) – the index of a vertex.
Returns:An array of the indices of all vertices with an edge towards the input vertex.
neighbours(vertexIndex)

Return an array of the indices of neighbours. In the case of a directed graph it is an array of those vertices connected by an edge from the current one.

Parameters:vertexIndex (int) – the index of a vertex.
Returns:An array of the indices of all neigbours of the input vertex.
normalisedLaplacianRw(outDegree=True)

Compute the normalised random walk laplacian matrix with L = I - D^-1 W in which W is the weight matrix and D_ii is the sum of the ith vertices weights.

Parameters:outDegree (bool) – whether to use the out-degree for the computation of the degree matrix
Returns:A normalised random-walk laplacian matrix as a numpy array..
normalisedLaplacianSym(outDegree=True, sparse=False)

Compute the normalised symmetric laplacian matrix using L = I - D^-1/2 W D^-1/2, in which W is the weight matrix and D_ii is the sum of the ith vertices weights.

Parameters:
  • outDegree (bool) – whether to use the out-degree for the computation of the degree matrix
  • sparse (bool) – whether to return a sparse matrix or numpy array
Returns:

A normalised symmetric laplacian matrix

outDegreeSequence()
Returns:a vector of the (out)degree sequence for each vertex.
removeAllEdges()

Removes all edges from this graph.

removeEdge(vertexIndex1, vertexIndex2)

Remove an edge between two vertices.

Parameters:
  • vertexIndex1 (int) – The index of the first vertex.
  • vertexIndex2 (int) – The index of the second vertex.
save(filename)

Save the graph object to the corresponding filename under the .zip extension. The adjacency matrix is stored in matrix market format and the AbstractVertexList decides how to store the vertex labels.

Parameters:filename (str) – The name of the file to save.
Returns:The name of the saved zip file.
saveMatrix(W, filename)
setDiff(graph)

Find the edges in the current graph which are not present in the input graph.

Parameters:graph (apgl.graph.SparseGraph) – the input graph.
Returns:A new graph with edges from the current graph and not in the input graph.
setVertex(vertexIndex, vertex)

Set the vertex with given index to a particular value.

Parameters:
  • vertexIndex (int) – the index of the vertex.
  • vertex – the value of the vertex.
setVertexList(vList)

Assign a new VertexList object to this graph. The number of vertices in the VertexList must be the same as in the graph.

Parameters:vList (apgl.graph.AbstractVertexList) – A new subclass of AbstractVertexList to assign to this graph.
setVertices(vertexIndices, vertices)

Assign new values to the vertices corresponding to vertexIndices

Parameters:
  • vertexIndices (list) – A list of vertex indices
  • vertices (list) – A list of vertex values
setWeightMatrix(W)

Set the weight matrix of this graph. Requires as input an ndarray or a scipy sparse matrix with the same dimensions as the current weight matrix. Edges are represented by non-zero edges.

Parameters:W (ndarray or scipy.sparse matrix) – The weight matrix to use.
setWeightMatrixSparse(W)

Set the weight matrix of this graph. Requires as input a scipy sparse matrix with the same dimensions as the current weight matrix. Edges are represented by non-zero edges.

Parameters:W – The weight matrix to use.
size

The number of vertices in the graph

subgraph(vertexIndices)

Pass in a list or set of vertexIndices and returns the subgraph containing those vertices only, and edges between them. The subgraph indices correspond to the sorted input indices.

Parameters:vertexIndices (list) – the indices of the subgraph vertices.
Returns:A new SparseGraph containing only vertices and edges from vertexIndices
toCsc()

Convert the internal matrix representation to csc format (compressed sparse column) in order to improve the efficiency of certain operations.

toCsr()

Convert the internal matrix representation to csr format (compressed sparse row) in order to improve the efficiency of certain operations.

toDictGraph()

Convert to a DictGraph object. Currently ignores vertex labels.

Return graph:A DictGraph object.
toIGraph()

Convert this graph into a igraph Graph object, which requires igraph to be installed. Edge values are stored under the “value” index. Vertices are stored as indices with a “label” value being the corresponding vertex value.

Returns:An igraph Graph object.
toNetworkXGraph()

Convert this graph into a networkx Graph or DiGraph object, which requires networkx to be installed. Edge values are stored under the “value” index. Vertices are stored as indices with a “label” value being the corresponding vertex value. The type of vertex list is stored as a graph attribute under the index “VListType”

Returns:A networkx Graph or DiGraph object.
triangleSequence()

Computes the number of triangles each vertex participates in using the diagonal of the adjcancy matrix. In an undirected graph, a each triangle is counted twice (once for each direction). Note that self loops are not used to form triangles.

Returns:An array of triangle counts for each vertex.
undirected = None
union(graph)

Take the union of the edges of this graph and the input graph. Resulting edge weights are ignored and only adjacencies are stored.

Parameters:graph (apgl.graph.SparseGraph) – the input graph.
Returns:A new graph with the union of edges of the current one.
vList = None
vlist

The vertex list

weightMatrixType()
Returns:the type of the sparse matrix used to store edge weights.

Table Of Contents

Previous topic

PySparseGraph

Next topic

VertexList

This Page