Methods for building networks from bibliographic data.
Each network relies on certain metadata in the Paper associated with each document. Often we wish to construct a network with nodes representing these documents and edges representing relationships between those documents, but this is not always the case.
Where it is the case, it is recommended but not required that nodes be represented by an identifier from {ayjid, wosid, pmid, doi}. Each has certain benefits. If the documents to be networked come from a single database source such as the Web of Science, wosid is most appropriate. If not, using doi will result in a more accurate but sparser network, while ayjid will result in a less accurate but more complete network.
However, any type of metadata from the Paper may be used as an identifier.
We use “head” and “tail” to refer to the members of a directed edge (x, y), also written x -> y or xy: x is the “tail” and y is the “head”.
Module | Description |
---|---|
authors | Methods for generating networks in which authors are vertices. |
helpers | Helper functions for generating networks. |
papers | Methods for generating networks in which papers are vertices. |
terms | Methods for building networks from terms in bibliographic records. |
topics | Build networks from topics in a topic model. |
Methods for generating networks in which authors are vertices.
Function | Description |
---|---|
author_cocitation(papers[, threshold]) | Generates an author co-citation network; edges indicate co-citation of authors’ papers. |
author_coinstitution(Papers[, threshold]) | Generate a co-institution graph, where edges indicate shared affiliation. |
author_institution(Papers[, edge_attribs]) | Generate a bi-partite graph connecting authors and their institutions. |
author_papers(papers[, node_id, paper_attribs]) | Generate an author_papers network NetworkX directed graph. |
coauthors(papers[, threshold, edge_attribs, ...]) | Generate a co-author network. |
Generates an author co-citation network; edges indicate co-citation of authors’ papers.
Similar to papers.cocitation(), except that vertices are authors rather than papers. To generate an author co-citation network, use the networks.authors.author_cocitation() method:
>>> ACC = nt.authors.author_cocitation(papers)
>>> ACC
<networkx.classes.graph.Graph object at 0x106571190>
Element | Description |
---|---|
Nodes | Author name. |
Edge | (a, b) if works by a and b are cited by the same paper in papers. |
Edge attribute | ‘weight’, the number of papers that co-cite a and b. |
Field | Description |
---|---|
Parameters | papers : list; threshold : int |
Returns | cocitation : networkx.Graph |
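The counting behind an author co-citation network can be sketched in plain Python. This is a minimal illustration, not Tethne's implementation: it assumes each paper is a dict whose hypothetical `"citations"` key lists the (first) author names of its cited references, and it returns a dict of weighted edges rather than a `networkx.Graph`.

```python
from collections import Counter
from itertools import combinations


def author_cocitation_edges(papers, threshold=1):
    """Count how many papers co-cite each pair of authors.

    Assumes each paper is a dict with a hypothetical "citations" key
    holding the names of the authors of its cited references.
    """
    weights = Counter()
    for paper in papers:
        # A citing paper contributes at most once to each author pair.
        authors = sorted(set(paper.get("citations", [])))
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1
    # Keep only pairs co-cited by at least `threshold` papers.
    return {pair: w for pair, w in weights.items() if w >= threshold}


papers = [
    {"citations": ["SMITH J", "DOE A", "LEE K"]},
    {"citations": ["SMITH J", "DOE A"]},
]
edges = author_cocitation_edges(papers, threshold=2)
print(edges)  # {('DOE A', 'SMITH J'): 2}
```

The edge weight here matches the table above: the number of citing papers that co-cite a and b.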
Generate a co-institution graph, where edges indicate shared affiliation.
Some bibliographic datasets, including data from the Web of Science, include the institutional affiliations of authors. In a co-institution graph, two authors (vertices) have an edge between them if they share an institutional affiliation in the dataset. Note that the availability of institutional-affiliation data varies within the WoS database, so this method yields more reliable results for more recent publications.
To generate a co-institution network, use the networks.authors.author_coinstitution() method:
>>> ACI = nt.authors.author_coinstitution(papers)
>>> ACI
<networkx.classes.graph.Graph object at 0x106571190>
Element | Description |
---|---|
Node | Authors. |
Node Attribute | type (string). ‘author’ or ‘institution’. |
Edges | (a, b) where a and b are affiliated with the same institution. |
Edge attribute | overlap (int). number of shared institutions. |
Field | Description |
---|---|
Parameters | Papers : list; threshold : int |
Returns | coinstitution : NetworkX graph |
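The edge rule above (an edge when two authors share an institution, with `overlap` counting shared institutions) can be sketched as follows. This is an illustration only: the input is a hypothetical dict mapping each author to a set of institutions, not Tethne's Paper objects, and the result is a plain dict of edges.

```python
from itertools import combinations


def coinstitution_edges(affiliations, threshold=1):
    """Link authors who share at least `threshold` institutions.

    `affiliations` is a hypothetical dict mapping each author name to
    the set of institutions recorded for that author.
    """
    edges = {}
    for a, b in combinations(sorted(affiliations), 2):
        # overlap = number of institutions the two authors share
        overlap = len(affiliations[a] & affiliations[b])
        if overlap >= threshold:
            edges[(a, b)] = {"overlap": overlap}
    return edges


affiliations = {
    "DOE A": {"MIT", "HARVARD"},
    "LEE K": {"MIT"},
    "SMITH J": {"STANFORD"},
}
print(coinstitution_edges(affiliations))  # {('DOE A', 'LEE K'): {'overlap': 1}}
```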
Generate a bi-partite graph connecting authors and their institutions.
The result may be slightly ambiguous for WoS data, which provides no explicit mapping between authors and institutions. Edge weights are the number of co-associations between an author and an institution, which should help resolve this ambiguity: the more data, the better.
Element | Description |
---|---|
Node | Authors and institutions. |
Edge | (a, b) if author a is associated with institution b on at least one paper. |
Field | Description |
---|---|
Parameters | Papers : list; edge_attribs : list |
Returns | author_institution_graph : networkx.MultiGraph |
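To see how co-association counts can resolve the missing author-institution mapping, here is a minimal plain-Python sketch. It assumes hypothetical `"authors"` and `"institutions"` keys on each paper and pairs every author on a paper with every institution listed on that paper; pairs that recur across many papers accumulate higher weights. It is not Tethne's implementation.

```python
from collections import Counter


def author_institution_weights(papers):
    """Count co-associations between authors and institutions.

    Without an explicit author-institution mapping, every author on a
    paper is paired with every institution listed on that paper.
    """
    weights = Counter()
    for paper in papers:
        for author in set(paper.get("authors", [])):
            for inst in set(paper.get("institutions", [])):
                weights[(author, inst)] += 1
    return dict(weights)


papers = [
    {"authors": ["DOE A", "LEE K"], "institutions": ["MIT", "YALE"]},
    {"authors": ["DOE A"], "institutions": ["MIT"]},
]
weights = author_institution_weights(papers)
print(weights[("DOE A", "MIT")], weights[("DOE A", "YALE")])  # 2 1
```

With more papers, the true affiliation (here, DOE A at MIT) tends to outweigh spurious pairings.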
Generate an author-papers network as a NetworkX directed graph.
Element | Description |
---|---|
Node | Two kinds of nodes, distinguished by a “type” attribute: type = paper (a paper in papers) and type = person (a person in papers). Paper nodes carry the attributes listed in paper_attribs. |
Edge | Directed, from an author to each of their papers. |
Field | Description |
---|---|
Parameters | papers : list; node_id : string; paper_attribs : list |
Returns | author_papers_graph : networkx.DiGraph |
Raises | KeyError : Raised when node_id is not present in Papers. |
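The structure described above (typed nodes plus author -> paper edges) can be sketched with plain dicts. This assumes each paper is a dict with a hypothetical `"authors"` key and an identifier under `node_id`; it is an illustration, not Tethne's code, and returns `(nodes, edges)` instead of a `networkx.DiGraph`.

```python
def author_papers_graph(papers, node_id="ayjid", paper_attribs=()):
    """Build a directed author -> paper graph as plain dicts.

    Returns (nodes, edges): nodes maps each node to its attributes,
    including a "type" of "paper" or "person"; edges is a list of
    (author, paper) pairs. Raises KeyError if a paper lacks node_id.
    """
    nodes, edges = {}, []
    for paper in papers:
        pid = paper[node_id]  # KeyError if node_id is missing
        attrs = {"type": "paper"}
        attrs.update({k: paper[k] for k in paper_attribs if k in paper})
        nodes[pid] = attrs
        for author in paper.get("authors", []):
            nodes.setdefault(author, {"type": "person"})
            edges.append((author, pid))  # edge points author -> paper
    return nodes, edges


papers = [{"ayjid": "DOE A 2010 J", "date": 2010, "authors": ["DOE A", "LEE K"]}]
nodes, edges = author_papers_graph(papers, paper_attribs=["date"])
print(edges)  # [('DOE A', 'DOE A 2010 J'), ('LEE K', 'DOE A 2010 J')]
```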
Generate a co-author network.
As the name suggests, edges are drawn between two author vertices whenever those authors published a paper together. Co-authorship networks are popular models for studying patterns of collaboration in scientific communities.
Author institutional affiliation is included as a node attribute, if possible.
To generate a co-authorship network, use the networks.authors.coauthors() method:
>>> CA = nt.authors.coauthors(papers)
>>> CA
<networkx.classes.graph.Graph object at 0x10d94cfd0>
Element | Description |
---|---|
Node | Author name. |
Edges | (a,b) in E(G) if a and b are coauthors on the same paper. |
Field | Description |
---|---|
Parameters | papers : list; threshold : int; edge_attribs : list; node_attribs : list; geocode : bool |
Returns | G : networkx.Graph |
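A minimal plain-Python sketch of co-authorship counting with a threshold, including an institution node attribute when one is available. The `"authors"` and `"affiliations"` keys are hypothetical, and the sketch does not reproduce Tethne's `edge_attribs`, `node_attribs`, or `geocode` options.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers, threshold=1):
    """Return (node_attrs, edges) for a co-authorship network.

    edges maps each author pair to the number of papers they share;
    node_attrs records an author's institution when one is available.
    """
    weights = Counter()
    node_attrs = {}
    for paper in papers:
        # Hypothetical field: {author name: institution} for this paper.
        affiliations = paper.get("affiliations", {})
        authors = sorted(set(paper.get("authors", [])))
        for author in authors:
            attrs = node_attrs.setdefault(author, {})
            if author in affiliations:
                attrs["institution"] = affiliations[author]
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1
    # Keep pairs that coauthored at least `threshold` papers.
    edges = {pair: w for pair, w in weights.items() if w >= threshold}
    return node_attrs, edges


papers = [
    {"authors": ["DOE A", "LEE K", "SMITH J"], "affiliations": {"DOE A": "MIT"}},
    {"authors": ["LEE K", "DOE A"]},
]
nodes, edges = coauthor_network(papers, threshold=2)
print(edges)  # {('DOE A', 'LEE K'): 2}
print(nodes["DOE A"])  # {'institution': 'MIT'}
```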
Generates an institutional network based on coauthorship.
An edge is drawn between two institutional vertices whenever two authors, one at each institution, coauthor a paper.
>>> I = nt.authors.institutions(papers)
>>> I
<networkx.classes.graph.Graph object at 0x10d94cfd0>
Element | Description |
---|---|
Node | Institution name and location. |
Edges | (a,b) in E(G) if coauthors R and S are affiliated with institutions a and b, respectively. |
Field | Description |
---|---|
Parameters | papers : list; threshold : int; edge_attribs : list; node_attribs : list; geocode : bool |
Returns | G : networkx.Graph |
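The institutional rule can be sketched by collecting the distinct institutions represented among each paper's authors and linking every pair. This assumes a hypothetical `"affiliations"` dict (author to one institution) on each paper, and it weights edges by the number of papers, which is one reasonable choice rather than a documented Tethne behavior.

```python
from collections import Counter
from itertools import combinations


def institution_edges(papers, threshold=1):
    """Link institutions whose affiliated authors coauthor a paper.

    Assumes each paper has a hypothetical "affiliations" dict mapping
    each author to one institution. Weights count papers.
    """
    weights = Counter()
    for paper in papers:
        # Distinct institutions represented among this paper's authors.
        insts = sorted(set(paper.get("affiliations", {}).values()))
        for a, b in combinations(insts, 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= threshold}


papers = [
    {"affiliations": {"DOE A": "MIT", "LEE K": "YALE", "RAY B": "MIT"}},
    {"affiliations": {"SMITH J": "YALE", "KIM C": "MIT"}},
]
print(institution_edges(papers))  # {('MIT', 'YALE'): 2}
```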
Helper functions for generating networks.
Function | Description |
---|---|
citation_count(papers[, key, verbose]) | Generates citation counts for all of the papers cited by papers. |
simplify_multigraph(multigraph[, time]) | Simplifies a graph by condensing multiple edges between the same node pair into a single edge, with a weight attribute equal to the number of edges. |
top_cited(papers[, topn, verbose]) | Generates a list of the topn (or topn%) most cited papers. |
top_parents(papers[, topn, verbose]) | Returns a list of Paper instances that cite the topn most cited papers. |
Generates citation counts for all of the papers cited by papers.
Field | Description |
---|---|
Parameters | papers : list; key : str; verbose : bool |
Returns | counts : dict |
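A minimal sketch of citation counting, assuming `key` names a hypothetical field that lists each paper's references. Counting each reference at most once per citing paper is a design choice of this sketch, not necessarily what Tethne does.

```python
from collections import Counter


def citation_count(papers, key="citations"):
    """Count how often each cited reference appears across `papers`.

    `key` names a hypothetical field listing each paper's references.
    """
    counts = Counter()
    for paper in papers:
        # set(): a paper that lists a reference twice cites it once
        counts.update(set(paper.get(key, [])))
    return dict(counts)


papers = [{"citations": ["A", "B"]}, {"citations": ["B", "C", "B"]}]
counts = citation_count(papers)
print(counts["B"])  # 2
```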
Simplifies a graph by condensing multiple edges between the same node pair into a single edge, with a weight attribute equal to the number of edges.
Field | Description |
---|---|
Parameters | graph : networkx.MultiGraph; time : bool |
Returns | graph : networkx.Graph |
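The simplification step amounts to counting parallel edges. The sketch below works on a plain list of undirected `(u, v)` pairs instead of a `networkx.MultiGraph`, and it does not model the `time` parameter.

```python
from collections import Counter


def simplify_edges(multi_edges):
    """Condense repeated (u, v) edges into one edge with a weight.

    Undirected: (u, v) and (v, u) are treated as the same pair.
    """
    weights = Counter(tuple(sorted(edge)) for edge in multi_edges)
    return dict(weights)


edges = [("a", "b"), ("b", "a"), ("a", "c")]
print(simplify_edges(edges))  # {('a', 'b'): 2, ('a', 'c'): 1}
```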
Generates a list of the topn (or topn%) most cited papers.
Field | Description |
---|---|
Parameters | papers : list; topn : int, or float in [0., 1.]; verbose : bool |
Returns | top : list; counts : dict |
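The two forms of `topn` (a count or a fraction) can be handled as in this sketch. The `"citations"` key is hypothetical, and rounding a fractional `topn` down is an assumption, not documented Tethne behavior.

```python
from collections import Counter


def top_cited(papers, topn=20):
    """Return the most-cited references and the full citation counts.

    An int `topn` keeps that many references; a float in [0., 1.] keeps
    that fraction of all cited references (rounded down).
    """
    counts = Counter()
    for paper in papers:
        counts.update(set(paper.get("citations", [])))
    if isinstance(topn, float):
        topn = int(len(counts) * topn)
    top = [ref for ref, _ in counts.most_common(topn)]
    return top, dict(counts)


papers = [
    {"citations": ["A", "B"]},
    {"citations": ["B", "C"]},
    {"citations": ["B", "A"]},
]
top, counts = top_cited(papers, topn=2)
print(top)  # ['B', 'A']
```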
Returns a list of Paper instances that cite the topn most cited papers.
Field | Description |
---|---|
Parameters | papers : list; topn : int, or float in [0., 1.]; verbose : bool |
Returns | papers : list; top : list; counts : dict |
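Finding the "parents" of the most cited papers is a filter over the citing papers once the top references are known. This self-contained sketch repeats the counting step and returns the same three values listed in the table above; the `"citations"` and `"id"` keys are hypothetical.

```python
from collections import Counter


def top_parents(papers, topn=20):
    """Return papers citing any of the `topn` most-cited references,
    along with the top references and the full citation counts."""
    counts = Counter()
    for paper in papers:
        counts.update(set(paper.get("citations", [])))
    if isinstance(topn, float):
        topn = int(len(counts) * topn)
    top = [ref for ref, _ in counts.most_common(topn)]
    top_set = set(top)
    # A paper is a parent if it cites at least one top reference.
    parents = [p for p in papers if top_set & set(p.get("citations", []))]
    return parents, top, dict(counts)


papers = [
    {"id": 1, "citations": ["A", "B"]},
    {"id": 2, "citations": ["C"]},
    {"id": 3, "citations": ["B", "D"]},
]
parents, top, counts = top_parents(papers, topn=1)
print([p["id"] for p in parents], top)  # [1, 3] ['B']
```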
Methods for generating networks in which papers are vertices.
Function | Description |
---|---|
author_coupling(papers[, threshold, ...]) | Vertices are papers and edges indicate shared authorship. |
bibliographic_coupling(papers[, ...]) | Generate a bibliographic coupling network. |
cocitation(papers[, threshold, node_id, ...]) | Generate a cocitation network. |
direct_citation(papers[, node_id, node_attribs]) | Create a traditional directed citation network. |
topic_coupling(papers[, threshold, node_id]) | Two papers are coupled if they both contain a shared topic above threshold. |
Vertices are papers and edges indicate shared authorship.
Element | Description |
---|---|
Node | Papers, represented by node_id. |
Edge | (a,b) in E(G) if a and b share x authors and x >= threshold |
Edge Attributes | overlap: the value of x (above). |
Field | Description |
---|---|
Parameters | papers : list; threshold : int; node_id : string; node_attribs : list |
Returns | acoupling : networkx.Graph |
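The author-coupling rule (an edge when papers share x authors with x >= threshold, stored as `overlap`) can be sketched with set intersections. Papers are hypothetical dicts with an identifier under `node_id` and an `"authors"` list; the result is a dict, not a NetworkX graph.

```python
from itertools import combinations


def author_coupling(papers, threshold=1, node_id="ayjid"):
    """Link papers that share at least `threshold` authors."""
    authors = {p[node_id]: set(p.get("authors", [])) for p in papers}
    edges = {}
    for a, b in combinations(sorted(authors), 2):
        overlap = len(authors[a] & authors[b])  # x in the table above
        if overlap >= threshold:
            edges[(a, b)] = {"overlap": overlap}
    return edges


papers = [
    {"ayjid": "P1", "authors": ["DOE A", "LEE K"]},
    {"ayjid": "P2", "authors": ["LEE K", "DOE A", "RAY B"]},
    {"ayjid": "P3", "authors": ["KIM C"]},
]
print(author_coupling(papers, threshold=2))  # {('P1', 'P2'): {'overlap': 2}}
```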
Generate a bibliographic coupling network.
Two papers are bibliographically coupled when they both cite the same third paper. You can generate a bibliographic coupling network using the networks.papers.bibliographic_coupling() method.
>>> BC = nt.papers.bibliographic_coupling(papers)
>>> BC
<networkx.classes.graph.Graph object at 0x102eec710>
Especially when working with large datasets, or disciplinarily narrow literatures, it is usually helpful to set a minimum number of shared citations required for two papers to be coupled. You can do this by setting the `threshold` parameter.
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=1)
>>> len(BC.edges())
1216
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=2)
>>> len(BC.edges())
542
Element | Description |
---|---|
Node | Papers represented by node_id. |
Node Attributes | node_attribs in Paper |
Edge | (a,b) in E(G) if a and b share x citations where x >= threshold. |
Edge Attributes | overlap: the number of citations shared |
Field | Description |
---|---|
Parameters | papers : list; citation_id : string; threshold : int; node_id : string; node_attribs : list; weighted : bool |
Returns | bcoupling : networkx.Graph |
Raises | KeyError : Raised when citation_id is not present in the meta_list. |
Notes
List-valued attributes appear to cause errors when writing to both GEXF and GraphML. Nodes also cannot be None.
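Bibliographic coupling, including the effect of `threshold`, can be sketched in plain Python. Rather than comparing every pair of papers, this version builds an inverted index from each cited reference to the papers that cite it, which avoids work for papers with no references in common. It is not Tethne's implementation; `refs_key` is a hypothetical parameter naming the field that lists each paper's references.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bibliographic_coupling(papers, threshold=1, node_id="ayjid",
                           refs_key="citations"):
    """Link papers sharing at least `threshold` cited references."""
    # Inverted index: cited reference -> papers that cite it.
    citing = defaultdict(set)
    for paper in papers:
        for ref in paper[refs_key]:  # KeyError if refs_key is missing
            citing[ref].add(paper[node_id])
    # Every pair of papers citing the same reference shares one citation.
    overlap = Counter()
    for ids in citing.values():
        for a, b in combinations(sorted(ids), 2):
            overlap[(a, b)] += 1
    return {pair: {"overlap": n} for pair, n in overlap.items()
            if n >= threshold}


papers = [
    {"ayjid": "P1", "citations": ["A", "B", "C"]},
    {"ayjid": "P2", "citations": ["A", "B"]},
    {"ayjid": "P3", "citations": ["C", "D"]},
]
print(len(bibliographic_coupling(papers, threshold=1)),
      len(bibliographic_coupling(papers, threshold=2)))  # 2 1
```

As in the doctest above, raising the threshold removes weakly coupled pairs.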
Generate a cocitation network.
A cocitation network is a network in which vertices are papers and edges indicate that two papers were cited by the same third paper. CiteSpace is a popular desktop application for co-citation analysis, and you can read about the theory behind it here. Co-citation analysis is generally performed with a temporal component, so building a GraphCollection from a DataCollection sliced by date is recommended.
You can generate a co-citation network using the networks.papers.cocitation() method:
>>> CC = nt.papers.cocitation(papers)
>>> CC
<networkx.classes.graph.Graph object at 0x102eec790>
For large datasets, you may wish to set a minimum number of co-citations required for an edge between two papers. Keep in mind that all of the references in a single paper are co-cited once, so a threshold of at least 2 is prudent. Note the dramatic decrease in the number of edges when the threshold is changed from 2 to 3.
>>> CC = nt.papers.cocitation(papers, threshold=2)
>>> len(CC.edges())
8889
>>> CC = nt.papers.cocitation(papers, threshold=3)
>>> len(CC.edges())
1493
Element | Description |
---|---|
Node | Cited papers represented by Paper ayjid. |
Edge | (a, b) if a and b are cited by the same paper. |
Edge Attributes | weight: number of times two papers are co-cited together. |
Field | Description |
---|---|
Parameters | papers : list; threshold : int; topn : int or float, or None; verbose : bool |
Returns | cocitation : networkx.Graph |
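The advice that a threshold of at least 2 is prudent follows from how co-citation is counted: one paper with n references alone produces n(n-1)/2 pairs of weight 1. This plain-Python sketch (hypothetical `"citations"` key, dict output instead of a NetworkX graph) makes that visible.

```python
from collections import Counter
from itertools import combinations


def cocitation(papers, threshold=2, refs_key="citations"):
    """Weight each pair of cited references by how often it is co-cited."""
    weights = Counter()
    for paper in papers:
        refs = sorted(set(paper.get(refs_key, [])))
        for a, b in combinations(refs, 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= threshold}


papers = [
    {"citations": ["A", "B", "C", "D"]},  # alone yields 6 pairs
    {"citations": ["A", "B"]},
]
print(len(cocitation(papers, threshold=1)), cocitation(papers, threshold=2))
# 6 {('A', 'B'): 2}
```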
Create a traditional directed citation network.
Direct-citation graphs are directed acyclic graphs in which vertices are papers, and each (directed) edge represents a citation of the target paper by the source paper. The networks.papers.direct_citation() method generates both a global citation graph, which includes all cited and citing papers, and an internal citation graph that describes only citations among papers in the original dataset.
To generate direct-citation graphs, use the networks.papers.direct_citation() method. Note the size difference between the global and internal citation graphs.
>>> gDC, iDC = nt.papers.direct_citation(papers)
>>> len(gDC)
5998
>>> len(iDC)
163
Element | Description |
---|---|
Node | Papers, represented by node_id. |
Edge | From a paper to a cited reference. |
Edge Attribute | Publication date of the citing paper. |
Field | Description |
---|---|
Parameters | papers : list; node_id : string; node_attribs : list |
Returns | citation_network : networkx.DiGraph; citation_network_internal : networkx.DiGraph |
Raises | KeyError : If node_id is not present in the meta_list. |
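The global and internal citation graphs differ only in whether the cited paper must itself be in the dataset. This sketch assumes references are recorded with the same identifier as `node_id` (a hypothetical `"citations"` list) and returns edge lists of `(citing, cited, attrs)` with the citing paper's date, rather than `networkx.DiGraph` objects.

```python
def direct_citation(papers, node_id="ayjid", refs_key="citations"):
    """Return (global_edges, internal_edges) as (citing, cited, attrs).

    The global graph keeps every citation; the internal graph keeps only
    citations whose cited paper is also in `papers`.
    """
    ids = {p[node_id] for p in papers}  # KeyError if node_id is missing
    global_edges = [
        (p[node_id], ref, {"date": p.get("date")})
        for p in papers
        for ref in p.get(refs_key, [])
    ]
    internal_edges = [e for e in global_edges if e[1] in ids]
    return global_edges, internal_edges


papers = [
    {"ayjid": "P1", "date": 2010, "citations": ["P2", "X"]},
    {"ayjid": "P2", "date": 2008, "citations": ["Y"]},
]
g, i = direct_citation(papers)
print(len(g), len(i))  # 3 1
```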
Two papers are coupled if they both contain a shared topic above threshold.
Element | Description |
---|---|
Node | Papers, represented by node_id. |
Edge | (a,b) in E(G) if a and b share at least one topic with proportion >= threshold in both a and b. |
Edge Attributes | weight: combined mean proportion of each shared topic. topics: list of shared topics. |
Field | Description |
---|---|
Parameters | papers : list; threshold : float; node_id : string |
Returns | tc : networkx.Graph |
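Topic coupling between papers can be sketched as follows, assuming each paper carries a hypothetical `"topics"` dict of topic proportions. The description of the weight as a "combined mean proportion" is ambiguous; this sketch takes one reading of it (the sum, over shared topics, of the mean of the two papers' proportions) and should not be taken as Tethne's exact formula.

```python
from itertools import combinations


def topic_coupling(papers, threshold=0.1, node_id="ayjid"):
    """Link papers sharing a topic with proportion >= threshold in both."""
    # Topics at or above threshold for each paper.
    strong = {
        p[node_id]: {t: w for t, w in p["topics"].items() if w >= threshold}
        for p in papers
    }
    edges = {}
    for a, b in combinations(sorted(strong), 2):
        shared = sorted(strong[a].keys() & strong[b].keys())
        if shared:
            # Assumed reading of "combined mean proportion".
            weight = sum((strong[a][t] + strong[b][t]) / 2 for t in shared)
            edges[(a, b)] = {"weight": weight, "topics": shared}
    return edges


papers = [
    {"ayjid": "P1", "topics": {0: 0.5, 1: 0.25, 2: 0.25}},
    {"ayjid": "P2", "topics": {0: 0.25, 1: 0.75}},
    {"ayjid": "P3", "topics": {2: 1.0}},
]
edges = topic_coupling(papers, threshold=0.25)
print(edges[("P1", "P2")])  # {'weight': 0.875, 'topics': [0, 1]}
```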
Methods for building networks from terms in bibliographic records. This includes keywords, abstract terms, etc.
Function | Description |
---|---|
keyword_cooccurrence(papers, threshold[, ...]) | Generates a keyword cooccurrence network. |
topic_coupling(model[, threshold]) | Creates a network of words connected when they are implicated in one or more common topics. |
Generates a keyword cooccurrence network.
Field | Description |
---|---|
Parameters | papers : list; threshold : int; connected : bool |
Returns | k_coccurrence : networkx.Graph |
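Keyword co-occurrence follows the same pairwise counting pattern as co-authorship. In this sketch the `"keywords"` key and the lowercasing of keywords are assumptions, and the `connected` option is not modeled.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers, threshold=1, key="keywords"):
    """Weight each keyword pair by the number of papers listing both."""
    weights = Counter()
    for paper in papers:
        # Assumption: normalize case so "Ecology" and "ecology" match.
        words = sorted({k.lower() for k in paper.get(key, [])})
        for a, b in combinations(words, 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= threshold}


papers = [
    {"keywords": ["Ecology", "Networks"]},
    {"keywords": ["ecology", "networks", "Citation"]},
]
print(keyword_cooccurrence(papers, threshold=2))  # {('ecology', 'networks'): 2}
```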
Creates a network of words connected when they are implicated in one or more common topics.
Field | Description |
---|---|
Parameters | model : LDAModel; threshold : float |
Returns | tc : networkx.Graph |
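The idea of linking words through shared topics can be sketched without an LDAModel. Here a hypothetical dict mapping each topic id to a `{word: probability}` distribution stands in for the model, and a word counts as implicated in a topic when its probability meets `threshold`; both choices are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def word_topic_coupling(topics, threshold=0.01):
    """Link words that are both prominent in at least one common topic.

    `topics` maps each topic id to a {word: probability} dict; this
    stands in for an LDAModel and is an assumption of the sketch.
    """
    edges = defaultdict(list)
    for topic, dist in sorted(topics.items()):
        words = sorted(w for w, p in dist.items() if p >= threshold)
        for a, b in combinations(words, 2):
            edges[(a, b)].append(topic)  # record which topics link them
    return dict(edges)


topics = {
    0: {"gene": 0.2, "cell": 0.1, "the": 0.001},
    1: {"gene": 0.05, "protein": 0.3},
}
print(word_topic_coupling(topics, threshold=0.05))
# {('cell', 'gene'): [0], ('gene', 'protein'): [1]}
```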