networks Package

Methods for building networks from bibliographic data.

Each network relies on certain metadata in the Paper associated with each document. Often we wish to construct a network with nodes representing these documents and edges representing relationships between those documents, but this is not always the case.

Where it is the case, it is recommended but not required that nodes are represented by an identifier from {ayjid, wosid, pmid, doi}. Each has certain benefits. If the documents to be networked come from a single database source such as the Web of Science, wosid is most appropriate. If not, using doi will result in a more accurate, but also more sparse network; while ayjid will result in a less accurate, but more complete network.

Any type of metadata from the Paper may be used as an identifier, however.

We use “head” and “tail” nomenclature for the members of a directed edge, written (x,y), x -> y, xy, etc.: x is the “tail” and y is the “head”.

Modules

authors Methods for generating networks in which authors are vertices.
helpers Helper functions for generating networks.
papers Methods for generating networks in which papers are vertices.
terms Methods for building networks from terms in bibliographic records.
topics Build networks from topics in a topic model.

authors Module

Methods for generating networks in which authors are vertices.

Methods

author_cocitation(papers[, threshold]) Generates an author co-citation network; edges indicate co-citation of authors’ papers.
author_coinstitution(Papers[, threshold]) Generate a co-institution graph, where edges indicate shared affiliation.
author_institution(Papers[, edge_attribs]) Generate a bi-partite graph connecting authors and their institutions.
author_papers(papers[, node_id, paper_attribs]) Generate a directed NetworkX graph linking authors to their papers.
coauthors(papers[, threshold, edge_attribs, ...]) Generate a co-author network.
tethne.networks.authors.author_cocitation(papers, threshold=1, **kwargs)[source]

Generates an author co-citation network; edges indicate co-citation of authors’ papers.

Similar to papers.cocitation(), except that vertices are authors rather than papers. To generate an author co-citation network, use the networks.authors.author_cocitation() method:

>>> ACC = nt.authors.author_cocitation(papers)
>>> ACC
<networkx.classes.graph.Graph object at 0x106571190>
Element Description
Nodes Author name.
Edge (a, b) if a and b are referenced by the same paper in papers
Edge attribute ‘weight’, the number of papers that co-cite a and b.
Parameters :

papers : list

a list of Paper objects.

threshold : int

Minimum number of co-citations required to create an edge between authors.

Returns :

cocitation : networkx.Graph

A cocitation network.
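
The counting behind the ‘weight’ attribute and the threshold parameter can be illustrated without tethne or NetworkX. The following is a minimal sketch, not the library's implementation: the function name author_cocitation_counts and the input format (one list of cited author names per citing paper) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def author_cocitation_counts(cited_authors_per_paper, threshold=1):
    """Count how many citing papers reference each pair of authors.

    cited_authors_per_paper is a hypothetical stand-in for a list of Paper
    objects: one list of cited author names per citing paper.
    """
    counts = Counter()
    for cited_authors in cited_authors_per_paper:
        # Each unordered pair of distinct cited authors is co-cited once
        # per citing paper, however often each author is cited in it.
        for a, b in combinations(sorted(set(cited_authors)), 2):
            counts[(a, b)] += 1
    # Keep only pairs that meet the minimum number of co-citations.
    return {pair: n for pair, n in counts.items() if n >= threshold}


citing_papers = [
    ["DARWIN C", "WALLACE A", "LYELL C"],
    ["DARWIN C", "WALLACE A"],
    ["LYELL C", "DARWIN C"],
]
edges = author_cocitation_counts(citing_papers, threshold=2)
```

With threshold=2, the pair (LYELL C, WALLACE A), co-cited only once, is dropped.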

tethne.networks.authors.author_coinstitution(Papers, threshold=1, **kwargs)[source]

Generate a co-institution graph, where edges indicate shared affiliation.

Some bibliographic datasets, including data from the Web of Science, include the institutional affiliations of authors. In a co-institution graph, two authors (vertices) have an edge between them if they share an institutional affiliation in the dataset. Note that the quality of institutional affiliation data varies in the WoS database, so this method will yield more reliable results for more recent publications.

To generate a co-institution network, use the networks.authors.author_coinstitution() method:

>>> ACI = nt.authors.author_coinstitution(papers)
>>> ACI
<networkx.classes.graph.Graph object at 0x106571190>
Element Description
Node Authors.
Node Attribute type (string). ‘author’ or ‘institution’.
Edges (a, b) where a and b are affiliated with the same institution.
Edge attribute overlap (int). number of shared institutions.
Parameters :

Papers : list

A list of Paper instances.

threshold : int

Minimum institutional overlap required for an edge.

Returns :

coinstitution : NetworkX graph

A coinstitution network.
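
The overlap attribute and the threshold parameter can be pictured as a set intersection per author pair. This is a rough sketch under stated assumptions, not tethne's code: coinstitution_edges is a hypothetical helper, and the affiliations mapping (author name to a set of institution names) stands in for data that would normally be read from Paper objects.

```python
from itertools import combinations


def coinstitution_edges(affiliations, threshold=1):
    """Return {(author_a, author_b): overlap} for authors sharing institutions.

    affiliations is an assumed input format: author name -> set of
    institution names.
    """
    edges = {}
    for a, b in combinations(sorted(affiliations), 2):
        # Overlap is the number of institutions the two authors share.
        overlap = len(affiliations[a] & affiliations[b])
        if overlap >= threshold:
            edges[(a, b)] = overlap
    return edges


affiliations = {
    "SMITH J": {"Arizona State Univ", "Marine Biol Lab"},
    "JONES K": {"Arizona State Univ", "Marine Biol Lab"},
    "LEE M": {"Marine Biol Lab"},
}
all_edges = coinstitution_edges(affiliations)
strong_edges = coinstitution_edges(affiliations, threshold=2)
```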

tethne.networks.authors.author_institution(Papers, edge_attribs=[], **kwargs)[source]

Generate a bi-partite graph connecting authors and their institutions.

This may be slightly ambiguous for WoS data where there is no explicit author-institution mapping. Edge weights are the number of co-associations between an author and an institution, which should help resolve this ambiguity (the more data the better).

Element Description
Node Author name or institution name.
Edge (a,b) in E(G) if author a is associated with institution b on a paper.
Parameters :

Papers : list

A list of Paper instances.

edge_attribs : list

List of edge_attributes specifying which Paper keys (from the authored paper) to use as edge attributes. For example, the ‘date’ key in Paper .

Returns :

author_institution_graph : networkx.MultiGraph

A graph describing institutional affiliations of authors in the corpus.

tethne.networks.authors.author_papers(papers, node_id='ayjid', paper_attribs=[], **kwargs)[source]

Generate a directed NetworkX graph linking authors to their papers.

Element Description
Node Two kinds of nodes, distinguished by a “type” attribute: type = paper (a paper in papers) and type = person (a person in papers). Paper node attributes are defined by paper_attribs.
Edge Directed, from an author to each of that author's papers.
Parameters :

papers : list

A list of Paper instances.

node_id : string

A key from Paper used to identify the nodes.

paper_attribs : list

List of fields in Paper to include as attributes of paper nodes.

Returns :

author_papers_graph : networkx.DiGraph

A DiGraph ‘author_papers_graph’.

Raises :

KeyError : Raised when node_id is not present in Papers.

tethne.networks.authors.coauthors(papers, threshold=1, edge_attribs=['ayjid'], node_attribs=['institution'], geocode=False, **kwargs)[source]

Generate a co-author network.

As the name suggests, edges are drawn between two author-vertices in the case that those authors published a paper together. Co-authorship networks are popular models for studying patterns of collaboration in scientific communities.

Author institutional affiliation is included as a node attribute, if possible.

To generate a co-authorship network, use the networks.authors.coauthors() method:

>>> CA = nt.authors.coauthors(papers)
>>> CA
<networkx.classes.graph.Graph object at 0x10d94cfd0>
Element Description
Node Author name.
Edges (a,b) in E(G) if a and b are coauthors on the same paper.
Parameters :

papers : list

A list of Paper instances.

threshold : int

Minimum number of co-authored papers required for an edge. (default: 1)

edge_attribs : list

List of edge_attributes specifying which Paper keys (from the co-authored paper) to use as edge attributes. (default: [‘ayjid’])

node_attribs : list

List of attributes to attach to author nodes. Presently limited to ‘institution’.

geocode : bool

If True, attempts to geocode institutional information for authors, and adds latitude, longitude, and precision attributes to each node.

Returns :

G : networkx.Graph

A co-authorship network.

tethne.networks.authors.institutions(papers, threshold=1, edge_attrbs=['ayjid'], node_attribs=['authors'], geocode=False, **kwargs)

Generates an institutional network based on coauthorship.

An edge is drawn between two institutional vertices whenever two authors, one at each respective institution, coauthor a Paper.

>>> I = nt.authors.institutions(papers)
>>> I
<networkx.classes.graph.Graph object at 0x10d94cfd0>
Element Description
Node Institution name and location.
Edges (a,b) in E(G) if coauthors R and S are affiliated with institutions a and b, respectively.
Parameters :

papers : list

A list of Paper instances.

threshold : int

Minimum number of co-authored papers required for an edge. (default: 1)

edge_attribs : list

List of edge_attributes specifying which Paper keys (from the co-authored paper) to use as edge attributes. (default: [‘ayjid’])

node_attribs : list

List of attributes to attach to author nodes. Presently limited to ‘institution’.

geocode : bool

If True, attempts to geocode institutional information for authors, and adds latitude, longitude, and precision attributes to each node.

Returns :

G : networkx.Graph

An institutional co-authorship network.

helpers Module

Helper functions for generating networks.

citation_count(papers[, key, verbose]) Generates citation counts for all of the papers cited by papers.
simplify_multigraph(multigraph[, time]) Simplifies a graph by condensing multiple edges between the same node pair into a single edge, with a weight attribute equal to the number of edges.
top_cited(papers[, topn, verbose]) Generates a list of the topn (or topn%) most cited papers.
top_parents(papers[, topn, verbose]) Returns a list of Paper objects that cite the topn most cited papers.
tethne.networks.helpers.citation_count(papers, key='ayjid', verbose=False)[source]

Generates citation counts for all of the papers cited by papers.

Parameters :

papers : list

A list of Paper instances.

key : str

Property to use as node key. Default is ‘ayjid’ (recommended).

verbose : bool

If True, prints status messages.

Returns :

counts : dict

Citation counts for all papers cited by papers.

tethne.networks.helpers.simplify_multigraph(multigraph, time=False)[source]

Simplifies a graph by condensing multiple edges between the same node pair into a single edge, with a weight attribute equal to the number of edges.

Parameters :

multigraph : networkx.MultiGraph

E.g. a coauthorship graph.

time : bool

If True, will generate ‘start’ and ‘end’ attributes for each edge, corresponding to the earliest and latest ‘date’ values for that edge.

Returns :

graph : networkx.Graph

A NetworkX graph.
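
The condensing step can be sketched with plain tuples instead of a networkx.MultiGraph. This is an illustrative approximation only: simplify_edges is a hypothetical helper, and each parallel edge is assumed to be a (node, node, date) tuple with an integer year as its ‘date’.

```python
def simplify_edges(multi_edges, time=False):
    """Collapse parallel undirected edges into one weighted edge per node pair.

    multi_edges is an assumed input format: an iterable of (u, v, date)
    tuples, where several tuples may connect the same pair of nodes.
    """
    simple = {}
    for u, v, date in multi_edges:
        key = tuple(sorted((u, v)))  # (u, v) and (v, u) are the same edge
        attrs = simple.setdefault(key, {"weight": 0})
        attrs["weight"] += 1  # weight = number of parallel edges
        if time:
            # Track the earliest and latest date seen for this node pair.
            attrs["start"] = min(attrs.get("start", date), date)
            attrs["end"] = max(attrs.get("end", date), date)
    return simple


multi_edges = [("A", "B", 2001), ("B", "A", 1999), ("A", "C", 2005), ("A", "B", 2010)]
simple = simplify_edges(multi_edges, time=True)
```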

tethne.networks.helpers.top_cited(papers, topn=20, verbose=False)[source]

Generates a list of the topn (or topn%) most cited papers.

Parameters :

papers : list

A list of Paper instances.

topn : int or float {0.-1.}

Number (int) or percentage (float) of top-cited papers to return.

verbose : bool

If True, prints status messages.

Returns :

top : list

A list of ‘ayjid’ keys for the topn most cited papers.

counts : dict

Citation counts for all papers cited by papers.
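
The difference between an integer and a float topn can be sketched as follows. This is a sketch under assumptions rather than the library's logic: top_cited_sketch is a hypothetical name, the input is one list of cited identifiers per citing paper, a float is interpreted as a fraction of all distinct cited papers, and ties are broken alphabetically.

```python
from collections import Counter


def top_cited_sketch(citations_per_paper, topn=20):
    """Return (top, counts) for the most cited identifiers.

    citations_per_paper is an assumed input format: one list of cited
    identifiers (e.g. 'ayjid' values) per citing paper.
    """
    counts = Counter()
    for cited in citations_per_paper:
        counts.update(set(cited))  # count each cited paper once per citing paper
    if isinstance(topn, float):
        # Assumption: a float is a fraction of all cited papers; keep at least one.
        n = max(1, int(round(topn * len(counts))))
    else:
        n = topn
    # Sort by descending citation count, then by key so ties are deterministic.
    ranked = sorted(counts, key=lambda k: (-counts[k], k))
    return ranked[:n], dict(counts)


citations = [["p1", "p2", "p3"], ["p1", "p2"], ["p1", "p4"]]
top_two, counts = top_cited_sketch(citations, topn=2)
top_quarter, _ = top_cited_sketch(citations, topn=0.25)
```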

tethne.networks.helpers.top_parents(papers, topn=20, verbose=False)[source]

Returns a list of Paper objects that cite the topn most cited papers.

Parameters :

papers : list

A list of Paper objects.

topn : int or float {0.-1.}

Number (int) or percentage (float) of top-cited papers.

verbose : bool

If True, prints status messages.

Returns :

papers : list

A list of Paper objects.

top : list

A list of ‘ayjid’ keys for the topn most cited papers.

counts : dict

Citation counts for all papers cited by papers.

papers Module

Methods for generating networks in which papers are vertices.

Methods

author_coupling(papers[, threshold, ...]) Vertices are papers and edges indicate shared authorship.
bibliographic_coupling(papers[, ...]) Generate a bibliographic coupling network.
cocitation(papers[, threshold, node_id, ...]) Generate a cocitation network.
direct_citation(papers[, node_id, node_attribs]) Create a traditional directed citation network.
topic_coupling(papers[, threshold, node_id]) Two papers are coupled if they both contain a shared topic above threshold.
tethne.networks.papers.author_coupling(papers, threshold=1, node_attribs=['date'], node_id='ayjid', **kwargs)[source]

Vertices are papers and edges indicate shared authorship.

Element Description
Node Papers, represented by node_id.
Edge (a,b) in E(G) if a and b share x authors and x >= threshold
Edge Attributes overlap: the value of x (above).
Parameters :

papers : list

A list of Paper instances.

threshold : int

Minimum number of shared authors required to draw an edge between two papers.

node_id : string

Field in Paper used to identify nodes.

node_attribs : list

List of fields in Paper to include as node attributes in graph.

Returns :

acoupling : networkx.Graph

An author-coupling network.

tethne.networks.papers.bibliographic_coupling(papers, citation_id='ayjid', threshold=1, node_id='ayjid', node_attribs=['date'], weighted=False, **kwargs)[source]

Generate a bibliographic coupling network.

Two papers are bibliographically coupled when they both cite the same, third, paper. You can generate a bibliographic coupling network using the networks.papers.bibliographic_coupling() method.

>>> BC = nt.papers.bibliographic_coupling(papers)
>>> BC
<networkx.classes.graph.Graph object at 0x102eec710>

Especially when working with large datasets, or disciplinarily narrow literatures, it is usually helpful to set a minimum number of shared citations required for two papers to be coupled. You can do this by setting the `threshold` parameter.

>>> BC = nt.papers.bibliographic_coupling(papers, threshold=1)
>>> len(BC.edges())
1216
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=2)
>>> len(BC.edges())
542
Element Description
Node Papers represented by node_id.
Node Attributes node_attribs in Paper
Edge (a,b) in E(G) if a and b share x citations where x >= threshold.
Edge Attributes overlap: the number of citations shared
Parameters :

papers : list

A list of Paper instances.

citation_id : string

A key from Paper to identify the citation overlaps. Default is ‘ayjid’.

threshold : int

Minimum number of shared citations to consider two papers “coupled”.

node_id : string

Field in Paper used to identify the nodes. Default is ‘ayjid’.

node_attribs : list

List of fields in Paper to include as node attributes in graph.

weighted : bool

If True, edge attribute overlap is a float in {0-1} calculated as \(\cfrac{N_{ij}}{\sqrt{N_{i}N_{j}}}\) where \(N_{i}\) and \(N_{j}\) are the number of references in Paper i and j, respectively, and \(N_{ij}\) is the number of references shared by papers i and j.

Returns :

bcoupling : networkx.Graph

A bibliographic coupling network.

Raises :

KeyError : Raised when citation_id is not present in the meta_list.
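
The overlap value, with and without weighted=True, can be worked through on a toy example. The sketch below is illustrative only: coupling_overlap is a hypothetical function, and the references mapping (paper identifier to a set of cited identifiers) is an assumed stand-in for Paper objects. The weighted branch applies the formula given for the weighted parameter, N_ij / sqrt(N_i * N_j).

```python
from itertools import combinations
from math import sqrt


def coupling_overlap(references, threshold=1, weighted=False):
    """Return {(paper_i, paper_j): overlap} for bibliographically coupled papers.

    references is an assumed input format: paper id -> set of cited ids.
    """
    edges = {}
    for i, j in combinations(sorted(references), 2):
        shared = len(references[i] & references[j])  # N_ij
        if shared == 0 or shared < threshold:
            continue
        if weighted:
            # N_ij / sqrt(N_i * N_j); lies in (0, 1] because N_ij <= min(N_i, N_j).
            edges[(i, j)] = shared / sqrt(len(references[i]) * len(references[j]))
        else:
            edges[(i, j)] = shared
    return edges


references = {
    "paper_a": {"r1", "r2", "r3", "r4"},
    "paper_b": {"r1", "r2", "r3", "r4"},
    "paper_c": {"r4", "r5", "r6", "r7"},
}
raw = coupling_overlap(references, threshold=2)
weighted = coupling_overlap(references, weighted=True)
```

Two papers with identical reference lists get a weighted overlap of 1.0, while papers sharing one of four references each get 0.25.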

Notes

Lists apparently cannot be used as attributes: they cause errors when writing both GEXF and GraphML. Nodes also cannot be None.

tethne.networks.papers.cocitation(papers, threshold=1, node_id='ayjid', topn=None, verbose=False, node_attribs=['date'], **kwargs)[source]

Generate a cocitation network.

A cocitation network is a network in which vertices are papers, and edges indicate that two papers were cited by the same third paper. CiteSpace is a popular desktop application for co-citation analysis, and you can read about the theory behind it here. Co-citation analysis is generally performed with a temporal component, so building a GraphCollection from a DataCollection sliced by date is recommended.

You can generate a co-citation network using the networks.papers.cocitation() method:

>>> CC = nt.papers.cocitation(papers)
>>> CC
<networkx.classes.graph.Graph object at 0x102eec790>

For large datasets, you may wish to set a minimum number of co-citations required for an edge between two papers. Keep in mind that all of the references in a single paper are co-cited once, so a threshold of at least 2 is prudent. Note the dramatic decrease in the number of edges when the threshold is changed from 2 to 3.

>>> CC = nt.papers.cocitation(papers, threshold=2)
>>> len(CC.edges())
8889
>>> CC = nt.papers.cocitation(papers, threshold=3)
>>> len(CC.edges())
1493
Element Description
Node Cited papers represented by Paper ayjid.
Edge (a, b) if a and b are cited by the same paper.
Edge Attributes weight: number of times two papers are co-cited.
Parameters :

papers : list

a list of Paper objects.

threshold : int

Minimum number of co-citations required to create an edge.

topn : int or float, or None

If provided, only the topn (int) or topn percent (float) most cited papers will be included in the cocitation network. If None (default), network will include all cited papers (NOTE: this can cause severe memory consumption for even moderately-sized datasets).

verbose : bool

If True, prints status messages.

Returns :

cocitation : networkx.Graph

A cocitation network.

tethne.networks.papers.direct_citation(papers, node_id='ayjid', node_attribs=['date'], **kwargs)[source]

Create a traditional directed citation network.

Direct-citation graphs are directed acyclic graphs in which vertices are papers, and each (directed) edge represents a citation of the target paper by the source paper. The networks.papers.direct_citation() method generates both a global citation graph, which includes all cited and citing papers, and an internal citation graph that describes only citations among papers in the original dataset.

To generate direct-citation graphs, use the networks.papers.direct_citation() method. Note the size difference between the global and internal citation graphs.

>>> gDC, iDC = nt.papers.direct_citation(papers)
>>> len(gDC)
5998
>>> len(iDC)
163
Element Description
Node Papers, represented by node_id.
Edge From a paper to a cited reference.
Edge Attribute Publication date of the citing paper.
Parameters :

papers : list

A list of Paper instances.

node_id : string

A key from Paper to identify the nodes. Default is ‘ayjid’.

node_attribs : list

List of fields in Paper to include as node attributes in graph.

Returns :

citation_network : networkx.DiGraph

Global citation network (all citations).

citation_network_internal : networkx.DiGraph

Internal citation network where only the papers in the list are nodes in the network.

Raises :

KeyError : If node_id is not present in the meta_list.
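
The distinction between the global and internal citation graphs comes down to whether the cited paper (the head of the edge) is itself in the dataset. A minimal sketch, assuming a hypothetical direct_citation_edges helper and a mapping from each paper's identifier to the identifiers it cites, rather than real Paper objects:

```python
def direct_citation_edges(papers):
    """Return (global_edges, internal_edges) as lists of (citing, cited) pairs.

    papers is an assumed input format: paper id -> list of cited ids.
    Each edge runs from the citing paper (tail) to the cited paper (head).
    """
    global_edges = [
        (citing, cited) for citing, refs in papers.items() for cited in refs
    ]
    # Internal edges keep only citations whose target is also in the dataset.
    internal_edges = [(u, v) for u, v in global_edges if v in papers]
    return global_edges, internal_edges


papers = {"p1": ["p2", "x1", "x2"], "p2": ["x1"], "p3": ["p1", "p2"]}
global_edges, internal_edges = direct_citation_edges(papers)
```

Here x1 and x2 are cited but not part of the dataset, so they appear only in the global edge list.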

tethne.networks.papers.topic_coupling(papers, threshold=0.7, node_id='ayjid', **kwargs)[source]

Two papers are coupled if they both contain a shared topic above threshold.

Element Description
Node Papers, represented by node_id.
Edge (a,b) in E(G) if a and b share >= 1 topics with proportion >= threshold in both a and b.
Edge Attributes weight: combined mean proportion of each shared topic. topics: list of shared topics.
Parameters :

papers : list

A list of Paper instances.

threshold : float

Minimum representation of a topic in each paper.

node_id : string

Field in Paper used to identify nodes.

Returns :

tc : networkx.Graph

A topic-coupling network.
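
The coupling rule can be sketched on hand-written topic proportions. This is only one possible reading, not the library's code: topic_coupling_edges is a hypothetical function, the input (paper identifier to a topic-proportion mapping) is assumed, and the weight is computed here as the average, over shared topics, of the two papers' mean proportion, which is an interpretation of “combined mean proportion”.

```python
from itertools import combinations


def topic_coupling_edges(topic_proportions, threshold=0.7):
    """Couple papers that share a topic with proportion >= threshold in both.

    topic_proportions is an assumed input format:
    paper id -> {topic id: proportion of that topic in the paper}.
    """
    edges = {}
    for a, b in combinations(sorted(topic_proportions), 2):
        pa, pb = topic_proportions[a], topic_proportions[b]
        shared = sorted(
            t for t in pa if pa[t] >= threshold and pb.get(t, 0.0) >= threshold
        )
        if shared:
            # Assumed weighting: mean of the per-topic averages of both papers.
            weight = sum((pa[t] + pb[t]) / 2 for t in shared) / len(shared)
            edges[(a, b)] = {"weight": weight, "topics": shared}
    return edges


topic_proportions = {
    "paper_a": {0: 0.8, 1: 0.2},
    "paper_b": {0: 0.75, 1: 0.25},
    "paper_c": {0: 0.1, 1: 0.9},
}
edges = topic_coupling_edges(topic_proportions)
```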

terms Module

Methods for building networks from terms in bibliographic records. This includes keywords, abstract terms, etc.

keyword_cooccurrence(papers, threshold[, ...]) Generates a keyword cooccurrence network.
topic_coupling(model[, threshold]) Creates a network of words connected by implication in a common topic(s).
tethne.networks.terms.keyword_cooccurrence(papers, threshold, connected=False, **kwargs)[source]

Generates a keyword cooccurrence network.

Parameters :

papers : list

A list of Paper objects.

threshold : int

Minimum number of occurrences for a keyword pair to appear in graph.

connected : bool

If True, returns only the largest connected component.

Returns :

k_coccurrence : networkx.Graph

A keyword cooccurrence network.

tethne.networks.terms.topic_coupling(model, threshold=0.005, **kwargs)[source]

Creates a network of words connected by implication in a common topic(s).

Parameters :

model : LDAModel

threshold : float

Minimum P(W|T) for coupling.

Returns :

tc : networkx.Graph

A topic-coupling graph, where nodes are terms.
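
The role of the P(W|T) threshold can be illustrated with a toy topic model. The sketch below makes several assumptions: term_coupling_edges is a hypothetical function, the model is represented as a plain mapping from topic to word probabilities rather than an LDAModel, and two words are linked whenever both reach the threshold in the same topic.

```python
from itertools import combinations


def term_coupling_edges(topic_word_probs, threshold=0.005):
    """Link words that both have P(W|T) >= threshold in a common topic.

    topic_word_probs is an assumed input format:
    topic id -> {word: P(word | topic)}.
    Returns {(word_a, word_b): [topic ids in which both are prominent]}.
    """
    edges = {}
    for topic, probs in topic_word_probs.items():
        strong = sorted(w for w, p in probs.items() if p >= threshold)
        for a, b in combinations(strong, 2):
            edges.setdefault((a, b), []).append(topic)
    return edges


topic_word_probs = {
    0: {"gene": 0.05, "protein": 0.04, "forest": 0.001},
    1: {"forest": 0.06, "soil": 0.03, "gene": 0.02},
}
edges = term_coupling_edges(topic_word_probs, threshold=0.01)
```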

topics Module

Build networks from topics in a topic model.

tethne.networks.topics.paper_coupling(model, threshold=0.1)[source]
tethne.networks.topics.term_coupling(model, threshold=0.01)[source]
