Creating Networks from Bibliographic Data

Parsing Data

Methods for parsing bibliographic data are contained in the readers module. Tethne parses bibliographic data into a list of Paper objects that can then be used to generate networks.

Many (but not all) of the networks that Tethne can generate require citation data. The current version of Tethne supports citation data only from the Web of Science (WoS), which can be parsed using the readers.wos module. For example:

>>> import tethne.readers as rd
>>> papers = rd.wos.read("/Path/to/savedrecs.txt")

Tethne can also parse data from JSTOR’s Data-for-Research portal, using the readers.dfr module. Those data can be merged with a WoS dataset (see readers.merge()), or used on their own to generate coauthor networks with networks.authors.coauthors().

>>> import tethne.readers as rd
>>> papers = rd.dfr.read("/Path/to/DfR")

Creating Networks

There are many different network models that can be used to describe bibliographic data. These can be roughly divided into two categories: networks that describe relationships among documents, and networks that describe relationships among the authors of those documents. For specific methods, see Networks of Documents and Networks of Authors.

All network-building methods can be found in the networks module. By convention, the module is imported under the name nt.

>>> import tethne.networks as nt

There are two main ways of using network-building methods:

Generating a single network directly from a list of Paper objects

All methods in tethne.networks take lists of Paper objects as arguments. For example:

>>> import tethne.readers as rd
>>> papers = rd.wos.read("/Path/to/savedrecs.txt")
>>> import tethne.networks as nt
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=2)
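
To make the role of the threshold argument concrete, the following is a minimal pure-Python sketch of bibliographic coupling. It is not Tethne's implementation: the dictionaries, their "id" and "citations" keys, and the coupling_edges() helper are all hypothetical stand-ins for illustration. The sketch assumes that two papers are coupled when they share at least threshold cited references.

```python
from itertools import combinations

# Hypothetical stand-in records (plain dicts, not Tethne Paper objects):
# each paper has an identifier and the set of references it cites.
papers = [
    {"id": "A", "citations": {"r1", "r2", "r3"}},
    {"id": "B", "citations": {"r2", "r3", "r4"}},
    {"id": "C", "citations": {"r3", "r5"}},
]

def coupling_edges(papers, threshold=1):
    """Map each coupled pair of paper ids to its number of shared references.

    A pair is kept only if it shares at least `threshold` references
    (an assumption about how the threshold is interpreted).
    """
    edges = {}
    for p, q in combinations(papers, 2):
        shared = len(p["citations"] & q["citations"])
        if shared >= threshold:
            edges[(p["id"], q["id"])] = shared
    return edges

edges = coupling_edges(papers, threshold=2)
print(edges)
```

With threshold=2, only the pair that shares two references survives; lowering the threshold to 1 keeps every pair that shares at least one reference.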

Generating a GraphCollection from a DataCollection

This is useful in cases where you want to evaluate the evolution of network structure over time, or compare networks generated using subsets of your data.

To generate a time-variant GraphCollection, slice your DataCollection using the date field. In the example below, data are sliced using a 4-year sliding time-window (for details about slicing, see tethne.data.DataCollection.slice()).

>>> # Parse data.
>>> import tethne.readers as rd
>>> papers = rd.wos.read("/Path/To/FirstDataSet.txt")

>>> # Create a DataCollection, and slice it.
>>> from tethne.data import DataCollection, GraphCollection
>>> D = DataCollection(papers)
>>> D.slice('date', 'time_window', window_size=4)

>>> # Build a GraphCollection using a network from tethne.networks.
>>> from tethne.builders import authorCollectionBuilder
>>> builder = authorCollectionBuilder(D)
>>> C = builder.build('date', 'coauthors')

C.keys() should now yield a list of publication dates in the original dataset.
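
The idea behind a sliding time-window can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the behaviour of DataCollection.slice() itself: the sliding_windows() helper and the "id" and "date" keys are hypothetical, and the sketch assumes that the window advances one year at a time and that each slice is indexed by the first year of its window.

```python
# Hypothetical stand-in records (plain dicts, not Tethne Paper objects).
papers = [
    {"id": "A", "date": 2000},
    {"id": "B", "date": 2001},
    {"id": "C", "date": 2003},
    {"id": "D", "date": 2005},
]

def sliding_windows(papers, window_size=4, step=1):
    """Group paper ids into overlapping windows of `window_size` years.

    Each slice is keyed by the first year of its window (an assumption).
    """
    years = [p["date"] for p in papers]
    first, last = min(years), max(years)
    # The last window starts where it still ends on the final year; if the
    # data span fewer years than the window, a single window is used.
    last_start = max(first, last - window_size + 1)
    slices = {}
    for start in range(first, last_start + 1, step):
        end = start + window_size  # exclusive upper bound
        slices[start] = [p["id"] for p in papers if start <= p["date"] < end]
    return slices

slices = sliding_windows(papers, window_size=4)
for start in sorted(slices):
    print(start, slices[start])
```

Because the windows overlap, a paper can appear in several slices. This overlap smooths year-to-year fluctuations when comparing networks over time.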

A DataCollection can be sliced using any int or str field in the Paper class. If you wish to compare networks generated from two WoS downloads, for example, you could slice using the accession ID:

>>> # Parse data.
>>> import tethne.readers as rd
>>> papers = rd.wos.read("/Path/To/FirstDataSet.txt")
>>> papers += rd.wos.read("/Path/To/SecondDataSet.txt")
>>> # Create a DataCollection, and slice it.
>>> from tethne.data import DataCollection, GraphCollection
>>> D = DataCollection(papers)
>>> D.slice('accession')
>>> # Build a GraphCollection using a network from tethne.networks.
>>> from tethne.builders import paperCollectionBuilder
>>> builder = paperCollectionBuilder(D)
>>> C = builder.build('accession', 'cocitation', threshold=2)

C.keys() should now yield two values, each an accession UUID.
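
Slicing on a non-date field amounts to grouping papers by the value of that field. A minimal sketch, assuming plain dictionaries in place of Paper objects; the slice_by() helper and the "set-1" and "set-2" accession values are hypothetical placeholders:

```python
from collections import defaultdict

# Hypothetical stand-in records: the "accession" value identifies the
# download (dataset) that each paper came from.
papers = [
    {"id": "A", "accession": "set-1"},
    {"id": "B", "accession": "set-1"},
    {"id": "C", "accession": "set-2"},
]

def slice_by(papers, key):
    """Group paper ids by the value of the field named `key`."""
    groups = defaultdict(list)
    for p in papers:
        groups[p[key]].append(p["id"])
    return dict(groups)

slices = slice_by(papers, "accession")
print(sorted(slices))
```

Each group can then be passed to a network-building method separately, yielding one network per dataset.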

Networks of Documents

Methods for building networks in which vertices represent documents are provided in the networks.papers module.

tethne.networks.papers.author_coupling(papers) Vertices are papers and edges indicate shared authorship.
tethne.networks.papers.bibliographic_coupling(papers) Generate a bibliographic coupling network.
tethne.networks.papers.cocitation(papers[, ...]) Generate a cocitation network.
tethne.networks.papers.direct_citation(papers) Create a traditional directed citation network.
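
The difference between a direct-citation network and a cocitation network can be sketched in plain Python. The code below is an illustration, not Tethne's implementation: the dictionaries and both helper functions are hypothetical. It assumes that a direct-citation edge points from a citing paper to each reference it cites, and that two references are cocited when at least threshold papers cite both.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in records (plain dicts, not Tethne Paper objects).
papers = [
    {"id": "A", "citations": ["r1", "r2", "r3"]},
    {"id": "B", "citations": ["r2", "r3"]},
    {"id": "C", "citations": ["r3", "r1"]},
]

def direct_citation_edges(papers):
    """Return directed (citing paper, cited reference) edges."""
    return [(p["id"], ref) for p in papers for ref in p["citations"]]

def cocitation_edges(papers, threshold=1):
    """Map pairs of references to the number of papers that cite both."""
    counts = Counter()
    for p in papers:
        # Sort so that each pair has a single canonical orientation.
        for pair in combinations(sorted(set(p["citations"])), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}

direct = direct_citation_edges(papers)
cocited = cocitation_edges(papers, threshold=2)
print(len(direct), cocited)
```

Note that the vertices of the cocitation network in this sketch are the cited references, whereas the direct-citation edges connect citing papers to those references.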

Networks of Authors

Methods for building networks in which vertices represent authors are provided in the networks.authors module.

tethne.networks.authors.author_cocitation(papers) Generates an author co-citation network; edges indicate co-citation of authors’ papers.
tethne.networks.authors.author_coinstitution(Papers) Generate a co-institution graph, where edges indicate shared affiliation.
tethne.networks.authors.author_institution(Papers) Generate a bi-partite graph connecting authors and their institutions.
tethne.networks.authors.author_papers(papers) Generate a directed NetworkX graph connecting authors to their papers.
tethne.networks.authors.coauthors(papers[, ...]) Generate a co-author network.
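
As an illustration of what a co-author network encodes, the following pure-Python sketch counts how many papers each pair of authors has written together. It is not Tethne's implementation: the dictionaries, the author names, and the coauthor_edges() helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in records (plain dicts, not Tethne Paper objects).
papers = [
    {"id": "A", "authors": ["SMITH J", "DOE A"]},
    {"id": "B", "authors": ["DOE A", "SMITH J", "LEE K"]},
    {"id": "C", "authors": ["LEE K"]},
]

def coauthor_edges(papers):
    """Map each pair of authors to the number of papers they share."""
    counts = Counter()
    for p in papers:
        # Sort so that each pair has a single canonical orientation.
        for pair in combinations(sorted(set(p["authors"])), 2):
            counts[pair] += 1
    return dict(counts)

edges = coauthor_edges(papers)
print(edges)
```

Single-author papers contribute no edges, and the count for each pair can serve as an edge weight.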