Tethne is a package for analyzing citation data from the Web of Science. Modules within Tethne can generate a variety of networks, such as bibliographic coupling, citation, author-paper, and co-author networks, using NetworkX.
tethne.analyze | The tethne.analyze sub-package provides additional analysis methods not |
tethne.builders | Classes for building a GraphCollection. |
tethne.data | Classes for handling bibliographic data. |
tethne.matrices | Methods for generating matrices from Paper objects and other data. |
tethne.networks | Methods for building networks from bibliographic data. |
tethne.readers | Methods for parsing bibliographic datasets. |
tethne.services | Modules for interacting with external web services. |
tethne.utilities | Helper functions for tethne.networks. |
tethne.writers | Export networks to structured and unstructured formats, for visualization. |
Provides the Tethne command-line interface.
See Quickstart (Command-line) and Command-line Options for an introduction to the CLI.
Classes for building a GraphCollection.
builder(D) | Base class for builders. |
authorCollectionBuilder(D) | Builds a GraphCollection using a method in tethne.networks.authors from a DataCollection. |
paperCollectionBuilder(D) | Builds a GraphCollection using a method in tethne.networks.papers from a DataCollection. |
Bases: tethne.builders.builder
Builds a GraphCollection using a method in tethne.networks.authors from a DataCollection.
Methods
build(graph_axis, graph_type, **kwargs) | Generates graphs for each slice along graph_axis in DataCollection D. |
Generates graphs for each slice along graph_axis in DataCollection D.
Other axes in D are treated as attributes.
Usage
>>> import tethne.readers as rd
>>> data = rd.wos.read("/Path/to/wos/data.txt")
>>> from tethne.data import DataCollection
>>> D = DataCollection(data) # Indexed by wosid, by default.
>>> D.slice('date', 'time_window', window_size=4)
>>> from tethne.builders import authorCollectionBuilder
>>> builder = authorCollectionBuilder(D)
>>> C = builder.build('date', 'coauthors')
>>> C
<tethne.data.GraphCollection at 0x104ed3550>
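The 'coauthors' graph type links authors who appear together on the same paper. As a rough, standard-library illustration only (this is not Tethne's implementation, and `coauthor_edges` is a hypothetical helper), the sketch below counts how many papers each pair of authors shares, assuming each paper is reduced to a list of author names:

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(author_lists):
    """Hypothetical helper, not part of Tethne's API.

    author_lists: one list of author names per paper (assumed input format).
    Returns a Counter mapping (author_a, author_b) -> number of shared papers.
    """
    edges = Counter()
    for authors in author_lists:
        # Deduplicate and sort so each pair has a single canonical orientation.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


papers = [
    ["J SMITH", "A JONES"],
    ["A JONES", "J SMITH", "K LEE"],
    ["K LEE"],  # single-author paper: contributes no edges
]
edges = coauthor_edges(papers)
```

The pair counts map naturally onto edge weights if the result is later loaded into a NetworkX graph.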
Bases: tethne.builders.builder
Builds a GraphCollection using a method in tethne.networks.papers from a DataCollection.
Methods
build(graph_axis, graph_type, **kwargs) | Generates graphs for each slice along graph_axis in DataCollection D. |
Generates graphs for each slice along graph_axis in DataCollection D.
Other axes in D are treated as attributes.
Usage
>>> import tethne.readers as rd
>>> data = rd.wos.read("/Path/to/wos/data.txt")
>>> from tethne.data import DataCollection
>>> D = DataCollection(data) # Indexed by wosid, by default.
>>> D.slice('date', 'time_window', window_size=4)
>>> from tethne.builders import paperCollectionBuilder
>>> builder = paperCollectionBuilder(D)
>>> C = builder.build('date', 'bibliographic_coupling', threshold=2)
>>> C
<tethne.data.GraphCollection at 0x104ed3550>
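In bibliographic coupling, two papers are linked when they cite references in common. The sketch below is a hypothetical, standard-library illustration that assumes threshold=2 means "at least two shared references"; check tethne.networks.papers for how Tethne actually interprets that argument. Representing each paper as a set of cited-reference identifiers is also an assumption.

```python
from itertools import combinations


def bibliographic_coupling(citations, threshold=1):
    """Hypothetical helper, not part of Tethne's API.

    citations: dict mapping a paper identifier to the set of references it cites
    (assumed input format). Returns {(paper_a, paper_b): shared_count} for every
    pair sharing at least `threshold` references (assumed threshold semantics).
    """
    edges = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared >= threshold:
            edges[(a, b)] = shared
    return edges


citations = {
    "WOS:1": {"r1", "r2", "r3"},
    "WOS:2": {"r2", "r3", "r4"},
    "WOS:3": {"r4", "r5"},
}
coupled = bibliographic_coupling(citations, threshold=2)
```

Raising the threshold prunes weakly coupled pairs, which keeps coupling networks for large datasets manageable.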
Classes for handling bibliographic data.
Paper() | Base class for Papers. |
DataCollection(data[, index_by]) | A DataCollection organizes Papers for analysis. |
GraphCollection() | Collection of NetworkX nx.classes.graph.Graph objects, organized by some index (e.g. time). |
LDAModel(doc_topic, top_word, top_keys, ...) | Organizes parsed output from MALLET’s LDA modeling algorithm. |
Bases: object
A DataCollection organizes Papers for analysis.
The DataCollection is initialized with some data, which is indexed by a key in Paper (default is wosid). The DataCollection can then be sliced (DataCollection.slice()) by other keys in Paper.
Usage
>>> import tethne.readers as rd
>>> data = rd.wos.read("/Path/to/wos/data.txt")
>>> data += rd.wos.read("/Path/to/wos/data2.txt") # Two accessions.
>>> from tethne.data import DataCollection
>>> D = DataCollection(data) # Indexed by wosid, by default.
>>> D.slice('date', 'time_window', window_size=4)
>>> D.slice('accession')
>>> D
<tethne.data.DataCollection at 0x10af0ef50>
Methods
N_axes() | Returns the number of slice axes for this DataCollection. |
distribution() | Returns a Numpy array describing the number of Papers associated with each slice-coordinate. |
distribution_2d(x_axis, y_axis) | Returns a Numpy array describing the number of Papers associated with each slice-coordinate, for the specified x and y axes. |
get_axes() | Returns a list of all slice axes for this DataCollection. |
get_by(key_indices[, papers]) | Given a set of (key, index) tuples, return the corresponding subset of Paper indices (or the Paper instances themselves, if papers is True). |
get_slice(key, index[, papers]) | Returns a specific slice. |
get_slices(key[, papers]) | Returns slices for key. |
indices() | Returns a list of indices of all papers in this DataCollection. |
papers() | Returns the complete set of Paper instances in this DataCollection. |
slice(key[, method]) | Slices data by key, using method (if applicable). |
Returns the number of slice axes for this DataCollection.
Returns a Numpy array describing the number of Papers associated with each slice-coordinate.
WARNING: expensive for a DataCollection with many axes or long axes. Consider using distribution_2d().
Returns | dist : Numpy array |
---|---|
Raises | RuntimeError : DataCollection has not been sliced. |
Returns a Numpy array describing the number of Papers associated with each slice-coordinate, for the specified x and y axes.
Returns | dist : Numpy array |
---|---|
Raises | RuntimeError : DataCollection has not been sliced. KeyError : Invalid slice axes for this DataCollection. |
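As a hedged illustration of what distribution_2d() computes, the sketch below counts the papers at each (x, y) slice coordinate. It assumes each axis is available as a dict mapping a slice index to the list of paper indices in that slice (get_slices() is documented to return a dict, but this exact layout is an assumption), and it builds nested lists instead of a Numpy array to stay dependency-free. The function name mirrors the method but is a hypothetical stand-in.

```python
def distribution_2d(x_slices, y_slices):
    """Hypothetical stand-in, not Tethne's implementation.

    x_slices, y_slices: dicts mapping slice index -> list of paper indices
    (assumed layout). Rows follow the sorted x indices and columns the sorted
    y indices; each cell counts papers that fall in both slices.
    """
    x_keys, y_keys = sorted(x_slices), sorted(y_slices)
    return [
        [len(set(x_slices[x]) & set(y_slices[y])) for y in y_keys]
        for x in x_keys
    ]


dates = {2000: ["p1", "p2"], 2004: ["p3"]}
accessions = {"a": ["p1", "p3"], "b": ["p2"]}
dist = distribution_2d(dates, accessions)
```

Restricting the count to two axes keeps the result at len(x) × len(y) cells, which is why it is cheaper than a full distribution() over every axis.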
Returns a list of all slice axes for this DataCollection.
Given a set of (key, index) tuples, return the corresponding subset of Paper indices (or Paper instances themselves, if papers is True).
Parameters | key_indices : list |
---|---|
Returns | plist : list |
Raises | RuntimeError : DataCollection has not been sliced. |
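get_by() selects the papers that fall into every requested slice. A minimal sketch, assuming slices are stored as a nested dict {key: {index: [paper indices]}} (a hypothetical layout, not necessarily Tethne's internal one), is a set intersection across the (key, index) pairs:

```python
def get_by(axes, key_indices):
    """Hypothetical stand-in, not Tethne's implementation.

    axes: dict of {key: {index: list of paper indices}} (assumed layout).
    key_indices: list of (key, index) tuples.
    Returns the sorted paper indices present in every requested slice.
    """
    selected = None
    for key, index in key_indices:
        members = set(axes[key][index])  # KeyError for an unknown key or index
        selected = members if selected is None else selected & members
    return sorted(selected or [])


axes = {
    "date": {2000: ["p1", "p2"], 2004: ["p3"]},
    "accession": {"a": ["p1", "p3"], "b": ["p2"]},
}
subset = get_by(axes, [("date", 2000), ("accession", "a")])
```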
Returns a specific slice.
Parameters | key : str; index : str or int |
---|---|
Returns | slice : list |
Raises | RuntimeError : DataCollection has not been sliced. KeyError : Data has not been sliced by [key]. KeyError : [index] is not a valid index for [key]. |
Returns slices for key.
Parameters | key : str |
---|---|
Returns | slices : dict |
Raises | RuntimeError : DataCollection has not been sliced. KeyError : Data has not been sliced by [key]. |
Returns a list of indices of all papers in this DataCollection.
Returns | list |
---|---|
Returns the complete set of Paper instances in this DataCollection.
Returns | papers : list |
---|---|
Slices data by key, using method (if applicable).
Methods available for slicing a DataCollection:
Method | Description | Key | kwargs |
---|---|---|---|
time_window | Slices data using a sliding time-window. Slices are indexed by the start of the time-window. | date | window_size, step_size |
time_period | Slices data into time periods of equal length. Slices are indexed by the start of the time period. | date | window_size |
The main difference between sliding time-window (time_window) and time-period (time_period) slicing is whether the resulting slices can overlap. Time-period slicing divides the data into sequential, non-overlapping periods, whereas slices generated by time-window slicing can overlap (for example, when step_size is smaller than window_size).
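To make the overlap difference concrete, the sketch below computes only the slice boundaries that each method would produce for a range of publication years. The helper names are hypothetical, and the boundary rules (each slice covers [start, start + window_size), and windows keep starting until the last year) are illustrative assumptions rather than Tethne's documented behavior.

```python
def time_windows(first, last, window_size=1, step_size=1):
    """Hypothetical helper: sliding windows as (start, end) pairs.

    Each slice is keyed by its start year and covers [start, end) (assumed).
    Windows overlap whenever step_size < window_size.
    """
    return [(start, start + window_size)
            for start in range(first, last + 1, step_size)]


def time_periods(first, last, window_size=1):
    """Hypothetical helper: non-overlapping periods as (start, end) pairs.

    The start advances by a full window each time, so periods never overlap.
    """
    return [(start, start + window_size)
            for start in range(first, last + 1, window_size)]


windows = time_windows(2000, 2003, window_size=2, step_size=1)
periods = time_periods(2000, 2003, window_size=2)
```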
Available kwargs:
Argument | Type | Description |
---|---|---|
window_size | int | Size of time-window or period, in years (default = 1). |
step_size | int | Amount by which to advance the time-window in each step (ignored for time_period). |
cumulative | bool | If True, the data from each successive slice includes the data from all preceding slices. Only applies if key is ‘date’ (default = False). |
Parameters | key : str; method : str (optional); kwargs : kwargs |
---|---|
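With cumulative=True, each slice also contains the papers of all preceding slices. A dependency-free sketch of that accumulation, assuming slices are held as a dict from start year to a list of paper indices (a hypothetical representation; `make_cumulative` is not a Tethne function):

```python
def make_cumulative(slices):
    """Hypothetical helper, not part of Tethne's API.

    slices: dict mapping slice index (e.g. start year) -> list of paper indices.
    Returns a new dict in which each slice also holds every paper from the
    slices with smaller indices, without duplicates.
    """
    result, running = {}, []
    for index in sorted(slices):
        for paper in slices[index]:
            if paper not in running:  # overlapping windows may repeat papers
                running.append(paper)
        result[index] = list(running)  # copy so later slices don't alias it
    return result


slices = {2000: ["p1"], 2002: ["p2", "p3"], 2004: ["p4"]}
cumulative = make_cumulative(slices)
```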
Bases: object
Collection of NetworkX nx.classes.graph.Graph objects, organized by some index (e.g. time).
A GraphCollection can be generated using classes in the tethne.builders module. See Generating a GraphCollection from a DataCollection for details.
Methods
compose() | Returns the simple union of all Graphs in the GraphCollection. |
edges([overwrite]) | Returns the complete set of edges for this GraphCollection. |
load(filepath) | Loads a pickled (serialized) GraphCollection from filepath. |
nodes([overwrite]) | Returns the complete set of nodes for this GraphCollection. |
save(filepath) | Pickles (serializes) the GraphCollection. |
Returns the simple union of all Graphs in the GraphCollection.
Returns | composed : Graph |
---|---|
Notes
Node or edge attributes whose values vary across slices should be ignored in the composed graph.
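The note above follows from how a union of graphs handles attributes: when the same node or edge appears in several slices, only one value can be kept. NetworkX's nx.compose, for example, keeps the attributes of the second graph. The stdlib-only sketch below mimics that "later wins" rule with plain edge dictionaries (a hypothetical stand-in for Graph objects, and not necessarily how Tethne composes), so the effect is easy to see:

```python
def compose_edges(slices):
    """Hypothetical helper, not part of Tethne's API.

    slices: list of dicts mapping an edge tuple (u, v) -> attribute dict, in
    slice order. Edges are treated as ordered tuples here for simplicity.
    Later slices overwrite attribute values from earlier ones.
    """
    composed = {}
    for edges in slices:
        for edge, attrs in edges.items():
            composed.setdefault(edge, {}).update(attrs)
    return composed


slice_2000 = {("a", "b"): {"weight": 3}}
slice_2004 = {("a", "b"): {"weight": 1}, ("b", "c"): {"weight": 2}}
composed = compose_edges([slice_2000, slice_2004])
```

The composed weight on ("a", "b") is just the 2004 value; the 2000 value is silently lost, which is why slice-varying attributes are not meaningful after composition.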
Returns the complete set of edges for this GraphCollection.
If this method has already been called for this GraphCollection, the edges are not recomputed unless overwrite=True.
Parameters | overwrite : bool |
---|---|
Returns | edges : list |
Loads a pickled (serialized) GraphCollection from filepath.
Parameters | filepath : string |
---|---|
Raises | UnpicklingError : Raised when there is some issue in unpickling. IOError : File does not exist, or cannot be read. |
Returns the complete set of nodes for this GraphCollection.
If this method has already been called for this GraphCollection, the nodes are not recomputed unless overwrite=True.
Parameters | overwrite : bool |
---|---|
Returns | nodes : list |
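edges() and nodes() cache their result and only recompute when overwrite=True. The sketch below shows that caching pattern on a hypothetical class (not Tethne's code), with each slice's graph represented as a dict of node -> set of neighbors. Note that a cached result can go stale if the underlying graphs change, which is what overwrite=True is for.

```python
class CachedCollection:
    """Hypothetical stand-in illustrating the overwrite/caching behaviour."""

    def __init__(self, graphs):
        self.graphs = graphs    # dict: slice index -> {node: set(neighbors)}
        self._nodes = None      # cache, filled on the first call to nodes()
        self.computations = 0   # counts how often the node set is rebuilt

    def nodes(self, overwrite=False):
        """Return all nodes across slices, recomputing only when needed."""
        if self._nodes is None or overwrite:
            self.computations += 1
            self._nodes = sorted({n for g in self.graphs.values() for n in g})
        return self._nodes


c = CachedCollection({2000: {"a": {"b"}, "b": {"a"}}, 2004: {"c": set()}})
first = c.nodes()
c.graphs[2008] = {"d": set()}   # change the data after caching
stale = c.nodes()               # still the cached result
fresh = c.nodes(overwrite=True) # forced recomputation picks up "d"
```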
Pickles (serializes) the GraphCollection.
Parameters | filepath : string |
---|---|
Raises | PicklingError : Raised when unpicklable objects are pickled. IOError : File does not exist, or cannot be opened. |
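save() and load() are described as pickling and unpickling. The round trip can be sketched with the standard-library pickle module; the function names and the plain dict payload are hypothetical stand-ins for a GraphCollection, and the exceptions mentioned in the comments are the ones listed above (on Python 3, IOError is an alias of OSError).

```python
import os
import pickle
import tempfile


def save(obj, filepath):
    """Hypothetical helper: serialize obj to filepath.

    May raise pickle.PicklingError or OSError (IOError).
    """
    with open(filepath, "wb") as f:
        pickle.dump(obj, f)


def load(filepath):
    """Hypothetical helper: deserialize an object from filepath.

    May raise pickle.UnpicklingError or OSError (IOError).
    Only unpickle files you trust: unpickling can execute arbitrary code.
    """
    with open(filepath, "rb") as f:
        return pickle.load(f)


graphs = {2000: [("a", "b")], 2004: [("b", "c")]}
path = os.path.join(tempfile.mkdtemp(), "collection.pickle")
save(graphs, path)
restored = load(path)
```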
Bases: object
Organizes parsed output from MALLET’s LDA modeling algorithm.
Used by readers.mallet.
Methods
docs_in_topic(z[, topD]) | Returns a list of the topD documents most representative of topic z. |
topics_in_doc(d[, topZ]) | Returns a list of the topZ most prominent topics in a document. |
Returns a list of the topD documents most representative of topic z.
Parameters | z : int; topD : int or float |
---|---|
Returns | documents : list |
Returns a list of the topZ most prominent topics in a document.
Parameters | d : str or int; topZ : int or float |
---|---|
Returns | topics : list |
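Both methods rank one slice of a document-topic weight table. Assuming the weights are available as a dict {document: {topic: weight}} (a hypothetical layout), an integer top-N lookup could be sketched as below. The float form of topD/topZ is not described here, so the sketch handles only integers; the function names mirror the methods but are stand-ins, not Tethne's code.

```python
def topics_in_doc(doc_topic, d, top_z=3):
    """Hypothetical stand-in: the top_z (topic, weight) pairs for document d."""
    weights = doc_topic[d]
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_z]


def docs_in_topic(doc_topic, z, top_d=3):
    """Hypothetical stand-in: the top_d (document, weight) pairs for topic z."""
    scored = [(doc, topics.get(z, 0.0)) for doc, topics in doc_topic.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_d]


doc_topic = {
    "d1": {0: 0.7, 1: 0.2, 2: 0.1},
    "d2": {0: 0.1, 1: 0.6, 2: 0.3},
    "d3": {0: 0.5, 1: 0.1, 2: 0.4},
}
```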
Bases: object
Base class for Papers.
Behaves just like a dict, but enforces a limited vocabulary of keys, and specific data types.
The following fields (and corresponding data types) are allowed:
Field | Type | Description |
---|---|---|
aulast | list | Authors’ last names, as a list. |
auinit | list | Authors’ first initials, as a list. |
institution | dict | Institutions with which the authors are affiliated. |
atitle | str | Article title. |
jtitle | str | Journal title or abbreviated title. |
volume | str | Journal volume number. |
issue | str | Journal issue number. |
spage | str | Starting page of article in journal. |
epage | str | Ending page of article in journal. |
date | int | Article date of publication. |
country | dict | Author-Country mapping. |
citations | list | A list of Paper instances. |
ayjid | str | First author’s name (last fi), pubdate, and journal. |
doi | str | Digital Object Identifier. |
pmid | str | PubMed ID. |
wosid | str | Web of Science UT fieldtag value. |
accession | str | Identifier for data conversion accession. |
None values are also allowed for all fields.
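Paper is described as a dict that restricts both its keys and their types, with None always allowed. A minimal sketch of that idea, covering only a subset of the fields in the table above, follows; the class name and the choice of KeyError/TypeError are hypothetical, not Tethne's.

```python
class RestrictedPaper(dict):
    """Hypothetical dict subclass accepting only whitelisted keys and types.

    Caveat: dict.__init__ and dict.update bypass __setitem__, so this sketch
    only checks item assignment such as paper["date"] = 2004.
    """

    FIELDS = {  # subset of the Paper fields listed above
        "aulast": list,
        "auinit": list,
        "atitle": str,
        "date": int,
        "wosid": str,
    }

    def __setitem__(self, key, value):
        if key not in self.FIELDS:
            raise KeyError(f"{key!r} is not a valid Paper field")
        expected = self.FIELDS[key]
        if value is not None and not isinstance(value, expected):
            raise TypeError(f"{key!r} must be {expected.__name__} or None")
        super().__setitem__(key, value)


paper = RestrictedPaper()
paper["date"] = 2004
paper["atitle"] = None  # None is allowed for every field
```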
Methods
authors() | Returns a list of author names (FI LAST). |
iteritems() | Returns an iterator for the Paper’s metadata fields |
keys() | Returns the keys of the Paper’s metadata fields. |
values() | Returns the values of the Paper’s metadata fields. |
Returns a list of author names (FI LAST).
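authors() combines the aulast and auinit fields into "FI LAST" strings. Assuming the two lists are aligned by position (an assumption; the exact formatting Tethne applies is not shown here), the pairing can be sketched with a hypothetical helper that works on a plain dict:

```python
def author_names(paper):
    """Hypothetical helper: pair first initials with last names ("J SMITH").

    Assumes paper["auinit"][i] belongs to paper["aulast"][i]; missing or None
    fields yield an empty list.
    """
    lasts = paper.get("aulast") or []
    inits = paper.get("auinit") or []
    return [f"{init} {last}" for init, last in zip(inits, lasts)]


paper = {"aulast": ["SMITH", "JONES"], "auinit": ["J", "A"]}
names = author_names(paper)
```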
Methods for network analysis.
Analyzes the global closeness centrality of a node over time.
Parameters | papers : list; node : any; window_size : int; normalize : bool |
---|---|
Returns | trajectory : dict |
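The trajectory can be pictured as one closeness value per time slice. The sketch below is a hypothetical, dependency-free version: it takes pre-built graphs (adjacency dicts keyed by slice start year) instead of papers, uses breadth-first search for unweighted shortest paths, and defines closeness as (reachable nodes - 1) / (sum of distances to them). How Tethne's normalize flag scales the value is not documented here, so it is left out.

```python
from collections import deque


def closeness(adjacency, node):
    """Hypothetical helper: closeness of node within its reachable component.

    adjacency: dict mapping node -> iterable of neighbors (unweighted graph).
    """
    distances = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search yields shortest hop counts
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total > 0 else 0.0


def closeness_trajectory(graphs, node):
    """Hypothetical helper: {slice index: closeness}; skips slices without node."""
    return {index: closeness(adj, node)
            for index, adj in sorted(graphs.items()) if node in adj}


graphs = {
    2000: {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}},  # a-b-c path
    2004: {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},  # a is the hub
}
trajectory = closeness_trajectory(graphs, "a")
```

Here node "a" moves from the end of a path in 2000 to the center of a star in 2004, so its closeness rises over time.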