Co-citation Analysis
====================

Co-citation analysis gained popularity in the 1970s as a technique for "mapping"
scientific literatures, and for finding latent semantic relationships among technical
publications. 

Two papers are co-cited if they are both cited by the same, third, paper. The standard
approach to co-citation analysis is to generate a sample of bibliographic records from a
particular field by using certain keywords or journal names, and then build a co-citation
graph describing relationships among their cited references. Thus the majority of papers
that are represented as nodes in the co-citation graph are **not** papers that responded
to the selection criteria used to build the dataset.

Before you begin, be sure to install the latest version of Tethne. Consult the
:ref:`installation` guide for details.

**If you run into problems**, don't panic. Tethne is under active development, and there
are certainly bugs to be found. Please report any problems on our 
`GitHub issue tracker <https://github.com/diging/tethne/issues?state=open>`_.

Getting Started
---------------

Before you start, you should choose an output folder where TethneGUI should store graphs 
and descriptions of your dataset.

You should also choose a dataset ID. This is a unique ID that Tethne will use to keep
track of your data between workflow steps.

Initialize TethneGUI
````````````````````

When you first start TethneGUI, you should see a window like the one shown below. Click
``Select folder...`` to specify your output folder. A dataset ID should be automatically 
generated for you; you can change this if you wish.

.. image:: _static/images/tutorial/install.3.png
   :width: 500
   :align: center

Once you've selected an output folder and a dataset ID, click the ``Run Tethne...`` 
button.

Reading WoS Data
----------------
You can read WoS data from one or multiple field-tagged data files.

Command-line
````````````
Use ``-I examplID`` to specify your dataset ID, and 
``-O /Users/erickpeirson/exampleOutput`` to specify your output folder.

``--data-format=WOS`` tells Tethne that your data are in the Web of Science field-tagged
format.

.. code-block:: bash

   $ tethne -I exampleID -O /Users/erickpeirson/exampleOutput --read-file \ 
   --data-path=/Users/erickpeirson/Downloads/tests/savedrecs4.txt --data-format=WOS
   ----------------------------------------
   	   Workflow step: Read
   ----------------------------------------
   Reading WOS data from file /Users/erickpeirson/Downloads/tests/savedrecs4.txt...done.
   Read 500 papers in 2.67462515831 seconds. Accession: 0ff65dc3-b8f7-4bdc-a714-2d2a539f10a9.
   Generating a new DataCollection...done.
   Saving DataCollection to /tmp/exampleID_DataCollection.pickle...done.
   
TethneGUI
`````````
1. Select your WoS data file. If you have one data file, click the ``Select a File...``.
   If you have multiple data files in their own folder, click ``Select a Folder...``.
2. Select the ``WOS`` file format.
3. Click the ``Read files`` button.

Depending on the size of your dataset, this may take a minute or two. When TethneGUI is
done reading your data, you should see messages like those depicted in the image below.

.. image:: _static/images/tutorial/coauthors.1.png
   :width: 500
   :align: center   
   
If your data are read successfully, click ``Next >``.

Python
``````
First import the :mod:`tethne.readers` module, then use the :func:`.readers.wos.read`
method to create a list of :class:`.Paper` instances. You can use 
:func:`.readers.wos.from_dir` to import all of the WoS datafiles in a directory.

.. code-block:: python

	>>> # Parse data.
	>>> import tethne.readers as rd
	>>> papers = rd.wos.read("/Path/To/FirstDataSet.txt")
	
Then create a new :class:`.DataCollection` to organize your data.

.. code-block:: python

   >>> from tethne.data import DataCollection
   >>> D = DataCollection(papers)

Slicing WoS Data
----------------
In this tutorial, we will build a dynamic co-citation network using a sliding time-window.
Whereas time-period slicing divides data into subsets by sequential non-overlapping time
periods, subsets generated by time-window slicing can overlap.

.. figure:: _static/images/bibliocoupling/timeline.timeslice.png
   :width: 400
   :align: center
   
   **Time-period** slicing, with a window-size of 4 years.
   
.. figure:: _static/images/bibliocoupling/timeline.timewindow.png
   :width: 400
   :align: center
   
   **Time-window** slicing, with a window-size of 4 years and a step-size of 1 year.

We use a sliding time-window for two reasons:

    1. To "smooth" the evolution of the network. There is lag-time between a conceptual or
       epistemic innovation and the publication of documents that represent that event. 
    2. To maintain cohesion between slices.

Command-line
````````````

.. code-block:: bash

   $ tethne -I exampleID -O /Users/erickpeirson/exampleOutput --slice -S date \
   > -M time_window --window-size=2
   ----------------------------------------
	   Workflow step: Slice
   ----------------------------------------
   Loading DataCollection from /tmp/exampleID_DataCollection.pickle...done.
   Slicing DataCollection by date...done.
   Saving slice distribution to /Users/erickpeirson/exampleOutput/exampleID_sliceDistribution.csv...done.
   Saving sliced DataCollection to /tmp/exampleID_DataCollection_sliced.pickle...done.

TethneGUI
`````````
1. The slice axis should be set to ``date`` by default. If not, select it from the
   ``Slice axis`` drop-down menu. 
2. Set ``Cumulative slicing`` to ``False``.
3. Select ``time_window`` from the ``Slice method`` menu. 
4. Set the ``Slice window size`` to ``2``. 
5. Click ``Slice files``. 

After a few minutes, slicing should be complete; click ``Next >``.

.. image:: _static/images/tutorial/slice.png
   :width: 500
   :align: center

Python
``````
Use the :func:`tethne.data.DataCollection.slice` method to slice your data. 

.. code-block:: python

   >>> D.slice('date', 'time_window', window_size=2)