
plotextractor

Small library for converting and mapping plots in TeX source used in scholarly communication.

Originally part of Invenio https://github.com/inveniosoftware/invenio.

Installation

pip install plotextractor

Usage

from plotextractor import process_tarball
plots = process_tarball("/path/to/tarball.tar.gz")
print(plots[0])
{
    'url': '/path/to/tarball.tar.gz_files/d15-120f3d.png',
    'captions': ['The $\\rho^0$ meson properties: (a) Mass ...'],
    'name': 'd15-120f3d',
    'label': 'fig:mass'
}

Known issues

If you experience frequent DelegateError errors, you may need to update your version of Ghostscript.
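
To see which Ghostscript version is installed before updating, a minimal sketch along these lines may help. It assumes the Ghostscript executable is called `gs` and is on the PATH (on Windows the name usually differs), and `ghostscript_version` is a hypothetical helper, not part of plotextractor.

```python
import shutil
import subprocess


def ghostscript_version(executable="gs"):
    """Return the version string reported by Ghostscript, or None if unavailable."""
    # Assumption: the binary is named "gs"; pass another name if yours differs.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a failing exit code, a timeout, or a binary that cannot be run.
        return None
    return result.stdout.strip() or None


version = ghostscript_version()
print(version if version else "Ghostscript not found on PATH")
```

The helper only reports what is installed; whether a given version avoids DelegateError depends on your setup.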

API

Plotextractor API.

plotextractor.process_tarball(tarball, output_directory=None, context=False)

Process one tarball end-to-end.

If output_directory is given, the tarball is extracted there. Otherwise, it is extracted into a folder next to the tarball file.

The function returns a list of dictionaries:

[{
    'url': '/path/to/tarball_files/d15-120f3d.png',
    'captions': ['The $\\rho^0$ meson properties: (a) Mass ...'],
    'name': 'd15-120f3d',
    'label': 'fig:mass'
}, ... ]

Parameters:

  • tarball (string): the absolute path of the tarball to process.
  • output_directory (string, optional): directory in which the tarball is extracted and processed.
  • context (optional): if True, also try to extract the context in which images are referenced in the text.

Returns:

  • images (list): a list of dictionaries, one per image, with its captions.
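
As a rough illustration of working with the return value, the sketch below indexes the returned dictionaries by their 'label' key. `index_by_label` is a hypothetical helper, not part of plotextractor, and the sample data only mirrors the structure documented above rather than coming from a real tarball.

```python
def index_by_label(plots):
    """Map each figure label to its entry, skipping entries without a label."""
    indexed = {}
    for plot in plots:
        label = plot.get("label")
        if label:
            indexed[label] = plot
    return indexed


# Sample data shaped like the documented return value of process_tarball().
sample = [{
    "url": "/path/to/tarball_files/d15-120f3d.png",
    "captions": ["The $\\rho^0$ meson properties: (a) Mass ..."],
    "name": "d15-120f3d",
    "label": "fig:mass",
}]

by_label = index_by_label(sample)
print(by_label["fig:mass"]["name"])
```

In real use, the list returned by process_tarball would take the place of `sample`. If two images share a label, the later entry overwrites the earlier one in this sketch.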

Changes

Version 0.1.6 (2016-12-01)

  • Sets the mtime for all members of the tarball to current time before unpacking.

Version 0.1.5 (2016-05-25)

  • Properly raises an exception when no TeX files are found in an archive.
  • More fixes to image path extraction and more robust image handling.

Version 0.1.4 (2016-03-22)

  • Fixes linking images from TeX reference when images are referred to without specifying full relative folder path.

Version 0.1.3 (2016-03-17)

  • Properly supports cases where images are located in a nested folder inside the extracted tarball's root folder.

Version 0.1.2 (2015-12-08)

  • Adds wrapfigure support.
  • Catches problems with image conversions.
  • More robust handling of image rotations in TeX sources.
  • Removes unicode_literals usage.

Version 0.1.1 (2015-12-04)

  • Improves extraction from TeX file by reading files with encoding.

Version 0.1.0 (2015-10-19)

Contributing

Bug reports, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the code of this library, please:

  1. Search for already reported problems.
  2. Check if the issue has been fixed or is still reproducible on the latest master branch.
  3. Create an issue with a test case.

If you create a feature branch, you can run the tests to ensure everything is operating correctly:

$ ./run-tests.sh

Authors