Tutorial

The basic principle is to work with bibliographic database objects. They store entries, which can be accessed in a dict-like manner. The fields of each entry can in turn also be accessed in a dict-like manner.

There are different BibDB implementations for different storage back ends, such as FileBibDB or DoiBibDB.

For a very simple workflow the package has some helper functions.

Reading a database file

Let's assume we have the following BibTeX sample file bibtex.bib:

% This file was created with JabRef 2.10b2.
% Encoding: UTF8

@Article{JCP-127-234509,
  Title                    = {Homogeneous nucleation and growth in supersaturated zinc vapor investigated by molecular dynamics simulation},
  Author                   = {F. R\"{o}mer and T. Kraska},
  Journal                  = {Journal of Chemical Physics},
  Year                     = {2007},
  Number                   = {23},
  Pages                    = {234509},
  Volume                   = {127},
  Doi                      = {10.1063/1.2805063},
}

To work with a file based database we use a FileBibDB object like this:

>>> from biblib import FileBibDB

>>> db = FileBibDB('bibtex.bib')

Now we can work on the database level through the methods provided by the BibDB class, and on the individual bibliographic entries through the Entry methods.

Note

By default a FileBibDB object (and other StorageBibDB implementations) is writable, and all changes you make to the db are written back. If you want to access it in read-only mode, create it with the parameter mode='r' like this:

>>> db = FileBibDB('bibtex.bib', mode='r')

Accessing the Database

The easiest way to access the database is to access it like a dictionary.

get, set, delete

You can access entries like you would with a dictionary using the cite key of the specific entry:

# get an entry
someEntry = db['JCP-127-234509']

# set an entry
db['MyCiteKey'] = someEntry

# add it with the more flexible method 'add_entry'
#db.add_entry(someEntry, 'MyCiteKey')

# or delete an entry from the db
del db['JCP-127-234509']

Iterating

# iterate over cite keys
for citeKey in db:
    entry = db[citeKey]
    ...

# or just the entries
for entry in db.values():
    ...

# or even both
for citeKey, entry in db.items():
    ...

# iterate over dois of entries in the db
for doi, citeKey in db.dois.items():
    ...

contains

To check if an entry exists in the db you can do:

# check if a cite key exists
if 'JCP-127-234509' in db:
    ...

# or an entry:
if entry in db.values():
    ...

# or even check if an entry with a given DOI exists
if db.has_doi('10.1209/0295-5075/109/68001'):
    ...
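The dict-like behaviour described in this section can be modelled in a few lines. The following is a minimal sketch for Python 3, not the library's implementation: TinyBibDB is a hypothetical stand-in that stores plain dicts as entries, and its dois property and has_doi() method only imitate the names used above.

```python
from collections.abc import MutableMapping


class TinyBibDB(MutableMapping):
    """Hypothetical stand-in for a BibDB: maps cite keys to entries (plain dicts)."""

    def __init__(self):
        self._entries = {}

    def __getitem__(self, cite_key):
        return self._entries[cite_key]

    def __setitem__(self, cite_key, entry):
        self._entries[cite_key] = entry

    def __delitem__(self, cite_key):
        del self._entries[cite_key]

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)

    @property
    def dois(self):
        """Map each known DOI to the cite key of its entry."""
        return {e['doi']: k for k, e in self._entries.items() if 'doi' in e}

    def has_doi(self, doi):
        return doi in self.dois


db = TinyBibDB()
db['JCP-127-234509'] = {'year': '2007', 'doi': '10.1063/1.2805063'}
db['Mayer2008'] = {'year': '2008'}

print('JCP-127-234509' in db)            # membership test on cite keys
print(sorted(db.keys()))                 # keys(), values(), items() are inherited
print(db.has_doi('10.1063/1.2805063'))   # lookup through the DOI index
```

Only the five abstract methods are written by hand; the in operator, keys(), values() and items() come from MutableMapping, which is one plausible way to obtain the dictionary interface shown above.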

Creating an Entry

Besides reading an Entry from some database, you can create one manually like this:

from biblib import Entry

# First we set up a dictionary containing the initial data, ...
inputdict = {
    'ENTRYTYPE': 'mastersthesis',
    'ID': 'Mayer2008',
    'author': u'Hans H. Mayer',
    'school': u'University of Nowhere',
    'title': u'A very interesting thesis.',
    'year': u'2008'
}

# and then create the entry object.
entry = Entry.get_Instance(inputdict)

# Now we can add the new entry to the database:
db.add_entry(entry)

Note

  • The dictionary key ‘ID’ is mandatory.
  • The dictionary key ‘ENTRYTYPE’ is mandatory and must conform to BibTeX entry types.
  • For the mandatory fields of the used ‘ENTRYTYPE’, check the Entrytype classes.

Note

All field values (except ‘ENTRYTYPE’ and ‘ID’) must be unicode strings! So in Python 2, prefix string literals with ‘u’ or convert them to unicode!

Accessing an Entry

Entries can also be accessed like dictionaries.

Take a look at the list of existing (allowed) field names.

get, set, delete a field

title = entry['title']

entry['title'] = "New Title"

del entry['title']

iterating

# iterate over keys
for key in entry:
    ...

# iterate over values
for value in entry.values():
    ...

# iterate over both
for key, value in entry.items():
    ...

Note

While iterating over an entry, the fields ‘ENTRYTYPE’ and ‘ID’ are excluded.

To get a dictionary that can be used as an inputDict for a new Entry, use the datadict property.
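The difference between iterating over an entry and reading its datadict can be illustrated with a small model. This is a sketch for Python 3 under stated assumptions: TinyEntry is a hypothetical class, and keeping ‘ENTRYTYPE’ and ‘ID’ outside the field mapping is only one way to get the behaviour described in the note.

```python
from collections.abc import Mapping


class TinyEntry(Mapping):
    """Hypothetical entry model: fields are iterable, ENTRYTYPE and ID are kept apart."""

    def __init__(self, inputdict):
        fields = dict(inputdict)
        self.entrytype = fields.pop('ENTRYTYPE')
        self.ckey = fields.pop('ID')
        self._fields = fields

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)

    @property
    def datadict(self):
        """Full copy including ENTRYTYPE and ID, usable as a new input dict."""
        data = dict(self._fields)
        data['ENTRYTYPE'] = self.entrytype
        data['ID'] = self.ckey
        return data


entry = TinyEntry({'ENTRYTYPE': 'mastersthesis', 'ID': 'Mayer2008',
                   'author': 'Hans H. Mayer', 'year': '2008'})

print(sorted(entry))                 # only the field names
clone = TinyEntry(entry.datadict)    # round trip through datadict
print(clone.ckey, clone.entrytype)
```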

contains

# check if a field exists in the entry
if 'title' in entry:
    ...

checks

The entry object has the properties missingTags and is_complete to check whether all mandatory tags are set.

# check if all mandatory tags are set:
if entry.is_complete:
    ...

# get all mandatory tags that are missing
for missingTag in entry.missingTags:
    ...
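What such a completeness check amounts to can be sketched with a lookup table of required fields. The table below follows the standard BibTeX styles for three entry types only; the function names missing_tags and is_complete are hypothetical helpers working on a plain input dictionary, not the library's properties.

```python
# Required fields per entry type, following the standard BibTeX styles.
# Illustrative subset only; the library keeps its own definitions.
MANDATORY = {
    'article': ('author', 'title', 'journal', 'year'),
    'mastersthesis': ('author', 'title', 'school', 'year'),
    'phdthesis': ('author', 'title', 'school', 'year'),
}


def missing_tags(datadict):
    """Return the mandatory fields that are absent or empty in datadict."""
    required = MANDATORY[datadict['ENTRYTYPE']]
    return [tag for tag in required if not datadict.get(tag)]


def is_complete(datadict):
    """True if no mandatory field is missing."""
    return not missing_tags(datadict)


thesis = {'ENTRYTYPE': 'mastersthesis', 'ID': 'Mayer2008',
          'author': 'Hans H. Mayer', 'title': 'A very interesting thesis.'}

print(missing_tags(thesis))   # ['school', 'year']
print(is_complete(thesis))    # False
```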

Authors

>>> # A list of dictionaries for the author(s).
>>> entry.authors
[{'given': u'Hans H.', 'family': u'Mayer'}]

The property Entry.authors only has reasonable contents if the author tag is in a valid BibTeX name format. This is not always the case, even when you retrieve citation data from publishers' websites, so be careful!
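Why the name format matters becomes clear from a simplified splitter. The sketch below is not the library's parser: split_authors is a hypothetical function that only understands "Given Family" and "Family, Given", and ignores "von" particles, "Jr" suffixes and brace-protected names.

```python
def split_authors(author_field):
    """Split a BibTeX author field into dicts with 'given' and 'family' keys.

    Simplified: names are separated by ' and '; each name is either
    'Family, Given' or 'Given Family' (last word taken as family name).
    """
    authors = []
    for name in author_field.split(' and '):
        name = name.strip()
        if ',' in name:
            family, given = [part.strip() for part in name.split(',', 1)]
        else:
            given, _, family = name.rpartition(' ')
        authors.append({'given': given, 'family': family})
    return authors


print(split_authors('Hans H. Mayer'))
print(split_authors('Mayer, Hans H.'))
# A name particle such as 'de' is misassigned by this naive rule:
print(split_authors('Sybren Ruurds de Groot and Peter Mazur'))
```

The last call shows the kind of input that needs a full BibTeX name parser: the naive rule files "de" under the given name.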

Citation Key

Also note that the citation-key stored in an entry object (Entry.ckey) does not have to be equal to the citation-key under which the entry is stored in the database; it always keeps its initial value, as shown here:

>>> # Once the citation-key of the JCP article has been changed to 'Roemer2007'
>>> # (see "Modify a database entry"), the object property still returns...
>>> print(db.get_entry('Roemer2007').ckey)
JCP-127-234509
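The behaviour can be reproduced with a plain dictionary. In this sketch StoredEntry and the module-level update_ckey function are hypothetical; raising a KeyError when the new key is already taken is an assumption, not documented behaviour of BibDB.update_ckey().

```python
class StoredEntry(object):
    """Hypothetical entry object that remembers the key it was created with."""

    def __init__(self, ckey):
        self.ckey = ckey   # set once, at creation


def update_ckey(db, old_key, new_key):
    """Re-file the entry under new_key; the entry object itself is untouched."""
    if new_key in db:
        raise KeyError('citation-key already in use: %s' % new_key)
    db[new_key] = db.pop(old_key)


db = {'JCP-127-234509': StoredEntry('JCP-127-234509')}
update_ckey(db, 'JCP-127-234509', 'Roemer2007')

print(sorted(db))               # ['Roemer2007']
print(db['Roemer2007'].ckey)    # JCP-127-234509
```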

Modify a database entry

Two modifications of a bibliographic entry have to be made on the database level: updating the citation-key (BibDB.update_ckey()) and changing the BibTeX entry type (BibDB.mod_entry_type()):

>>> # Let's promote H. H. Mayer to PhD ;) ...
>>> db.mod_entry_type('Mayer2008','phdthesis')

>>> # and change the citation-key of the JCP article.
>>> db.update_ckey('JCP-127-234509','Roemer2007')

>>> # This is what we get:
>>> print(db.datadict)
{'Mayer2008': <biblib._entry.Phdthesis object at 0x7f2545788bd0>, 'Roemer2007': <biblib._entry.Article object at 0x7f25456d64d0>}

Of course, when you intend to change the type of an entry, you need to ensure that the new entry type is valid. At the end of this tutorial you will find a list of valid BibTeX entry types.

Get citation by DOI

Nearly every modern publication has a Digital Object Identifier (DOI). The International DOI Foundation (IDF) offers a web service that returns not only the URL of a publication but also its bibliographic metadata. To use this service comfortably within this library, two helper functions are implemented: entry_from_doi() to retrieve an entry object for a single citation, and db_from_doiList() to retrieve a database based on a list of DOIs. The following example shows how easily you can fetch a BibTeX entry by its DOI:

>>> import biblib
>>> # We have a DOI, like
>>> doi = '10.1088/0959-5309/43/5/301'
>>> # and now retrieve a new entry object ...
>>> doiEntry = biblib.entry_from_doi(doi)
>>> # This is what we get:
>>> print(biblib.entry_to_string(doiEntry))
@article{Lennard_Jones_1931,
    author = {J E Lennard-Jones},
    doi = {10.1088/0959-5309/43/5/301},
    journal = {Proc. Phys. Soc.},
    month = {sep},
    number = {5},
    pages = {461-482},
    publisher = {{IOP}Publishing},
    title = {Cohesion},
    url = {http://dx.doi.org/10.1088/0959-5309/43/5/301},
    volume = {43},
    year = {1931}
}
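For background, the public DOI resolver supports content negotiation: a request to https://doi.org/ with an Accept header of application/x-bibtex returns a BibTeX record instead of redirecting to the publisher. The sketch below (Python 3, standard library only) builds such a request. Whether biblib talks to the service in exactly this way is not stated in this tutorial, and the helper names bibtex_request and fetch_bibtex are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def bibtex_request(doi):
    """Build a request that asks the DOI resolver for a BibTeX record."""
    return Request('https://doi.org/' + quote(doi),
                   headers={'Accept': 'application/x-bibtex'})


def fetch_bibtex(doi, timeout=10):
    """Send the request and return the record as text (needs network access)."""
    with urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode('utf-8')


# Building the request does not touch the network.
req = bibtex_request('10.1088/0959-5309/43/5/301')
print(req.full_url)
print(req.get_header('Accept'))
```

Calling fetch_bibtex() with the same DOI would perform the actual lookup; it is left out of the example run because it depends on network access.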

Get citation by ISBN

Another unique identifier is the International Standard Book Number (ISBN). Here we make use of the functionality of isbnlib. As for DOIs, two functions are implemented to retrieve bibliographic data by ISBN: entry_from_isbn() and db_from_isbnList(). Have a look at the example:

>>> # We have an ISBN
>>> isbn = '978-0486647418'
>>> # and now retrieve a new entry object ...
>>> isbnEntry = biblib.entry_from_isbn(isbn)
>>> # This is what we get:
>>> print(biblib.entry_to_string(isbnEntry))
@book{9780486647418,
    author = {Sybren Ruurds de Groot and Peter Mazur},
    publisher = {Dover Publications},
    title = {Non-Equilibrium Thermodynamics},
    year = {1984}
}
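In the output above the citation-key is the ISBN with the hyphen removed. A hyphenated ISBN-13 can be brought into that form and checked with the standard check-digit rule, as in the following sketch. canonical_isbn13 is a hypothetical helper written for illustration; the library delegates ISBN handling to isbnlib.

```python
def canonical_isbn13(isbn):
    """Strip hyphens and spaces; return the 13 digits or raise ValueError."""
    digits = isbn.replace('-', '').replace(' ', '')
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError('not an ISBN-13: %r' % isbn)
    # Weights alternate 1, 3, 1, 3, ...; a valid number sums to a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    if total % 10 != 0:
        raise ValueError('bad ISBN-13 check digit: %r' % isbn)
    return digits


print(canonical_isbn13('978-0486647418'))   # 9780486647418
```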

Merging databases

Todo

obviously it needs to be finished...

Citation-key template

To offer a comfortable way to get a suitable citation-key for a new entry object, the database object provides the method BibDB.proposeCKey(), which proposes a BibTeX citation-key for a given entry object in the context of the database. For this, one needs to define templates that describe how a citation-key should be built. These templates are stored as strings and consist of keywords enclosed in curly braces, e.g. {family}{year}. The keywords refer to BibTeX tag names and are replaced by the content of the respective tag of the entry object. In addition to the BibTeX tag names, you can use:

  • {family} and {given}, which refer to the name of the first author (using Entry.authors), as well as
  • {cnt} to define the position of a counter element.

The database holds three attributes concerning the citation-key template:

  • BibDB.ckey_tpl template for a citation-key without a counter (default: {family}{year}),
  • BibDB.ckey_tpl_wc template for a citation-key with a counter (default: {family}{year}{cnt}), and
  • BibDB.ckey_tpl_cnt keyword for counter style for the citation-key template (default: alpha).

The counter is introduced if the citation-key without a counter would collide with a key that already exists in the database. Currently three different counter styles are implemented:

  • alpha: a, b, c, ..., z
  • Alpha: A, B, C, ..., Z
  • num: 1, 2, 3, ...
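The interplay of the two templates and the counter style can be sketched as follows. This is not BibDB.proposeCKey() itself: propose_ckey and counter_values are hypothetical functions, the entry is reduced to a plain dict that already contains family and year, and raising a KeyError once the alphabetic counter is exhausted is an assumption.

```python
import itertools
import string


def counter_values(style):
    """Return an iterator over counter strings for the three documented styles."""
    if style == 'alpha':
        return iter(string.ascii_lowercase)
    if style == 'Alpha':
        return iter(string.ascii_uppercase)
    if style == 'num':
        return (str(n) for n in itertools.count(1))
    raise ValueError('unknown counter style: %r' % style)


def propose_ckey(fields, used_keys, tpl='{family}{year}',
                 tpl_wc='{family}{year}{cnt}', cnt_style='alpha'):
    """Fill the plain template; on collision fall back to the counter template."""
    key = tpl.format(**fields)
    if key not in used_keys:
        return key
    for cnt in counter_values(cnt_style):
        key = tpl_wc.format(cnt=cnt, **fields)
        if key not in used_keys:
            return key
    raise KeyError('no free citation-key for template %r' % tpl_wc)


fields = {'family': 'Mayer', 'year': '2008'}
print(propose_ckey(fields, set()))                            # Mayer2008
print(propose_ckey(fields, {'Mayer2008'}))                    # Mayer2008a
print(propose_ckey(fields, {'Mayer2008'}, cnt_style='num'))   # Mayer20081
```

The last line shows why the position of {cnt} in the template matters: with the default template a numeric counter is hard to tell apart from the year, so a template such as {family}{year}-{cnt} may be preferable for the num style.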

Note

If you use {family} and/or {given}, ensure that the author tag is in a valid BibTeX name format!

Adding/merging method

When adding an entry to a database (BibDB.add_entry()) or when merging one database into another (BibDB.merge_bibdb()), an important question is: What should happen with the citation-key?

A citation-key of an entry object may be invalid or may collide with an existing entry in the database. In order to provide a robust scheme to add entries to a database, with respect to the citation-key, all methods which are involved provide the option method. By default it is set to None, which means an invalid or conflicting citation-key will raise a KeyError. If method is:

  • 'lazy': First try to use the given citation-key.

    If it is already in use or invalid, generate a new one using the template.

  • 'auto': Always use the template to generate a proper citation-key.

  • 'force': Use the given citation-key. If it is already in use, the old entry object will be replaced.

    If it is invalid, generate a new one using the template.
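The four cases can be summarised as a small decision function over a plain dictionary. Everything in this sketch is hypothetical: add_entry and propose are stand-alone functions rather than BibDB methods, and the validity rule (no whitespace, commas or braces) is an assumption, since the tutorial does not define what makes a citation-key invalid.

```python
import re


def is_valid_ckey(ckey):
    # Assumed rule: non-empty string without whitespace, commas or braces.
    return isinstance(ckey, str) and re.fullmatch(r'[^\s,{}]+', ckey) is not None


def propose(entry, db):
    """Stand-in for the template mechanism: family + year, then a, b, c, ..."""
    base = entry['family'] + entry['year']
    for suffix in [''] + list('abcdefghijklmnopqrstuvwxyz'):
        if base + suffix not in db:
            return base + suffix
    raise KeyError(base)


def add_entry(db, entry, ckey, method=None):
    """Store entry in db according to method and return the key that was used."""
    usable = is_valid_ckey(ckey)
    free = ckey not in db
    if method is None:
        if not (usable and free):
            raise KeyError(ckey)
        key = ckey
    elif method == 'lazy':
        key = ckey if (usable and free) else propose(entry, db)
    elif method == 'auto':
        key = propose(entry, db)
    elif method == 'force':
        key = ckey if usable else propose(entry, db)
    else:
        raise ValueError('unknown method: %r' % method)
    db[key] = entry
    return key


db = {}
thesis = {'family': 'Mayer', 'year': '2008'}
print(add_entry(db, thesis, 'Mayer2008'))                  # Mayer2008
print(add_entry(db, thesis, 'Mayer2008', method='lazy'))   # Mayer2008a
print(add_entry(db, thesis, 'Mayer2008', method='auto'))   # Mayer2008b
print(add_entry(db, thesis, 'Mayer2008', method='force'))  # Mayer2008 (replaced)
```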

Example

This example is a command line script to manipulate a BibTeX file. It shows how to get, add and delete entries in the given file.

#!/usr/bin/env python

"""
Usage:
    example.py get -f <file> (-c <cite_key> | -d <doi> | -i <isbn>)
    example.py add -f <file> (-b <bibtex> | -d <doi> | -i <isbn> )
    example.py del -f <file> <cite_key>
    example.py list -f <file> (-c | -d)
"""

from __future__ import print_function
from biblib import FileBibDB, DoiBibDB, IsbnBibDB, StringBibDB, entry_to_string


def get(fileName, cite_key=None, doi=None, isbn=None):
    """
        get an entry for the given cite_key, doi or isbn

        :raises KeyError: if cite_key, doi or isbn is not found
    """
    if cite_key:
        db = FileBibDB(fileName)
        return db[cite_key]
    elif doi:
        doiDb = DoiBibDB()
        return doiDb[doi]
    elif isbn:
        isbnDb = IsbnBibDB()
        return isbnDb[isbn]

def add(fileName, bibStr=None, doi=None, isbn=None):
    """add entry from given bibStr, doi or isbn to given bibtex file"""

    db = FileBibDB(fileName)
    if doi or isbn:
        # if doi or isbn get entry and add it to db file
        entry = get(fileName, None, doi, isbn)
        db.add_entry(entry, method='auto')
    elif bibStr:
        # if it is a bibtex string, parse it and add all entries to db file
        tmpDb = StringBibDB(bibStr)
        db.merge_bibdb(tmpDb, method='auto')

def rm(fileName, cite_key):
    """
        delete the entry given by cite_key from the given bibtex file

        :raises KeyError: if cite_key is not found
    """
    db = FileBibDB(fileName)
    del db[cite_key]


def listing(fileName, cite_keys=False, dois=False):
    """list city_keys or dois (if tag is set) of all entries from given bibtex file"""

    db = FileBibDB(fileName)
    if cite_keys:
        # list all cite_keys
        return list(db.keys())
    elif dois:
        # list dois of entries whose doi tag is set
        return list(db.dois.keys())


if __name__ == '__main__':

    import sys
    from docopt import docopt

    arguments = docopt(__doc__)

    fileName = arguments['<file>']
    cite_key = arguments['<cite_key>']
    doi = arguments['<doi>']
    isbn = arguments['<isbn>']
    bibStr = arguments['<bibtex>']

    try:
        if arguments['get']:
            # get entry
            entry = get(fileName, cite_key, doi, isbn)
            print(entry_to_string(entry))
        elif arguments['add']:
            # add entry
            add(fileName, bibStr, doi, isbn)
        elif arguments['del']:
            # delete entry
            rm(fileName, cite_key)
        elif arguments['list']:
            if arguments['-c']:
                # list cite_keys
                data = listing(fileName, cite_keys=True)
            if arguments['-d']:
                # list dois
                data = listing(fileName, dois=True)
            print(' '.join(data))

    except KeyError:
        print('No entry found for given identifier.', file=sys.stderr)
        sys.exit(2)
    except Exception as e:
        print(e, file=sys.stderr)
        sys.exit(1)

    sys.exit(0)

Example with BibDBCollection

Warning

The BibDBCollection class is experimental and not fully tested!

This example does the same thing as the previous example by utilising the BibDBCollection, which in turn implements already most of the checking/lookup done manually in the previous example.

#!/usr/bin/env python

"""
Usage:
    example.py get -f <file> (-c <cite_key> | -d <doi> | -i <isbn>)
    example.py add -f <file> (-b <bibtex> | -d <doi> | -i <isbn> )
    example.py del -f <file> <cite_key>
    example.py list -f <file> (-c | -d)
"""

from __future__ import print_function
from biblib import FileBibDB, DoiBibDB, IsbnBibDB, StringBibDB, entry_to_string, BibDBCollection


if __name__ == '__main__':

    import sys
    from docopt import docopt

    arguments = docopt(__doc__)

    bibStr = arguments['<bibtex>']

    id = arguments['<cite_key>'] or arguments['<doi>'] or arguments['<isbn>']

    try:

        # create BibDBCollection
        db = BibDBCollection()
        # add different DBs, first match wins on lookups
        db.addDB(FileBibDB(arguments['<file>']))
        db.addDB(DoiBibDB())
        db.addDB(IsbnBibDB())

        if arguments['get']:
            # get entry
            print(entry_to_string(db[id]))
        elif arguments['add']:
            # add entry
            if id:
                # by doi or isbn
                db.add_entry(db[id], method='auto')
            elif bibStr:
                # by bibStr
                db.merge_bibdb(StringBibDB(bibStr), method='auto')
        elif arguments['del']:
            # delete entry
            del db[id]
        elif arguments['list']:
            if arguments['-c']:
                # list cite_keys
                data = db.keys()
            if arguments['-d']:
                # list dois
                data = db.dois.keys()
            print(' '.join(data))

    except KeyError:
        print('No entry found for given identifier.', file=sys.stderr)
        sys.exit(2)
    except Exception as e:
        print(e, file=sys.stderr)
        sys.exit(1)

    sys.exit(0)