Using the Slugger library

Installation

Releases of Slugger should be installed from PyPI using pip:

$ pip install slugger

You cannot use Slugger directly from a checkout of the GitHub repository, because the glibc-localedata must first be parsed and pickled. Releases installed from PyPI already include this data.

See Development for details on how to generate this data.

Slugging things

Use is usually straightforward:

# -*- coding: utf-8 -*-
from slugger import Slugger

s = Slugger('de', hanlang='ja')
print(s.sluggify(u'Hellö & Wörld 漢字'))

This prints helloe-und-woerld-kan-ji: the German rules transcribe the umlauts and the "&", and hanlang='ja' gives the Kanji a Japanese reading. The Slugger constructor accepts a number of further options to fine-tune the result.

Do not rely on Slugger generating the same slug across versions. The library aims to improve steadily, either through better underlying libraries or through fixes in Slugger itself, so its output may change between releases. If you keep a database of titles, store the generated slug alongside each title instead of regenerating it.
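A minimal sketch of that advice, using the standard-library sqlite3 module: the slug is computed once when a row is created, stored in its own column, and later lookups use the stored value. The table layout and the make_slug helper are hypothetical; make_slug only stands in for a call to sluggify() so the example runs without Slugger installed.

```python
import re
import sqlite3


def make_slug(title):
    # Hypothetical stand-in so this sketch runs on its own;
    # in real code call sluggify(title) on a reused Slugger instance instead.
    return re.sub(r'[^A-Za-z0-9-]+', '-', title).strip('-').lower()


def create_schema(conn):
    # Keep the slug in its own column; UNIQUE makes it usable as a lookup key.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS articles ('
        ' id INTEGER PRIMARY KEY,'
        ' title TEXT NOT NULL,'
        ' slug TEXT NOT NULL UNIQUE)'
    )


def add_article(conn, title):
    # Generate the slug exactly once, at insert time, and persist it.
    slug = make_slug(title)
    conn.execute('INSERT INTO articles (title, slug) VALUES (?, ?)', (title, slug))
    return slug


def find_by_slug(conn, slug):
    # Look up by the stored slug; never re-slug titles to find a match.
    row = conn.execute('SELECT title FROM articles WHERE slug = ?', (slug,)).fetchone()
    return row[0] if row else None
```

Because lookups only ever compare against stored values, upgrading Slugger changes the slugs of new rows but leaves existing URLs working.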

Filenames

After a slug has been generated, any invalid characters that remain are filtered out with a regular expression chosen to be URL- and filename-safe (this behavior can be changed with the invalid_pattern parameter of the Slugger constructor). This also makes Slugger convenient for sanitizing filenames:

import os
fn = os.path.join('base/dir/userx', s.sluggify(user_supplied_filename))
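To see what the filtering step does on its own, the sketch below applies the documented defaults (invalid_pattern='[^A-Za-z0-9-]+', invalid_replacement='-') with Python's re module. It approximates only this final stage: the real sluggify() runs the transliteration chain first, so non-ASCII letters are transcribed rather than dropped. Trimming separators left at either end is an assumption of this sketch.

```python
import re

# Defaults taken from the documented Slugger constructor signature.
DEFAULT_INVALID_PATTERN = '[^A-Za-z0-9-]+'
DEFAULT_INVALID_REPLACEMENT = '-'


def filter_invalid(text, invalid_pattern=DEFAULT_INVALID_PATTERN,
                   invalid_replacement=DEFAULT_INVALID_REPLACEMENT):
    # Replace every run of disallowed characters with a single replacement.
    cleaned = re.sub(invalid_pattern, invalid_replacement, text)
    # Assumption of this sketch: trim replacements left at either end.
    return cleaned.strip(invalid_replacement) if invalid_replacement else cleaned
```

With these defaults, filter_invalid('../../etc/passwd') returns 'etc-passwd': since both '.' and '/' fall outside the pattern, dot segments and path separators cannot survive, and the joined path stays inside the base directory. The same rule removes the dot before a file extension ('report 2024.pdf' becomes 'report-2024-pdf'), so keep or re-append extensions separately if you need them.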

API reference

class slugger.Slugger(lang, chain=None, hanlang=None, lowercase=True, maxlength=100, invalid_pattern='[^A-Za-z0-9-]+', invalid_replacement='-')

Creates a new configuration for slugging.

Creating a new Slugger is somewhat expensive, because the character translation tables need to be loaded. Reuse instances wherever possible.
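One way to follow that advice, sketched here as an assumption rather than a Slugger feature, is to memoize construction with functools.lru_cache so that each distinct set of constructor arguments builds only one instance. cached_factory is a hypothetical helper; in real code you would pass it slugger.Slugger.

```python
from functools import lru_cache


def cached_factory(cls):
    """Return a callable that builds at most one cls instance per argument set."""
    @lru_cache(maxsize=None)
    def get(*args, **kwargs):
        # cls(...) runs only on a cache miss; identical later calls reuse the result.
        return cls(*args, **kwargs)
    return get
```

For example, get_slugger = cached_factory(Slugger) followed by get_slugger('de', hanlang='ja') anywhere in the program. lru_cache keys on the exact call form, so get_slugger('de') and get_slugger(lang='de') are built separately, and unhashable arguments (for instance a list passed as chain) raise TypeError. Sharing instances also assumes a Slugger is not modified after construction.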

Parameters:
  • lang – The language to use. Valid examples: 'de', 'de_DE', 'ja_JP'. If a language is not found, Slugger falls back to English.
  • chain – Chain of replacement operations. If None, the language's default chain is used.
  • hanlang – The language to use for transcribing Chinese characters. Useful for mixed-language text, e.g. English text with Kanji mixed in, to force a Japanese instead of a Chinese reading.
  • lowercase – Convert the slug to lowercase.
  • maxlength – Maximum length of the slug. Tries to keep words intact if possible.
  • invalid_pattern – Regular expression matching invalid characters left over after processing.
  • invalid_replacement – The replacement for invalid characters.
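The maxlength parameter only says that Slugger tries to keep words intact. A minimal sketch of one such strategy, not necessarily the one Slugger uses: cut at the last separator that fits, and fall back to a hard cut only when a single word is longer than the limit. truncate_words is a hypothetical helper, and it assumes words are joined by '-', the default invalid_replacement.

```python
def truncate_words(slug, maxlength, separator='-'):
    """Shorten slug to at most maxlength characters, preferring word boundaries."""
    if len(slug) <= maxlength:
        return slug
    cut = slug[:maxlength]
    # If the next character is a separator, the cut already ends on a whole word.
    if slug[maxlength] == separator:
        return cut.rstrip(separator)
    # Otherwise drop the partial last word, if there is an earlier boundary.
    head, sep, _ = cut.rpartition(separator)
    if sep and head:
        return head.rstrip(separator)
    # A single word longer than maxlength: fall back to a hard cut.
    return cut
```

For instance, truncate_words('hello-wonderful-world', 10) gives 'hello' rather than 'hello-wond'.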
sluggify(title)

Turn title into a slug.

This will run the whole chain and return the end result of the transformation.
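Conceptually, a chain can be pictured as an ordered sequence of text-to-text steps, each fed the previous step's output. The sketch below models it that way with functools.reduce. The step functions and the tiny German table are hypothetical illustrations, not the replacement tables Slugger builds from glibc-localedata, and this documentation does not specify what objects Slugger's real chain contains.

```python
import re
from functools import reduce

# Hypothetical, deliberately tiny German transcription table.
GERMAN_TABLE = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss', '&': ' und '})


def transliterate_de(text):
    # Replace single characters using the table above.
    return text.translate(GERMAN_TABLE)


def lowercase(text):
    return text.lower()


def make_filter(invalid_pattern='[^A-Za-z0-9-]+', invalid_replacement='-'):
    # Build a final clean-up step from the documented constructor defaults.
    regex = re.compile(invalid_pattern)

    def filter_invalid(text):
        return regex.sub(invalid_replacement, text).strip(invalid_replacement)
    return filter_invalid


def run_chain(title, chain):
    # Feed the output of each step into the next, starting from the title.
    return reduce(lambda text, step: step(text), chain, title)
```

With these steps, run_chain('Hellö & Wörld', [transliterate_de, lowercase, make_filter()]) returns 'helloe-und-woerld', matching the German part of the example at the top of this page. Order matters: if the filter ran before the transcription step, the umlauts would already have been replaced by '-'.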