enchant.utils: Misc utilities for the enchant package

This module provides miscellaneous utilities for use with the enchant spellchecking package. Currently available functionality includes:

  • string/unicode compatibility wrappers
  • functions for dealing with locale/language settings
  • ability to list supporting data files (win32 only)
  • functions for bundling supporting data files from a build

class enchant.utils.EnchantStr

String subclass for interfacing with enchant C library.

This class encapsulates the logic for interfacing between Python's native string/unicode objects and the underlying enchant library, which expects all strings to be UTF-8 character arrays. It is a subclass of the default string class ‘str’: on Python 2.x that makes it a byte string, on Python 3.x a unicode string.

Initialise it with a string or unicode object, and use the encode() method to obtain an object suitable for passing to the underlying C library. When strings are read back into Python, use decode(s) to translate them back into the appropriate Python-level string type.

This allows us to follow the common Python 2.x idiom of returning unicode when unicode is passed in, and byte strings otherwise. It also lets the interface be upwards-compatible with Python 3, in which string objects are unicode by default.

decode(value)

Decode a string returned by the enchant C library.

encode()

Encode this string into a form usable by the enchant C library.
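
The round trip described above can be illustrated with a minimal sketch. It assumes Python 3 only, and the class name Utf8Str is hypothetical: it is a stand-in for the idea, not the EnchantStr implementation, and it omits the Python 2.x bookkeeping about whether the input was unicode.

```python
class Utf8Str(str):
    """Hypothetical stand-in for the EnchantStr idea (Python 3 only)."""

    def encode(self):
        # The C library is described as expecting UTF-8 character arrays,
        # so always encode as UTF-8 regardless of the platform default.
        return str.encode(self, "utf-8")

    def decode(self, value):
        # Bytes handed back by the C layer become a native str again.
        return value.decode("utf-8")


word = Utf8Str("café")
raw = word.encode()            # bytes suitable for a C API
round_trip = word.decode(raw)  # back to a Python-level string
```

Keeping both directions on one object means the calling code never has to decide which string type to hand back to the caller.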

enchant.utils.get_default_language(default=None)

Determine the user’s default language, if possible.

This function uses the ‘locale’ module to try to determine the user’s preferred language. The return value is determined as follows:

  • if a locale is available for the LC_MESSAGES category, that language is used
  • if a default locale is available, that language is used
  • if the keyword argument <default> is given, it is used
  • if nothing else works, None is returned

Note that determining the user’s language is in general only possible if they have set the necessary environment variables on their system.
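
The fallback order above might be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name guess_language is hypothetical, and the "default locale" step is approximated with locale.getlocale() on LC_CTYPE because locale.getdefaultlocale() is deprecated in recent Python versions.

```python
import locale


def guess_language(default=None):
    """Return a language tag such as 'en_US', or `default` if none is found."""
    # LC_MESSAGES is missing on some platforms (for example Windows),
    # so it is looked up defensively before falling back to LC_CTYPE.
    categories = [getattr(locale, "LC_MESSAGES", None), locale.LC_CTYPE]
    for category in categories:
        if category is None:
            continue
        try:
            language = locale.getlocale(category)[0]
        except (ValueError, locale.Error):
            # An unrecognised locale name is treated as "not available".
            language = None
        if language:
            return language
    # Nothing usable was found: hand back the caller's default (maybe None).
    return default
```

Note that locale.getlocale() reports the locale of the running process, so most categories only reflect the user's environment variables after the program has called locale.setlocale(locale.LC_ALL, "").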

enchant.utils.get_resource_filename(resname)

Get the absolute path to the named resource file.

This serves much the same purpose as pkg_resources.resource_filename(), but tries to avoid loading pkg_resources unless we’re actually in an egg.

enchant.utils.levenshtein(s1, s2)

Calculate the Levenshtein distance between two strings.

The implementation follows the algorithm as given on Wikipedia.
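
For readers unfamiliar with the metric, here is a sketch of the standard dynamic-programming formulation. The function name levenshtein_distance is chosen for illustration, and the code is not claimed to match the library's source line for line.

```python
def levenshtein_distance(s1, s2):
    """Count the single-character edits needed to turn s1 into s2."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    # previous[j] is the distance between the processed prefix of s1
    # and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (c1 != c2)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.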

enchant.utils.printf(values, sep=' ', end='\n', file=None)

Compatibility wrapper for the print statement/function.

This function is a simple Python 2/Python 3 compatibility wrapper for printing to stdout.
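
A wrapper with this signature could look roughly like the sketch below. It assumes that values is a sequence of items to print, and the name printf_sketch is hypothetical; the library's own implementation may differ.

```python
import io
import sys


def printf_sketch(values, sep=" ", end="\n", file=None):
    """Write the items of `values` to `file`, joined by `sep`, then `end`."""
    if file is None:
        # Resolve stdout at call time so later redirection is respected.
        file = sys.stdout
    file.write(sep.join(str(value) for value in values) + end)


# Capture the output in memory instead of printing to the terminal.
buffer = io.StringIO()
printf_sketch(["checked", 3, "words"], file=buffer)
captured = buffer.getvalue()
```

Taking the values as one sequence, rather than as separate positional arguments, lets the same call syntax work under both the print statement and the print function.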

enchant.utils.raw_bytes(raw)

Make a bytes object out of a raw string.

This is analogous to raw_unicode, but processes byte escape characters to produce a bytes object.
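
One way to get this behaviour on Python 3 is sketched below. It assumes the raw string contains only ASCII characters and escapes for byte values up to \xff; the name raw_bytes_sketch is hypothetical and the approach is not taken from the library.

```python
def raw_bytes_sketch(raw):
    """Turn a raw string such as r"\xc3\xa9" into the bytes it spells out."""
    # Step 1: interpret the backslash escapes, giving a str whose code
    # points are the intended byte values.
    unescaped = raw.encode("ascii").decode("unicode_escape")
    # Step 2: map each code point 0-255 to the byte with the same value.
    return unescaped.encode("latin-1")


utf8_e_acute = raw_bytes_sketch(r"\xc3\xa9")
```

The latin-1 codec is used only because it maps code points 0 to 255 onto identical byte values; an escape above \xff would raise UnicodeEncodeError in this sketch.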

enchant.utils.raw_unicode(raw)

Make a unicode string from a raw string.

This function takes a string containing unicode escape characters, and returns the corresponding unicode string. Useful for writing unicode string literals in your Python source while being upwards-compatible with Python 3. For example, instead of doing this:

    s = u"hello\u2149"  # syntax error in Python 3

Or this:

    s = "hello\u2149"  # not what you want in Python 2.x

You can do this:

    s = raw_unicode(r"hello\u2149")  # works everywhere!
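
On Python 3 the same effect can be approximated with the standard unicode_escape codec. The sketch below is an assumption about how such a helper might be written, with the hypothetical name raw_unicode_sketch; it is not the library's implementation.

```python
def raw_unicode_sketch(raw):
    """Interpret backslash escapes in a raw string and return a str."""
    # backslashreplace turns any non-ASCII character into an escape too,
    # so the unicode_escape decoder restores it unchanged afterwards.
    escaped = raw.encode("ascii", "backslashreplace")
    return escaped.decode("unicode_escape")


s = raw_unicode_sketch(r"hello\u2149")
```

The raw-string prefix matters: it keeps the backslash literal in the source, so the escape is only interpreted inside the helper.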

enchant.utils.trim_suggestions(word, suggs, maxlen, calcdist=None)

Trim a list of suggestions to a maximum length.

If the list of suggested words is too long, you can use this function to trim it down to a maximum length. It tries to keep the “best” suggestions based on similarity to the original word.

If the optional “calcdist” argument is provided, it must be a callable taking two words and returning the distance between them. It will be used to determine which words to retain in the list. The default is a simple Levenshtein distance.
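
The trimming idea can be sketched as "rank by distance, keep the closest maxlen". The names trim_sketch and _edit_distance below are hypothetical, and the tie-breaking rule (keep the original order) is an assumption rather than documented behaviour.

```python
def _edit_distance(a, b):
    """Plain Levenshtein distance, used as the default ranking metric."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,            # insertion
                               previous[j] + 1,               # deletion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def trim_sketch(word, suggs, maxlen, calcdist=None):
    """Keep at most `maxlen` suggestions, closest to `word` first."""
    if calcdist is None:
        calcdist = _edit_distance
    # sorted() is stable, so equally distant words keep their input order.
    ranked = sorted(suggs, key=lambda sugg: calcdist(word, sugg))
    return ranked[:maxlen]


best = trim_sketch("helo", ["hello", "help", "world", "halo"], 2)
```

Passing a different calcdist, for example one that weights keyboard-adjacent substitutions lower, changes which suggestions survive without touching the trimming logic.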

enchant.utils.win32_data_files()

Get a list of supporting data files, for use with setup.py.

This function returns a list of the supporting data files available to the running version of PyEnchant. This is in the format expected by the data_files argument of the distutils setup function. It’s very useful, for example, for including the data files in an executable produced by py2exe.

Only really tested on the win32 platform, since it is the only platform for which we ship our own supporting data files.
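
For reference, the data_files argument of the distutils setup function takes a list of (target directory, [source files]) pairs. The sketch below builds a list in that shape from relative paths; the helper name group_data_files, the file names, and the "build" directory are all hypothetical, and forward slashes are used only to keep the example platform-independent.

```python
import posixpath


def group_data_files(relative_paths, source_root):
    """Group files by directory into [(target_dir, [source, ...]), ...]."""
    grouped = {}
    for rel in relative_paths:
        # Files are installed into the same relative directory they came from.
        target_dir = posixpath.dirname(rel)
        source = posixpath.join(source_root, rel)
        grouped.setdefault(target_dir, []).append(source)
    # Sort by target directory so the result is deterministic.
    return sorted(grouped.items())


example = group_data_files(
    ["lib/enchant/example_provider.dll",   # hypothetical file names
     "share/dicts/en_US.dic",
     "share/dicts/en_US.aff"],
    "build",
)
```

A list in this shape can be passed directly as setup(..., data_files=example).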