Kitchen.i18n Module

Internationalization (i18n) is an important piece of any modern program. Unfortunately, setting up i18n in your program is often a confusing process. The functions provided here aim to make the programming side of that a little easier.

Most projects will be able to do something like this when they start up:

# myprogram/__init__.py:

import os
import sys

from kitchen.i18n import easy_gettext_setup

_, N_ = easy_gettext_setup('myprogram', localedirs=(
        os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
        os.path.join(sys.prefix, 'lib', 'locale')
        ))

Then, in other files that have strings that need translating:

# myprogram/commands.py:

from myprogram import _, N_

def print_usage():
    print(_(u"""available commands are:
    --help              Display help
    --version           Display version of this program
    --bake-me-a-cake    as fast as you can
        """))

def print_invitations(age):
    print(_('Please come to my party.'))
    print(N_('I will be turning %(age)s year old',
        'I will be turning %(age)s years old', age) % {'age': age})

See the documentation of easy_gettext_setup() and get_translation_object() for more details.

See also

gettext
For details of how the Python gettext facilities work.
babel
The babel module, for in-depth information on gettext, message catalogs, and translating your app. babel provides some nice features for i18n on top of gettext.

Functions

easy_gettext_setup() should satisfy the needs of most users. get_translation_object() is designed to ease the way for anyone who needs more control.

kitchen.i18n.easy_gettext_setup(domain, localedirs=(), use_unicode=True)

Set up translation functions for an application

Parameters:
  • domain – Name of the message domain. This should be a unique name that can be used to lookup the message catalog for this app.
  • localedirs – Iterator of directories to look for message catalogs under. The first directory to exist is used regardless of whether messages for this domain are present. If none of the directories exist, fall back on sys.prefix + '/share/locale'. Default: no directories to search, so only the fallback is used.
  • use_unicode – If True, return the gettext functions for unicode strings; otherwise return the functions for byte str. Default is True.
Returns:

tuple of the gettext function and gettext function for plurals
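The directory selection rule described for localedirs can be sketched as follows. This only illustrates the documented behaviour and is not kitchen's actual code. The name pick_localedir() is hypothetical.

```python
import os
import sys


def pick_localedir(localedirs=()):
    """Return the first directory in localedirs that exists.

    Hypothetical helper: it mirrors the documented rule and does not
    check whether the directory holds catalogs for any domain.
    """
    for localedir in localedirs:
        if os.path.isdir(localedir):
            return localedir
    # None of the candidates exist, so use the documented fallback.
    return os.path.join(sys.prefix, 'share', 'locale')
```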

Setting up gettext can be a little tricky because of lack of documentation. This function will set up gettext using the Class-based API for you. For the simple case, you can pass only the domain and use the default arguments for everything else:

_, N_ = easy_gettext_setup('myprogram')

This will get you two functions, _() and N_(), that you can use to mark strings in your code for translation. Use _() for strings whose wording does not depend on the value of any variable. Use N_() for strings that need a different form depending on whether a count in the string is singular or plural.

See also

Kitchen.i18n Module
This module’s documentation has examples of using _() and N_()
get_translation_object()
for information on how to use localedirs to get the proper message catalogs both when in development and when installed to FHS compliant directories on Linux.

Note

The gettext functions returned from this function should be superior to the ones returned from gettext. The traits that make them better are described in the DummyTranslations and NewGNUTranslations documentation.

Changed in version kitchen-0.2.4; API kitchen.i18n 2.0.0: Changed easy_gettext_setup() to return the lgettext functions instead of gettext functions when use_unicode=False.

kitchen.i18n.get_translation_object(domain, localedirs=(), languages=None, class_=None, fallback=True, codeset=None, python2_api=True)

Get a translation object bound to the message catalogs

Parameters:
  • domain – Name of the message domain. This should be a unique name that can be used to lookup the message catalog for this app or library.
  • localedirs – Iterator of directories to look for message catalogs under. The directories are searched in order. In each directory, we check for message catalogs in any language specified in languages. The catalogs that are found are used to create the Translation object that we return. The Translation object will attempt to look up the msgid in the first catalog that we found. If it’s not in there, it will go through each subsequent catalog looking for a match. For this reason, the order in which you specify the localedirs may be important. If no message catalogs are found, either return a DummyTranslations object or raise an IOError, depending on the value of fallback. The default localedir from gettext, which is os.path.join(sys.prefix, 'share', 'locale') on Unix, is implicitly appended to the localedirs, making it the last directory searched.
  • languages

    Iterator of language codes to check for message catalogs. If unspecified, the user’s locale settings will be used.

    See also

    gettext.find() for information on what environment variables are used.

  • class_ – The class to use to extract translations from the message catalogs. Defaults to NewGNUTranslations.
  • fallback – If set to False, raise an IOError if no message catalogs are found. If True, the default, return a DummyTranslations object.
  • codeset – Set the character encoding to use when returning byte str objects. This is equivalent to calling output_charset() on the Translations object that is returned from this function.
  • python2_api – When True (default), return Translation objects that use the python2 gettext api: gettext() and lgettext() return byte str, and ugettext() exists and returns unicode strings. When False, return Translation objects that use the python3 gettext api: gettext() returns unicode strings, lgettext() returns byte str, and ugettext() does not exist.
Returns:

Translation object to get gettext methods from

If you need more flexibility than easy_gettext_setup(), use this function. It sets up a gettext Translation object and returns it to you. Then you can access any of the methods of the object that you need directly. For instance, if you specifically need to access lgettext():

translations = get_translation_object('foo')
translations.lgettext('My Message')

This function is similar to the python standard library gettext.translation() but improves on it in two ways:

  1. It returns NewGNUTranslations or DummyTranslations objects by default. These are superior to the gettext.GNUTranslations and gettext.NullTranslations objects because they are consistent in the string type they return and they fix several issues that can cause the python standard library objects to throw UnicodeError.
  2. This function takes multiple directories to search for message catalogs.

The latter is important when setting up gettext in a portable manner. There is not a common directory for translations across operating systems so one needs to look in multiple directories for the translations. get_translation_object() is able to handle that if you give it a list of directories to search for catalogs:

translations = get_translation_object('foo', localedirs=(
     os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
     os.path.join(sys.prefix, 'lib', 'locale')))

This will search several different directories, in this order:

  1. A directory named locale in the same directory as the module that called get_translation_object(),
  2. In /usr/lib/locale
  3. In /usr/share/locale (the fallback directory)
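The search order above can be approximated with the standard library alone. The sketch below is written for Python 3 and is not kitchen's implementation. It assumes that each directory is probed with gettext.find() and that the default directory is appended last, as the localedirs parameter description states. The name find_catalogs() is hypothetical.

```python
import gettext
import os
import sys


def find_catalogs(domain, localedirs=(), languages=None):
    """Return the paths of all .mo files for domain, in search order.

    Hypothetical helper for illustration only.
    """
    # Assumption: the gettext default directory is searched last.
    search_dirs = list(localedirs)
    search_dirs.append(os.path.join(sys.prefix, 'share', 'locale'))

    found = []
    for localedir in search_dirs:
        # all=True makes gettext.find() return every match as a list.
        found.extend(gettext.find(domain, localedir, languages, all=True))
    return found
```

When languages is None, gettext.find() falls back to the user's locale environment variables, which matches the behaviour described for the languages parameter.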

This allows gettext to work on Windows and in development (where the message catalogs are typically in the top-level module directory) and also when installed under Linux (where the message catalogs are installed in /usr/share/locale). You (or the system packager) just need to install the message catalogs in /usr/share/locale and remove the locale directory from the module to make this work. For example:

In development:
    ~/foo   # Toplevel module directory
    ~/foo/__init__.py
    ~/foo/locale    # With message catalogs below here:
    ~/foo/locale/es/LC_MESSAGES/foo.mo

Installed on Linux:
    /usr/lib/python2.7/site-packages/foo
    /usr/lib/python2.7/site-packages/foo/__init__.py
    /usr/share/locale/  # With message catalogs below here:
    /usr/share/locale/es/LC_MESSAGES/foo.mo

Note

This function will set up Translation objects that attempt to look up msgids in all of the found message catalogs. This means that if you have several versions of the message catalogs installed in different directories that the function searches, you need to make sure that localedirs lists the directories so that newer message catalogs are searched first. It also means that if a newer catalog does not contain a translation for a msgid but an older one in localedirs does, the translation from that older catalog will be returned.
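The chained lookup described in this note can be illustrated with the standard library's fallback mechanism. The sketch below is written for Python 3 and uses a hypothetical in-memory DictTranslations class in place of real message catalogs. It is not how kitchen stores catalogs.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog used only for this illustration."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # Not found here: defer to the fallback chain, which returns
        # the msgid unchanged when no catalog has a translation.
        return super().gettext(message)


# The newer catalog is searched first; the older one is its fallback.
newer = DictTranslations({'Hello': 'Hola'})
older = DictTranslations({'Hello': 'Buenas', 'Goodbye': 'Adiós'})
newer.add_fallback(older)

print(newer.gettext('Hello'))     # Hola (the newer catalog wins)
print(newer.gettext('Goodbye'))   # Adiós (only the older catalog has it)
print(newer.gettext('Thanks'))    # Thanks (untranslated msgid)
```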

Changed in version kitchen-1.1.0; API kitchen.i18n 2.1.0: Add more parameters to get_translation_object() so it can more easily be used as a replacement for gettext.translation(). Also change the way we use localedirs. We cycle through them until we find a suitable locale file rather than simply cycling through until we find a directory that exists. The new code is based heavily on the python standard library gettext.translation() function.

Changed in version kitchen-1.2.0; API kitchen.i18n 2.2.0: Add python2_api parameter

Translation Objects

The standard translation objects from the gettext module suffer from several problems:

  • They can throw UnicodeError
  • They can’t find translations for non-ASCII byte str messages
  • They may return either unicode string or byte str from the same function even though the functions say they will only return unicode or only return byte str.

DummyTranslations and NewGNUTranslations were written to fix these issues.

class kitchen.i18n.DummyTranslations(fp=None, python2_api=True)

Safer version of gettext.NullTranslations

This Translations class doesn’t translate the strings and is intended to be used as a fallback when there were errors setting up a real Translations object. It’s safer than gettext.NullTranslations in its handling of byte str vs unicode strings.

Unlike NullTranslations, this Translation class will never throw a UnicodeError. The code that you have around a call to DummyTranslations might throw a UnicodeError but at least that will be in code you control and can fix. Also, unlike NullTranslations all of this Translation object’s methods guarantee to return byte str except for ugettext() and ungettext() which guarantee to return unicode strings.

When byte str are returned, the strings will be encoded according to this algorithm:

  1. If a fallback has been added, the fallback will be called first. You’ll need to consult the fallback to see whether it performs any encoding changes.
  2. If a byte str was given, the same byte str will be returned.
  3. If a unicode string was given and set_output_charset() has been called, then we encode the string using the output_charset.
  4. If a unicode string was given, this is gettext() or ngettext(), and _charset was set, output in that charset.
  5. If a unicode string was given and this is gettext() or ngettext(), we encode it using UTF-8.
  6. If a unicode string was given and this is lgettext() or lngettext(), we encode using the value of locale.getpreferredencoding().
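Steps 3 through 6 amount to a small decision procedure for picking the output encoding. The sketch below restates them as a function. It is not the class's actual code, and the function and parameter names are hypothetical.

```python
import locale


def choose_output_encoding(method, output_charset=None, catalog_charset=None):
    """Pick the encoding for a unicode string returned as a byte str.

    method          -- name of the gettext method being called
    output_charset  -- value given to set_output_charset(), if any
    catalog_charset -- the _charset attribute, if it was set
    """
    # Step 3: an explicit output charset always wins.
    if output_charset:
        return output_charset
    if method in ('gettext', 'ngettext'):
        # Step 4, then step 5: the _charset value, else UTF-8.
        return catalog_charset or 'utf-8'
    if method in ('lgettext', 'lngettext'):
        # Step 6: follow the user's locale.
        return locale.getpreferredencoding()
    raise ValueError('unexpected method: %r' % (method,))
```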

For ugettext() and ungettext(), we go through the same set of steps with the following differences:

  • We transform byte str into unicode strings for these methods.
  • The encoding used to decode the byte str is taken from input_charset if it’s set, otherwise we decode using UTF-8.
input_charset

An extension to the python standard library gettext that specifies what charset a message is encoded in when decoding a message to unicode. This is used for two purposes:

  1. If the message string is a byte str, this is used to decode the string to a unicode string before looking it up in the message catalog.
  2. In ugettext() and ungettext() methods, if a byte str is given as the message and is untranslated, this is used as the encoding when decoding to unicode. This is different from _charset, which may be set when a message catalog is loaded, because input_charset describes the encoding used in a python source file while _charset describes the encoding used in the message catalog file.

Any characters that cannot be converted from a byte str to a unicode string or vice versa will be replaced with a replacement character (i.e., u'�' in unicode based encodings, '?' in other ASCII compatible encodings).
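Python's built-in 'replace' error handler produces the same kind of substitution. The short Python 3 example below shows that general behaviour and is not kitchen's own conversion code.

```python
# Latin-1 encoded bytes are not valid UTF-8, so decoding with the
# 'replace' handler substitutes U+FFFD for the undecodable byte.
undecodable = b'caf\xe9'
as_text = undecodable.decode('utf-8', 'replace')
print(repr(as_text))     # 'caf\ufffd'

# ASCII cannot represent the accented letter, so encoding with the
# 'replace' handler substitutes '?'.
as_bytes = 'caf\xe9'.encode('ascii', 'replace')
print(repr(as_bytes))    # b'caf?'
```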

See also

gettext.NullTranslations
For information about what methods are available and what they do.

Changed in version kitchen-1.1.0; API kitchen.i18n 2.1.0:

  • Although we had adapted gettext(), ngettext(), lgettext(), and lngettext() to always return byte str, we hadn’t forced those byte str to always be in a specified charset. We now make sure that gettext() and ngettext() return byte str encoded using output_charset if set, otherwise charset, and if neither of those is set, UTF-8. lgettext() and lngettext() use output_charset if set, otherwise locale.getpreferredencoding().
  • Make setting input_charset and output_charset also set those attributes on any fallback translation objects.

Changed in version kitchen-1.2.0; API kitchen.i18n 2.2.0: Add python2_api parameter to __init__()

set_output_charset(charset)

Set the output charset

This method exists for two reasons. First, the normal gettext.NullTranslations.set_output_charset() does not set the output charset on fallback objects. Second, on python-2.3, gettext.NullTranslations objects don’t contain this method.
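Propagating the charset to fallback objects could look roughly like the sketch below. It is written for Python 3, and PropagatingTranslations is a hypothetical class used only to illustrate the idea. It is not kitchen's code.

```python
import gettext


class PropagatingTranslations(gettext.NullTranslations):
    """Hypothetical subclass that pushes the charset down the chain."""

    def set_output_charset(self, charset):
        # Record the charset on this object ...
        self._output_charset = charset
        # ... and forward it to the fallback, if there is one.
        if self._fallback is not None:
            self._fallback.set_output_charset(charset)


primary = PropagatingTranslations()
secondary = PropagatingTranslations()
primary.add_fallback(secondary)
primary.set_output_charset('latin-1')
```

After the last call, both primary and secondary carry 'latin-1' as their output charset.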

class kitchen.i18n.NewGNUTranslations(fp=None, python2_api=True)

Safer version of gettext.GNUTranslations

gettext.GNUTranslations suffers from two problems that this class fixes.

  1. gettext.GNUTranslations can throw a UnicodeError in gettext.GNUTranslations.ugettext() if the message being translated has non-ASCII characters and there is no translation for it.
  2. gettext.GNUTranslations can return byte str from gettext.GNUTranslations.ugettext() and unicode strings from the other gettext() methods if the message being translated is the wrong type.

When byte str are returned, the strings will be encoded according to this algorithm:

  1. If a fallback has been added, the fallback will be called first. You’ll need to consult the fallback to see whether it performs any encoding changes.
  2. If a byte str was given, the same byte str will be returned.
  3. If a unicode string was given and set_output_charset() has been called, then we encode the string using the output_charset.
  4. If a unicode string was given, this is gettext() or ngettext(), and a charset was detected when parsing the message catalog, output in that charset.
  5. If a unicode string was given and this is gettext() or ngettext(), we encode it using UTF-8.
  6. If a unicode string was given and this is lgettext() or lngettext(), we encode using the value of locale.getpreferredencoding().

For ugettext() and ungettext(), we go through the same set of steps with the following differences:

  • We transform byte str into unicode strings for these methods.
  • The encoding used to decode the byte str is taken from input_charset if it’s set, otherwise we decode using UTF-8.
input_charset

An extension to the python standard library gettext that specifies what charset a message is encoded in when decoding a message to unicode. This is used for two purposes:

  1. If the message string is a byte str, this is used to decode the string to a unicode string before looking it up in the message catalog.
  2. In ugettext() and ungettext() methods, if a byte str is given as the message and is untranslated, this is used as the encoding when decoding to unicode. This is different from the _charset parameter, which may be set when a message catalog is loaded, because input_charset describes the encoding used in a python source file while _charset describes the encoding used in the message catalog file.
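The decoding step that input_charset controls can be sketched as a small helper. The Python 3 example below is an illustration under the rules stated above (use input_charset if set, otherwise UTF-8, and replace undecodable bytes). The name to_text() is hypothetical.

```python
def to_text(message, input_charset=None):
    """Decode a byte str msgid to unicode before a catalog lookup.

    Hypothetical helper: unicode strings are passed through untouched.
    """
    if isinstance(message, bytes):
        # Undecodable bytes become the replacement character.
        return message.decode(input_charset or 'utf-8', 'replace')
    return message


print(to_text(b'caf\xe9', input_charset='latin-1'))   # café
print(to_text(b'caf\xc3\xa9'))                        # café
```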

Any characters that cannot be converted from a byte str to a unicode string or vice versa will be replaced with a replacement character (i.e., u'�' in unicode based encodings, '?' in other ASCII compatible encodings).

See also

gettext.GNUTranslations.gettext
For information about what methods this class has and what they do

Changed in version kitchen-1.1.0; API kitchen.i18n 2.1.0: Although we had adapted gettext(), ngettext(), lgettext(), and lngettext() to always return byte str, we hadn’t forced those byte str to always be in a specified charset. We now make sure that gettext() and ngettext() return byte str encoded using output_charset if set, otherwise charset, and if neither of those is set, UTF-8. lgettext() and lngettext() use output_charset if set, otherwise locale.getpreferredencoding().
