Myghty Documentation

Version: 1.2 Last Updated: 07/07/10 12:55:17



Unicode Support

What You Can Give to m.write() (and m.apply_escapes())

The Magic Encoding Comment

Controlling the Output Encoding

Other Details

Disabling Unicode Support

Since version 1.1, Myghty has supported writing unicode strings and including non-ASCII characters within component source files.

What You Can Give to m.write() (and m.apply_escapes())

When unicode support is enabled, you may pass either unicode strings or plain strs to m.write(). Strs will be interpreted according to Python's system default encoding (as returned by sys.getdefaultencoding()). You may also write any other object, in which case the object will be coerced to unicode by calling unicode() before it is output. There is one exception to this rule: writing None generates no output.
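
The coercion rules above can be sketched in a few lines. This is only an illustration, not Myghty's implementation: coerce_for_write is a hypothetical helper, and the sketch is written for Python 3, where bytes plays the role of the plain str and str the role of unicode.

```python
import sys


def coerce_for_write(obj, default_encoding=None):
    """Hypothetical helper: return the text written for obj, or None for no output."""
    if obj is None:
        # Writing None generates no output.
        return None
    if isinstance(obj, bytes):
        # Byte strings are interpreted using the system default encoding.
        encoding = default_encoding or sys.getdefaultencoding()
        return obj.decode(encoding)
    # Text stays as it is; any other object is coerced to text.
    return str(obj)


print(repr(coerce_for_write("Fran\u00e7ais")))
print(repr(coerce_for_write(b"plain")))
print(repr(coerce_for_write(42)))
print(repr(coerce_for_write(None)))
```

The default_encoding argument exists only so that the sketch can imitate an ASCII default on interpreters whose default differs.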


The Magic Encoding Comment

If a Myghty component source file contains characters other than those in the Python system default encoding (as reported by sys.getdefaultencoding(), usually ASCII), you may indicate this by placing a magic encoding comment at the top of the file. The exact syntax of the magic comment is essentially the same as that used by Python, with the added restriction that the '#' which introduces the magic comment must start at the beginning of a line (without leading whitespace).

The magic encoding comment affects the interpretation of any plain text in the component source file, and the contents of any python unicode string literals. It does not have any effect on the interpretation of bytes within python plain str literals. In particular, the following is likely to generate a UnicodeDecodeError:

# encoding: latin1

# This is fine:
Français

% m.write(u"Français")  # This is fine, too

% m.write("Français")   # BAD! => UnicodeDecodeError
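
The failure in the last line can be reproduced outside Myghty. A plain str literal keeps the raw bytes of the source file, and those bytes are later decoded with the system default encoding rather than the one named in the magic comment. The sketch below uses Python 3 bytes to stand in for the plain str and assumes an ASCII default encoding.

```python
# The bytes that a latin1-encoded source file holds for the word "Francais"
# spelled with a c-cedilla (U+00E7).
source_bytes = "Fran\u00e7ais".encode("latin1")

# Decoding with the encoding named in the magic comment succeeds.
decoded = source_bytes.decode("latin1")
print(decoded == "Fran\u00e7ais")

# Decoding with an ASCII default encoding fails on the byte 0xE7.
try:
    source_bytes.decode("ascii")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)
```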


Controlling the Output Encoding

The output encoding and the strategy for handling encoding errors can be specified using the output_encoding and encoding_errors configuration parameters. Both can also be changed for a specific request (or portion thereof) by calling the set_output_encoding method.

Choices for the value of encoding_errors include:

strict: Raise an exception in case of an encoding error.
replace: Replace malformed data with a suitable replacement marker, such as "?".
xmlcharrefreplace: Replace with the appropriate XML character reference.
htmlentityreplace: Replace with the appropriate HTML character entity reference, if there is one; otherwise replace with a numeric character reference. (This is not a standard Python encoding error handler. It is provided by the myghty.escapes module.)
backslashreplace: Replace with backslashed escape sequence.
ignore: Ignore malformed data and continue without further notice.

See the Python codecs documentation for more information on how encoding error handlers work, and on how you can define your own.
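
To see how the standard handlers listed above differ, it can help to run each of them on the same string. The sketch below uses only Python's built-in str.encode (Python 3 syntax) and does not involve Myghty; htmlentityreplace is left out because it is not a standard handler.

```python
text = "Fran\u00e7ais"  # contains one character that ASCII cannot represent

results = {}
for handler in ("replace", "xmlcharrefreplace", "backslashreplace", "ignore"):
    results[handler] = text.encode("ascii", handler)
    print(handler, results[handler])

# 'strict' raises instead of producing output.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
print("strict raised:", strict_raised)
```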

Generally, for components generating HTML output, it is sufficient to set output_encoding to 'latin1' (or even 'ascii'), and encoding_errors to 'htmlentityreplace'. (Latin1 is the default charset for text content served over HTTP, as specified in RFC 2616.) The 'htmlentityreplace' error handler replaces any character that cannot be encoded with an HTML named character reference (or with a numeric character reference, if no named one exists), so this setting can correctly handle the output of any unicode character to HTML.
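
The following sketch shows how a handler of this kind can be built with the standard codecs.register_error hook. It is not the handler shipped in myghty.escapes: the function html_entity_replace and the registered name 'sketch_htmlentityreplace' are assumptions made for illustration, and the code targets Python 3 (html.entities).

```python
import codecs
from html.entities import codepoint2name


def html_entity_replace(exc):
    """Illustrative error handler: named entity if one exists, else numeric."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append("&%s;" % name)      # for example &ccedil;
        else:
            pieces.append("&#%d;" % ord(ch))  # numeric fallback
    # Return the replacement text and the position at which to resume encoding.
    return "".join(pieces), exc.end


# Registered under a made-up name so it cannot be mistaken for Myghty's handler.
codecs.register_error("sketch_htmlentityreplace", html_entity_replace)

encoded = "Fran\u00e7ais \u20ac \u4e2d".encode("ascii", "sketch_htmlentityreplace")
print(encoded)
```

Because every replacement consists of ASCII characters only, the same handler works unchanged with 'latin1' or any other ASCII-compatible output encoding.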


Other Details

With unicode support enabled, the return value from m.scomp() will be either a unicode or a str in the system default encoding.

Similarly, the input passed to any component output filters will also be either a unicode or a str. The filter may return any object which is coercible to a unicode.

Output passed to the .write() method of component capture buffers (specified using the store argument of execute_component) will be either a unicode or a plain str. (Using a StringIO.StringIO buffer should just work. Using a cStringIO.StringIO buffer will probably not work, as it does not accept unicode input.)
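
A capture buffer therefore has to cope with both kinds of string. The class below is a hypothetical stand-in, not part of Myghty, which normalizes everything it receives to text. It is written for Python 3 (bytes in place of the plain str) and assumes an ASCII default encoding.

```python
class TextCaptureBuffer:
    """Hypothetical buffer whose write() accepts text or byte strings."""

    def __init__(self, default_encoding="ascii"):
        self.default_encoding = default_encoding
        self.parts = []

    def write(self, data):
        if isinstance(data, bytes):
            # Byte strings are assumed to be in the default encoding.
            data = data.decode(self.default_encoding)
        self.parts.append(data)

    def getvalue(self):
        return "".join(self.parts)


buf = TextCaptureBuffer()
buf.write(b"plain bytes, ")
buf.write("Fran\u00e7ais")
print(repr(buf.getvalue()))
```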

Output passed to the .write() method of subrequest capture buffers (specified using the out_buffer argument of create_subrequest) will be encoded strs. By default, the encoding is the system default encoding and the error strategy is 'strict', irrespective of the output_encoding of the parent request. Both can be changed using the output_encoding and encoding_errors arguments of create_subrequest (or by calling set_output_encoding on the subrequest).


Disabling Unicode Support

Myghty's unicode support may be disabled by setting the disable_unicode configuration parameter.
