Functions related to displaying unicode text. Unicode characters don’t all have the same width, so we need helper functions for displaying them.
New in version 0.2: kitchen.display API 1.0.0
Get the textual width of a string
Raises ControlCharError: if msg contains a control character and control_chars is ‘strict’.
Returns: Textual width of msg. This is the amount of space that the string will consume on a monospace display, measured in the number of cell positions (columns) it will take up. This is not the number of glyphs in the string.
Note
This function can be wrong sometimes because Unicode does not specify a strict width value for all of the code points. In particular, we’ve found that some Tamil characters take up to four character cells but we return a lesser amount.
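To illustrate what “textual width” means, here is a minimal Python 3 sketch built only on the standard library’s unicodedata module. It is not kitchen’s implementation (kitchen uses its own _COMBINING table and handles control characters according to control_chars); approx_textual_width is a hypothetical name, and the rule “combining marks are 0 cells, East Asian Wide/Fullwidth characters are 2, everything else is 1” is a simplifying assumption.

```python
import unicodedata


def approx_textual_width(msg: str) -> int:
    """Approximate the number of monospace cells that `msg` occupies.

    Simplified rules (an assumption, not kitchen's exact algorithm):
    combining marks take 0 cells, East Asian Wide/Fullwidth characters
    take 2, and every other code point takes 1. Control characters are
    not treated specially.
    """
    width = 0
    for char in msg:
        if unicodedata.combining(char):
            continue  # attaches to the previous character, no extra cell
        if unicodedata.east_asian_width(char) in ("W", "F"):
            width += 2  # double-width, e.g. CJK ideographs
        else:
            width += 1
    return width


print(approx_textual_width("café"))                  # 4
print(approx_textual_width("cafe\u0301"))            # 4: e plus a combining acute accent
print(approx_textual_width("一二三四五六七八九十"))  # 20
```

Note that the decomposed form "cafe\u0301" has five code points but still occupies four cells, which is exactly why counting characters is not enough.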
Given a string, return it chopped to a given textual width
Return type: unicode string
Returns: unicode string of the msg chopped at the given textual width
This is what you want to use instead of %.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and characters that take more than one cell position. For example:
>>> from kitchen.text.display import textual_width_chop
>>> # Wrong: only displays 8 characters because it is operating on bytes
>>> print "%.*s" % (10, 'café ñunru!')
café ñun
>>> # Properly operates on graphemes
>>> print '%s' % (textual_width_chop('café ñunru!', 10))
café ñunru
>>> # takes too many columns because the kanji need two cell positions
>>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
1234567890
一二三四五六七八九十
>>> # Properly chops at 10 columns
>>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
1234567890
一二三四五
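The chopping behaviour above can be sketched in Python 3 as a single left-to-right pass that stops before the first character that would overflow the limit. This is an illustration under the same simplified width rules as before (combining marks 0 cells, East Asian Wide/Fullwidth 2, others 1), not kitchen’s code; chop_to_width and _cell_width are hypothetical names.

```python
import unicodedata


def _cell_width(char: str) -> int:
    """Cells for one code point: 0 for combining marks, 2 for wide, else 1."""
    if unicodedata.combining(char):
        return 0
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


def chop_to_width(msg: str, chop: int) -> str:
    """Return the longest prefix of `msg` whose textual width is <= `chop`.

    A double-width character that would straddle the limit is dropped
    rather than split, so the result may be one cell short of `chop`.
    """
    used = 0
    for index, char in enumerate(msg):
        char_width = _cell_width(char)
        if used + char_width > chop:
            return msg[:index]
        used += char_width
    return msg


print(chop_to_width("café ñunru!", 10))           # café ñunru
print(chop_to_width("一二三四五六七八九十", 10))  # 一二三四五
```

Because zero-width combining marks never push the total over the limit, a mark that follows a kept base character stays attached to it.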
Expand a unicode string to a specified textual width or chop to same
Return type: unicode string
Returns: msg formatted to fill the specified width. If no chop is specified, the string could exceed the fill length when completed. If prefix or suffix are printable characters, the string could be longer than the fill width.
Note
prefix and suffix should be used for “invisible” characters like highlighting, color changing escape codes, etc. The fill characters are appended outside of any prefix or suffix elements. This allows you to only highlight msg inside of the field you’re filling.
Warning
msg, prefix, and suffix should all be representable as unicode characters. In particular, any escape sequences in prefix and suffix need to be convertible to unicode. If you need to use byte sequences here rather than unicode characters, use byte_string_textual_width_fill() instead.
This function expands a string to fill a field of a particular textual width. Use it instead of %*.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and characters that take more than one cell position in a display. Example usage:
>>> from kitchen.text.display import textual_width_fill
>>> msg = u'一二三四五六七八九十'
>>> # Wrong: This uses 10 characters instead of 10 cells:
>>> u":%-*.*s:" % (10, 10, msg[:9])
:一二三四五六七八九 :
>>> # This uses 10 cells like we really want:
>>> u":%s:" % (textual_width_fill(msg[:9], 10, 10))
:一二三四五:
>>> # Wrong: Right aligned in the field, but too many cells
>>> u"%20.10s" % (msg)
          一二三四五六七八九十
>>> # Correct: Right aligned with proper number of cells
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False))
          一二三四五
>>> # Wrong: Adding some escape characters to highlight the line but too many cells
>>> prefix = u'\x1b[7m'  # start reverse video
>>> suffix = u'\x1b[0m'  # reset attributes
>>> u"%s%20.10s%s" % (prefix, msg, suffix)
u'\x1b[7m          一二三四五六七八九十\x1b[0m'
>>> # Correct highlight of the line
>>> u"%s%s%s" % (prefix, textual_width_fill(msg, 20, 10, left=False), suffix)
u'\x1b[7m          一二三四五\x1b[0m'
>>> # Correct way to not highlight the fill
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
u'          \x1b[7m一二三四五\x1b[0m'
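The chop-then-pad logic, including placing the padding outside prefix and suffix, can be sketched in Python 3 as follows. This is a hypothetical fill_to_width written for illustration with simplified width rules (combining marks 0 cells, East Asian Wide/Fullwidth 2, others 1); it mirrors the documented behaviour but is not kitchen’s implementation.

```python
import unicodedata


def _cell_width(text: str) -> int:
    """Simplified textual width: combining=0, wide/fullwidth=2, else 1."""
    total = 0
    for char in text:
        if unicodedata.combining(char):
            continue
        total += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return total


def fill_to_width(msg, fill, chop=None, left=True, prefix="", suffix=""):
    """Chop `msg` to `chop` cells (if given), then pad with spaces to `fill` cells.

    Only `msg` is measured; `prefix` and `suffix` (e.g. escape codes) wrap
    `msg` and the padding is added outside them.
    """
    if chop is not None:
        kept, used = [], 0
        for char in msg:
            char_width = _cell_width(char)
            if used + char_width > chop:
                break  # this character would overflow the chop width
            kept.append(char)
            used += char_width
        msg = "".join(kept)
    padding = " " * max(fill - _cell_width(msg), 0)
    body = prefix + msg + suffix
    return body + padding if left else padding + body


msg = "一二三四五六七八九十"
print(":%s:" % fill_to_width(msg[:9], 10, 10))         # :一二三四五:
print(repr(fill_to_width(msg, 20, 10, left=False,
                         prefix="\x1b[7m", suffix="\x1b[0m")))
```

When chop is omitted and msg is already wider than fill, no padding is added and the result exceeds the fill width, as the Returns entry above warns.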
Works like we want textwrap.wrap() to work.
Return type: list of unicode strings
Returns: list of lines that have been text wrapped and indented.
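textwrap.wrap() from the python standard library has two drawbacks that this attempts to fix:
1. It does not handle textual width. It only operates on bytes or characters, which are both inadequate (due to multi-byte and double-width characters).
2. It malforms lists and blocks.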
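The textual-width part of the problem can be shown with a minimal greedy word wrapper that measures each candidate line in display cells instead of characters. This Python 3 sketch is hypothetical (wrap_by_width is not a kitchen function), uses simplified width rules, and does not attempt kitchen’s handling of lists and blocks.

```python
import unicodedata


def _cell_width(text: str) -> int:
    """Simplified textual width: combining=0, wide/fullwidth=2, else 1."""
    return sum(
        0 if unicodedata.combining(c)
        else 2 if unicodedata.east_asian_width(c) in ("W", "F")
        else 1
        for c in text
    )


def wrap_by_width(text, width=70, initial_indent="", subsequent_indent=""):
    """Greedy word wrap where line length is measured in display cells.

    Words are split on whitespace. A single word wider than the line is
    placed on its own line rather than broken.
    """
    lines = []
    indent = initial_indent
    line = ""
    for word in text.split():
        candidate = word if not line else line + " " + word
        if line and _cell_width(indent + candidate) > width:
            lines.append(indent + line)  # current line is full; emit it
            indent = subsequent_indent
            line = word
        else:
            line = candidate
    if line:
        lines.append(indent + line)
    return lines


print(wrap_by_width("一二三 四五六 七八九", width=10))
```

For comparison, textwrap.wrap("一二三 四五六 七八九", 10) returns ['一二三 四五六', '七八九']: the first line is only 7 characters long but needs 13 cells, while the width-aware version keeps every line within 10 cells.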
Works like we want textwrap.fill() to work.
Parameters: text – unicode string or byte str to process
Returns: unicode string with each line separated by a newline
See also
This function is a light wrapper around kitchen.text.display.wrap(). Where that function returns a list of lines, this function returns one string with each line separated by a newline.
Expand a byte str to a specified textual width or chop to same
Return type: byte str
Returns: msg formatted to fill the specified textual width. If no chop is specified, the string could exceed the fill length when completed. If prefix or suffix are printable characters, the string could be longer than the fill width.
Note
prefix and suffix should be used for “invisible” characters like highlighting, color changing escape codes, etc. The fill characters are appended outside of any prefix or suffix elements. This allows you to only highlight msg inside of the field you’re filling.
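See also
textual_width_fill() for example usage. This function has only two differences:
1. It takes byte str for prefix and suffix, so you can pass in arbitrary sequences of bytes, not just unicode characters.
2. It returns a byte str instead of a unicode string.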
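The byte-oriented variant can be sketched in Python 3 by decoding only msg to measure it, then assembling the result from bytes so that prefix and suffix are never decoded. byte_fill_to_width is a hypothetical name; the sketch assumes UTF-8, omits chop for brevity, and uses simplified width rules rather than kitchen’s.

```python
import unicodedata


def _cell_width(text: str) -> int:
    """Simplified textual width: combining=0, wide/fullwidth=2, else 1."""
    return sum(
        0 if unicodedata.combining(c)
        else 2 if unicodedata.east_asian_width(c) in ("W", "F")
        else 1
        for c in text
    )


def byte_fill_to_width(msg: bytes, fill: int, left: bool = True,
                       prefix: bytes = b"", suffix: bytes = b"",
                       encoding: str = "utf-8") -> bytes:
    """Pad the byte string `msg` to `fill` display cells and return bytes.

    Only `msg` is decoded (to measure its width). `prefix` and `suffix`
    may be arbitrary byte sequences; they are concatenated as-is.
    """
    width = _cell_width(msg.decode(encoding, "replace"))
    padding = b" " * max(fill - width, 0)
    body = prefix + msg + suffix
    return body + padding if left else padding + body


print(byte_fill_to_width("一二".encode("utf-8"), 6, prefix=b"\x1b[7m", suffix=b"\x1b[0m"))
```

Keeping prefix and suffix as raw bytes is what allows escape sequences that are not valid in the message’s encoding to pass through untouched.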
There are a few internal functions and variables in this module. Code outside of kitchen shouldn’t use them, but people coding on kitchen itself may find them useful.
Internal table, provided by this module, that lists code points which combine with other characters and therefore should have no textual width. This is a sorted tuple of non-overlapping intervals. Each interval is a tuple listing a starting code point and an ending code point. Every code point between the two end points, inclusive, is a combining character.
See also
This table was last regenerated on python-3.2.3 with unicodedata.unidata_version 6.0.0
Combine Markus Kuhn’s data with unicodedata to make combining char list
Return type: tuple of tuples
Returns: tuple of intervals of code points that are combining characters. Each interval is a 2-tuple of the starting code point and the ending code point for the combining characters.
In normal use, this function serves to document how we generate the combining character list. For speed, we run it once to generate a static list and use that list at runtime.
Markus Kuhn’s list of combining characters is more complete than what’s in the python unicodedata library, but the python unicodedata library is synced against later versions of the unicode database.
This is used to generate the _COMBINING table.
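The interval-building step can be sketched in Python 3 by scanning code points and opening or closing an interval whenever the “is combining” test changes. This hypothetical combining_intervals uses only unicodedata.combining(), which returns the canonical combining class; it does not merge in Markus Kuhn’s category-based list the way the real generator does. That difference matters because some non-spacing marks, such as U+034F COMBINING GRAPHEME JOINER, have a combining class of 0.

```python
import unicodedata


def combining_intervals(max_codepoint: int = 0x10FFFF):
    """Collapse code points with a non-zero canonical combining class into
    a sorted tuple of non-overlapping, inclusive (start, end) intervals.

    Sketch only: unicodedata.combining() is the sole data source here.
    """
    intervals = []
    start = None
    for codepoint in range(max_codepoint + 1):
        if unicodedata.combining(chr(codepoint)):
            if start is None:
                start = codepoint  # first code point of a new interval
        elif start is not None:
            intervals.append((start, codepoint - 1))  # close the interval
            start = None
    if start is not None:
        intervals.append((start, max_codepoint))  # interval ran to the end
    return tuple(intervals)


# Scanning the full range takes noticeable time, which is one reason to
# generate the table once and store it as a static tuple.
print(combining_intervals(0x3FF))
```
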
Print out a new _COMBINING table
This will print a new _COMBINING table in the format used in kitchen/text/display.py. It’s useful for updating the _COMBINING table with data from a new python release, since the output format matches what’s already in the file.
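A table-printing helper mainly needs to emit valid Python source that evaluates back to the same tuple. This Python 3 sketch is hypothetical: format_combining_table and its layout (three hexadecimal intervals per line) are assumptions, not necessarily the exact format used in kitchen/text/display.py.

```python
def format_combining_table(intervals, per_line=3):
    """Render (start, end) intervals as Python source for a _COMBINING tuple.

    The layout (hex code points, `per_line` intervals per row) is an
    illustrative choice.
    """
    lines = ["_COMBINING = ("]
    for i in range(0, len(intervals), per_line):
        chunk = intervals[i:i + per_line]
        lines.append("        " + " ".join(
            "(0x%04x, 0x%04x)," % tuple(pair) for pair in chunk))
    lines.append("    )")
    return "\n".join(lines)


print(format_combining_table(((0x300, 0x34E), (0x350, 0x36F), (0x483, 0x487))))
```

Every interval ends with a trailing comma, so even a single-interval table evaluates to a tuple of tuples.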
Binary search in an interval table.
Returns: If value is found within an interval in the table, return True. Otherwise, False.
This function checks whether a numeric value falls within any interval in a table of intervals. It uses a binary search: it repeatedly halves the portion of the table still being searched, comparing the value against the middle interval, until it either finds an interval containing the value or runs out of candidates.
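The binary search over inclusive intervals can be written in a few lines of Python 3. This interval_bisearch is a sketch of the technique described above (the standard library’s bisect module would be an alternative), not a copy of kitchen’s function; it also adds a quick bounds check so values outside the whole table return without searching.

```python
def interval_bisearch(value, table):
    """Return True if `value` lies in one of the sorted, non-overlapping,
    inclusive (start, end) intervals in `table`."""
    if not table or value < table[0][0] or value > table[-1][1]:
        return False  # outside the whole table: no search needed
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end = table[mid]
        if value > end:
            low = mid + 1   # value is in the upper half
        elif value < start:
            high = mid - 1  # value is in the lower half
        else:
            return True     # start <= value <= end
    return False


table = ((0x300, 0x36F), (0x483, 0x489), (0x591, 0x5BD))
print(interval_bisearch(0x301, table))  # True
print(interval_bisearch(0x370, table))  # False
```

Each comparison discards half of the remaining intervals, so a lookup in a table of n intervals needs about log2(n) steps.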
Get the textual width of a ucs character
Raises ControlCharError: if the code point is a unicode control character and control_chars is set to ‘strict’
Returns: textual width of the character.
Note
It’s important to remember this is textual width and not the number of characters or bytes.
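A per-code-point width function with a strict mode for control characters might look like the following Python 3 sketch. ControlCharError is defined locally as a stand-in for kitchen’s exception, and giving control characters a width of 0 in non-strict mode is an assumption made for this illustration; ucp_width is a hypothetical name.

```python
import unicodedata


class ControlCharError(Exception):
    """Local stand-in for kitchen's ControlCharError, defined for this sketch."""


def ucp_width(ucs: int, control_chars: str = "guess") -> int:
    """Textual width of a single code point given as an integer.

    Assumption: outside 'strict' mode, C0/C1 control characters get width 0.
    """
    if ucs < 0x20 or 0x7F <= ucs < 0xA0:
        if control_chars == "strict":
            raise ControlCharError("control character %#x has no width" % ucs)
        return 0
    char = chr(ucs)
    if unicodedata.combining(char):
        return 0  # combining mark: occupies no cell of its own
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2  # wide or fullwidth character
    return 1


print(ucp_width(0x41), ucp_width(0x4E00), ucp_width(0x301))  # 1 2 0
```

As the note says, the result counts display cells: U+4E00 is one character and three UTF-8 bytes, yet it takes two cells.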
Optimize the common case when deciding which textual width is larger
Returns: True if the total textual width of args is less than or equal to width. Otherwise, False.
We often want to know “does X fit in Y?” Calculating this with textual_width() takes a while. However, we know that each canonically composed unicode character has a textual width of either 1 or 2. With this we can take the following shortcuts:
1. If the number of canonically composed characters is greater than width, the textual width must also be greater than width, so we can return False immediately.
2. If twice the number of canonically composed characters is less than or equal to width, the textual width must fit, so we can return True immediately.
Only when neither shortcut applies must we do a full textual width lookup.
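The two shortcuts translate directly into code. In this Python 3 sketch, textual_width_le is a hypothetical stand-in for the internal function: it normalizes to NFC so that composable sequences count as one character, tries both bounds, and only falls back to a full (simplified) width calculation in between.

```python
import unicodedata


def _cell_width(text: str) -> int:
    """Simplified textual width: combining=0, wide/fullwidth=2, else 1."""
    return sum(
        0 if unicodedata.combining(c)
        else 2 if unicodedata.east_asian_width(c) in ("W", "F")
        else 1
        for c in text
    )


def textual_width_le(width: int, *args: str) -> bool:
    """Return True if the combined textual width of `args` is <= `width`."""
    text = unicodedata.normalize("NFC", "".join(args))
    if len(text) > width:
        return False  # at least one cell per composed character
    if len(text) * 2 <= width:
        return True   # fits even if every character were double-width
    return _cell_width(text) <= width  # ambiguous range: measure exactly


print(textual_width_le(10, "一二三四五"))  # True, decided by the second shortcut
print(textual_width_le(9, "一二三四五"))   # False, after a full measurement
```

Like the reasoning above, the first shortcut assumes every composed character takes at least one cell. A base letter followed by a combining mark that has no precomposed form (for example q + U+0301) survives NFC as two code points but occupies one cell, so it can make that shortcut answer False too early.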