Module util

Various utility classes and functions.
Classes
  LRUCache
A dictionary-like object that stores only a certain number of items, and discards its least recently used item when full.
Functions
  flatten(items)
Flattens a potentially nested sequence into a flat list.
  plaintext(text, keeplinebreaks=True)
Return the text with all entities and tags removed.
  stripentities(text, keepxmlentities=False)
Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.
  striptags(text)
Return a copy of the text with any XML/HTML tags removed.
  stringrepr(string)
Variables
  __package__ = 'genshi'
Function Details

flatten(items)

Flattens a potentially nested sequence into a flat list.

>>> flatten((1, 2))
[1, 2]
>>> flatten([1, (2, 3), 4])
[1, 2, 3, 4]
>>> flatten([1, (2, [3, 4]), 5])
[1, 2, 3, 4, 5]
Parameters:
  • items - the sequence to flatten
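
The behaviour in the examples above can be reproduced with plain recursion. The following is a minimal sketch, not Genshi's actual implementation: the name flatten_sketch is hypothetical, and it assumes that only lists and tuples count as nested sequences, so strings are kept whole.

```python
def flatten_sketch(items):
    """Recursively flatten nested lists and tuples into one flat list.

    Hypothetical helper for illustration; not part of genshi.util.
    """
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Nested sequence: flatten it and splice the result in place.
            flat.extend(flatten_sketch(item))
        else:
            # Anything else (numbers, strings, ...) is kept as a single item.
            flat.append(item)
    return flat
```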

plaintext(text, keeplinebreaks=True)

Return the text with all entities and tags removed.

>>> plaintext('<b>1 &lt; 2</b>')
u'1 < 2'

Set the keeplinebreaks parameter to False to replace line breaks with single spaces:

>>> plaintext('''<b>1
... &lt;
... 2</b>''', keeplinebreaks=False)
u'1 < 2'
Parameters:
  • text - the text to convert to plain text
  • keeplinebreaks - whether line breaks in the text should be kept intact
Returns:
the text with tags and entities removed
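
The documented result is consistent with removing tags first and decoding entities afterwards; in the reverse order, an escaped &lt;b&gt; would become a real tag and then be deleted. A rough standard-library sketch of that composition for Python 3 follows. The name plaintext_sketch and the regular expression are assumptions made for illustration, not Genshi code.

```python
import html
import re

# Matches a whole comment first, then any other tag (assumed pattern).
_TAG_RE = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)


def plaintext_sketch(text, keeplinebreaks=True):
    """Strip markup, then decode entities (hypothetical helper)."""
    # Step 1: drop tags and comments while entities are still escaped.
    stripped = _TAG_RE.sub('', text)
    # Step 2: decode named and numeric entities with the standard library.
    decoded = html.unescape(stripped)
    if not keeplinebreaks:
        # Join the individual lines with single spaces.
        decoded = ' '.join(decoded.splitlines())
    return decoded
```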

stripentities(text, keepxmlentities=False)

Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.

>>> stripentities('1 &lt; 2')
u'1 < 2'
>>> stripentities('more &hellip;')
u'more \u2026'
>>> stripentities('&#8230;')
u'\u2026'
>>> stripentities('&#x2026;')
u'\u2026'

If the keepxmlentities parameter is true, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are left intact:

>>> stripentities('1 &lt; 2 &hellip;', keepxmlentities=True)
u'1 &lt; 2 \u2026'
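
The keepxmlentities behaviour can be approximated with a single regular-expression substitution. This is a sketch under stated assumptions, not the library's implementation: stripentities_sketch, the pattern, and the choice to leave unknown entities untouched are all illustrative, and the code targets Python 3.

```python
import re
from html.entities import name2codepoint

# Decimal (&#8230;), hexadecimal (&#x2026;) or named (&hellip;) references.
_ENTITY_RE = re.compile(r'&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));')
_XML_ENTITIES = frozenset(['amp', 'apos', 'gt', 'lt', 'quot'])
# name2codepoint covers HTML 4 names only, so add the XML-only 'apos'.
_CODEPOINTS = dict(name2codepoint, apos=39)


def stripentities_sketch(text, keepxmlentities=False):
    """Replace entity references with characters (hypothetical helper)."""
    def replace(match):
        decimal, hexadecimal, name = match.groups()
        if decimal:
            return chr(int(decimal))
        if hexadecimal:
            return chr(int(hexadecimal, 16))
        if keepxmlentities and name in _XML_ENTITIES:
            # Core XML entities stay escaped when requested.
            return match.group(0)
        codepoint = _CODEPOINTS.get(name)
        if codepoint is None:
            # Assumption: unknown names are left exactly as written.
            return match.group(0)
        return chr(codepoint)
    return _ENTITY_RE.sub(replace, text)
```

Under Python 3 the results are plain str values, so they print without the u prefix seen in the doctests above.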

striptags(text)

Return a copy of the text with any XML/HTML tags removed.

>>> striptags('<span>Foo</span> bar')
'Foo bar'
>>> striptags('<span class="bar">Foo</span>')
'Foo'
>>> striptags('Foo<br />')
'Foo'

HTML/XML comments are stripped, too:

>>> striptags('<!-- <blub>hehe</blah> -->test')
'test'
Parameters:
  • text - the string to remove tags from
Returns:
the text with tags removed
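
The comment example above matters because a comment can itself contain tag-like text. One way to handle that, shown here as an illustrative sketch rather than Genshi's own code, is to try the comment alternative before the generic tag alternative in a single pattern. The name striptags_sketch and the pattern are assumptions.

```python
import re

# The comment branch comes first so that '<!-- <blub>hehe</blah> -->' is
# consumed as one unit and 'hehe' does not survive as text.
_STRIP_RE = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)


def striptags_sketch(text):
    """Remove comments and tags, keeping the text between them."""
    return _STRIP_RE.sub('', text)
```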