Package pyxb :: Package utils :: Module unicode :: Class CodePointSet
Class CodePointSet

object --+
         |
        CodePointSet

Represent a set of Unicode code points.

Each code point is an integral value between 0 and 0x10FFFF. This class is used to represent a set of code points in a manner suitable for use as regular expression character sets.

For testing purposes only: access to the internal representation of the code points.
__cmp__(self, other)
Equality is delegated to the codepoints list.
__init__(self, *args)
x.__init__(...) initializes x; see help(type(x)) for signature
add(self, value)
Add the given value to the code point set.
extend(self, values)
Add multiple values to a code point set.
subtract(self, value)
Remove the given value from the code point set.
asPattern(self, with_brackets=True)
Return the code point set as a Unicode regular expression character group consisting of a sequence of characters or character ranges.
Return the codepoints as tuples denoting the ranges that are in the set.
Return an instance that represents the inverse of this set.
If this set represents a single character, return it as its unicode string value.
  MaxShortCodePoint = 65535
  MaxCodePoint = 1114111
The maximum value for a code point in the Unicode code point space.
  __codepoints = None
  __XMLtoPythonREMap = {u'': u'\x00', u'-': u'\-', u'[': u'\[',...
__init__(self, *args)

x.__init__(...) initializes x; see help(type(x)) for signature

add(self, value)

Add the given value to the code point set.

  • value - An integral value denoting a code point, or a tuple (s,e) denoting the start and end (inclusive) code points in a range.
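
As a rough illustration of what accepting either a single code point or an inclusive (s, e) range can involve, the sketch below keeps a sorted list of non-overlapping tuples and merges on insert. The helper name add_range and the list-of-tuples layout are assumptions made for this example; this is not PyXB's implementation.

```python
def add_range(ranges, value):
    """Return a new sorted list of inclusive (s, e) tuples with value merged in.

    Hypothetical helper: value is an int code point or an (s, e) tuple.
    """
    s, e = value if isinstance(value, tuple) else (value, value)
    if s > e:
        raise ValueError("range start exceeds range end")
    merged = []
    for rs, re_ in sorted(ranges + [(s, e)]):
        # Merge when this range overlaps or directly follows the previous one.
        if merged and rs <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], re_))
        else:
            merged.append((rs, re_))
    return merged


ranges = add_range([], (0x41, 0x5A))   # 'A' through 'Z'
ranges = add_range(ranges, 0x5B)       # adjacent to 'Z', so the range grows
ranges = add_range(ranges, 0x61)       # 'a', disjoint from the first range
```

Keeping the list sorted and merged means each code point belongs to at most one tuple, which keeps later operations such as subtraction and pattern generation simple.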

extend(self, values)

Add multiple values to a code point set.

  • values - Either a CodePointSet instance, or an iterable whose members are valid parameters to add.

subtract(self, value)

Remove the given value from the code point set.

  • value - An integral value denoting a code point, or a tuple (s,e) denoting the start and end (inclusive) code points in a range, or a CodePointSet.
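
Removing a value may split an existing range in two. The sketch below shows one way to do that over a sorted list of inclusive (s, e) tuples; subtract_range is a hypothetical helper for illustration and handles only the integer and tuple cases, not a CodePointSet argument.

```python
def subtract_range(ranges, value):
    """Return ranges with the inclusive span given by value removed.

    Hypothetical helper: value is an int code point or an (s, e) tuple.
    """
    s, e = value if isinstance(value, tuple) else (value, value)
    result = []
    for rs, re_ in ranges:
        if re_ < s or rs > e:
            result.append((rs, re_))      # no overlap: keep unchanged
            continue
        if rs < s:
            result.append((rs, s - 1))    # keep the part below the removed span
        if re_ > e:
            result.append((e + 1, re_))   # keep the part above the removed span
    return result


remaining = subtract_range([(0x30, 0x39)], 0x35)   # digits without '5'
```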

asPattern(self, with_brackets=True)

Return the code point set as a Unicode regular expression character group consisting of a sequence of characters or character ranges.

This returns a regular expression fragment using Python's regular expression syntax. Note that different regular expression syntaxes are not compatible, often in subtle ways.

  • with_brackets - If True (default), square brackets are added to enclose the returned character group.
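
To make the idea concrete, here is a minimal sketch of rendering inclusive (s, e) ranges as a character group for Python's re module. It is written for Python 3 (chr rather than unichr), the names as_pattern and _ESCAPE are assumptions, and the escaping covers only the five characters -, [, ], \ and ^; it is not PyXB's code.

```python
import re

# Characters that need a backslash inside a Python character group.
_ESCAPE = set('-[]\\^')


def _char(cp):
    """Return the (possibly escaped) pattern text for one code point."""
    c = chr(cp)
    return '\\' + c if c in _ESCAPE else c


def as_pattern(ranges, with_brackets=True):
    """Render inclusive (s, e) tuples as a character group (hypothetical helper)."""
    parts = []
    for s, e in ranges:
        if s == e:
            parts.append(_char(s))
        else:
            parts.append(_char(s) + '-' + _char(e))
    body = ''.join(parts)
    return '[' + body + ']' if with_brackets else body


# A hyphen, the digits, and 'A' through 'F'.
pattern = as_pattern([(0x2D, 0x2D), (0x30, 0x39), (0x41, 0x46)])
matches = re.fullmatch(pattern + '+', '3F-A') is not None
```

Passing with_brackets=False yields only the body of the group, which is convenient when several sets are combined into one bracketed expression.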


Return the codepoints as tuples denoting the ranges that are in the set.

Each tuple (s, e) indicates that the code points from s (inclusive) to e (inclusive) are in the set.
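
With the set expressed as sorted, non-overlapping inclusive tuples, the inverse of a set is the list of gaps between them. The following sketch assumes an upper bound of 0x10FFFF and uses a hypothetical function name, invert; it illustrates the representation and is not taken from PyXB.

```python
MAX_CODE_POINT = 0x10FFFF  # assumed upper bound of the code point space


def invert(ranges, max_cp=MAX_CODE_POINT):
    """Return the inclusive ranges NOT covered by the sorted input ranges."""
    inverse = []
    next_start = 0
    for s, e in ranges:
        if s > next_start:
            inverse.append((next_start, s - 1))   # gap before this range
        next_start = e + 1
    if next_start <= max_cp:
        inverse.append((next_start, max_cp))      # tail after the last range
    return inverse


gaps = invert([(0x41, 0x5A), (0x61, 0x7A)])   # everything except A-Z and a-z
```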


If this set represents a single character, return it as its unicode string value. Otherwise return None.
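
In terms of range tuples, a set holds a single character exactly when it consists of one range whose start and end coincide. A minimal Python 3 sketch, with as_single_character as a hypothetical name chosen for this example:

```python
def as_single_character(ranges):
    """Return the one character in the set, or None if there is not exactly one."""
    if len(ranges) == 1 and ranges[0][0] == ranges[0][1]:
        return chr(ranges[0][0])
    return None


single = as_single_character([(0x41, 0x41)])   # one code point
several = as_single_character([(0x41, 0x5A)])  # a multi-character range
```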

The maximum value for a code point in the Unicode code point space. This is normally 0xFFFF, because wide Unicode characters are generally not enabled in Python builds. If they are enabled, it is the full value 0x10FFFF.
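
The text does not say how this value is obtained. One plausible way to detect it, shown here purely as an assumption, is the standard sys.maxunicode attribute, which reports the largest code point the running interpreter supports. On Python 3.3 and later this is always 0x10FFFF; the 0xFFFF case applies only to narrow Python 2 builds.

```python
import sys

# Largest code point this interpreter can represent in a single character.
max_code_point = sys.maxunicode

# True on wide builds and on every Python 3.3+ interpreter.
is_wide = max_code_point > 0xFFFF
```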



{u'': u'\x00',
 u'-': u'\-',
 u'[': u'\[',
 u'\': u'\\',
 u']': u'\]',
 u'^': u'\^'}