Package pyxb :: Package utils :: Module unicode :: Class CodePointSet

Class CodePointSet

object --+
         |
        CodePointSet

Represent a set of Unicode code points.

Each code point is an integral value between 0 and 0x10FFFF. This class is used to represent a set of code points in a manner suitable for use as regular expression character sets.

Instance Methods
 
_codepoints(self)
For testing purposes only: provides access to the internal code point representation.
 
__cmp__(self, other)
Equality is delegated to the codepoints list.
 
__init__(self, *args)
x.__init__(...) initializes x; see help(type(x)) for signature
 
__mutate(self, value, do_add)
 
add(self, value)
Add the given value to the code point set.
 
extend(self, values)
Add multiple values to a code point set.
 
subtract(self, value)
Remove the given value from the code point set.
 
__unichr(self, code_point)
 
asPattern(self, with_brackets=True)
Return the code point set as a Unicode regular expression character group consisting of a sequence of characters or character ranges.
 
asTuples(self)
Return the codepoints as tuples denoting the ranges that are in the set.
 
negate(self)
Return an instance that represents the inverse (complement) of this set.
 
asSingleCharacter(self)
If this set represents a single character, return it as its unicode string value.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__
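
The negate method listed above has no detailed entry on this page. As a rough illustration of what "inverse" means for a code point set, here is a minimal sketch that computes the complement of a set. It assumes the set is held as a sorted list of disjoint, inclusive (start, end) tuples (the shape asTuples returns), and negate_ranges is a hypothetical helper, not PyXB's implementation. The upper bound is a parameter because MaxCodePoint depends on the Python build.

```python
from typing import List, Tuple

Range = Tuple[int, int]
MAX_CODE_POINT = 0x10FFFF  # assumption: a wide build (see MaxCodePoint)


def negate_ranges(ranges: List[Range], max_cp: int = MAX_CODE_POINT) -> List[Range]:
    """Hypothetical helper: complement sorted inclusive ranges within [0, max_cp]."""
    result: List[Range] = []
    next_start = 0
    for s, e in ranges:
        if s > next_start:
            # The gap before this range belongs to the complement.
            result.append((next_start, s - 1))
        next_start = e + 1
    if next_start <= max_cp:
        # Everything after the last range also belongs to the complement.
        result.append((next_start, max_cp))
    return result


print(negate_ranges([(0x41, 0x5A)]))  # [(0, 64), (91, 1114111)]
```

Negating twice returns the original ranges, which is a convenient sanity check for any complement implementation.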

Class Variables
  MaxShortCodePoint = 65535
  MaxCodePoint = 1114111
The maximum value for a code point in the Unicode code point space.
  __codepoints = None
  __XMLtoPythonREMap = {u'': u'\x00', u'-': u'\-', u'[': u'\[',...

Properties

Inherited from object: __class__

Method Details

__init__(self, *args)
(Constructor)

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__
(inherited documentation)

add(self, value)

Add the given value to the code point set.

Parameters:
  • value - An integral value denoting a code point, or a tuple (s,e) denoting the start and end (inclusive) code points in a range.
Returns:
self
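
To make the add semantics concrete, here is a minimal sketch, not PyXB's implementation. It assumes the set is held as a sorted list of disjoint, non-adjacent, inclusive (start, end) tuples; add_value is a hypothetical helper name. Like add, it accepts either a single code point or an inclusive (s, e) tuple, merges anything that overlaps or touches the new range, and returns the updated list, in the same spirit as add returning self.

```python
from typing import List, Tuple, Union

Range = Tuple[int, int]
MAX_CODE_POINT = 0x10FFFF  # assumption: a wide build


def add_value(ranges: List[Range], value: Union[int, Range]) -> List[Range]:
    """Hypothetical helper: add a code point or inclusive range to `ranges`.

    `ranges` is assumed to be sorted, disjoint and non-adjacent.
    The list is updated in place and also returned.
    """
    start, end = (value, value) if isinstance(value, int) else value
    if not 0 <= start <= end <= MAX_CODE_POINT:
        raise ValueError("invalid code point range: %r" % (value,))
    merged: List[Range] = []
    for s, e in ranges:
        if e + 1 < start or end + 1 < s:
            # Disjoint and not adjacent: keep unchanged.
            merged.append((s, e))
        else:
            # Overlapping or adjacent: absorb it into the new range.
            start, end = min(start, s), max(end, e)
    merged.append((start, end))
    merged.sort()
    ranges[:] = merged
    return ranges


r: List[Range] = []
add_value(r, 0x61)
add_value(r, (0x62, 0x7A))  # adjacent to 'a', so it merges into one range
add_value(r, 0x30)
print(r)  # [(48, 48), (97, 122)]
```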

extend(self, values)

Add multiple values to a code point set.

Parameters:
  • values - Either a CodePointSet instance, or an iterable whose members are valid parameters to add.
Returns:
self
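
A bulk add can be sketched as one sort-and-merge pass rather than repeated single additions. This is a hypothetical stand-in (extend_ranges is not part of PyXB) that assumes the same sorted list of inclusive (start, end) tuples. For brevity it accepts only integers and (s, e) tuples, not another CodePointSet instance.

```python
from typing import Iterable, List, Tuple, Union

Range = Tuple[int, int]


def extend_ranges(ranges: List[Range],
                  values: Iterable[Union[int, Range]]) -> List[Range]:
    """Hypothetical helper: add many code points/ranges, then normalize."""
    # Normalize every input to an inclusive (start, end) tuple.
    items: List[Range] = list(ranges)
    for v in values:
        items.append((v, v) if isinstance(v, int) else (v[0], v[1]))
    items.sort()
    merged: List[Range] = []
    for s, e in items:
        if merged and s <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: widen that range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    ranges[:] = merged
    return ranges


print(extend_ranges([], [0x41, (0x42, 0x5A), 0x61, (0x30, 0x39), 0x35]))
# [(48, 57), (65, 90), (97, 97)]
```

Sorting once makes the merge linear after an O(n log n) sort, which is usually cheaper than merging each value into the set individually.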

subtract(self, value)

Remove the given value from the code point set.

Parameters:
  • value - An integral value denoting a code point, or a tuple (s,e) denoting the start and end (inclusive) code points in a range, or a CodePointSet.
Returns:
self
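
Removing a range may split an existing range in two. The sketch below shows that splitting under the same assumed representation (a sorted list of disjoint, inclusive (start, end) tuples). subtract_value is a hypothetical helper; unlike subtract, it does not accept a CodePointSet argument.

```python
from typing import List, Tuple, Union

Range = Tuple[int, int]


def subtract_value(ranges: List[Range], value: Union[int, Range]) -> List[Range]:
    """Hypothetical helper: remove a code point or inclusive range."""
    start, end = (value, value) if isinstance(value, int) else value
    result: List[Range] = []
    for s, e in ranges:
        if e < start or s > end:
            # No overlap with the removed span: keep unchanged.
            result.append((s, e))
            continue
        # Keep whatever lies left and right of the removed span.
        if s < start:
            result.append((s, start - 1))
        if e > end:
            result.append((end + 1, e))
    ranges[:] = result
    return ranges


print(subtract_value([(0x41, 0x5A), (0x61, 0x7A)], (0x4D, 0x62)))
# [(65, 76), (99, 122)]
```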

asPattern(self, with_brackets=True)

Return the code point set as a Unicode regular expression character group consisting of a sequence of characters or character ranges.

This returns a regular expression fragment in Python's regular expression syntax. Regular expression syntaxes (for example, XML Schema's and Python's) are not fully compatible, often in subtle ways, so the fragment should not be assumed valid for other engines.

Parameters:
  • with_brackets - If True (default), square brackets are added to enclose the returned character group.
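
A minimal sketch of rendering ranges as a Python character group follows. It assumes the sorted, inclusive-tuple representation; ranges_to_pattern is a hypothetical helper. The escaped characters are modeled on the keys listed for __XMLtoPythonREMap, with NUL handling omitted. This sketch does not handle the empty set, for which it would produce "[]", a pattern Python rejects.

```python
import re
from typing import List, Tuple

Range = Tuple[int, int]

# Characters given a backslash inside a character class. Assumption: modeled
# on the keys listed for __XMLtoPythonREMap (NUL handling omitted).
_SPECIAL = frozenset("\\[]-^")


def _char(cp: int) -> str:
    """Render one code point, escaping character-class metacharacters."""
    ch = chr(cp)
    return "\\" + ch if ch in _SPECIAL else ch


def ranges_to_pattern(ranges: List[Range], with_brackets: bool = True) -> str:
    """Hypothetical helper: render sorted inclusive ranges as a character group."""
    parts = [_char(s) if s == e else "%s-%s" % (_char(s), _char(e))
             for s, e in ranges]
    body = "".join(parts)
    return "[%s]" % body if with_brackets else body


pattern = ranges_to_pattern([(0x2D, 0x2D), (0x61, 0x63)])
print(pattern)                           # [\-a-c]
print(bool(re.fullmatch(pattern, "-")))  # True
print(bool(re.fullmatch(pattern, "d")))  # False
```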

asTuples(self)

Return the codepoints as tuples denoting the ranges that are in the set.

Each tuple (s, e) indicates that the code points from s (inclusive) to e (inclusive) are in the set.
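
This page does not document the internal codepoints list. As one plausible encoding, assumed here purely for illustration, suppose it is a sorted list of boundaries in which each entry toggles membership: an entry at an even index starts a range, and the next entry is the first code point not in that range. Under that assumption, producing the tuples asTuples describes could look like this (boundaries_to_tuples is a hypothetical name):

```python
from typing import List, Tuple

MAX_CODE_POINT = 0x10FFFF  # assumption: a wide build


def boundaries_to_tuples(boundaries: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: convert toggle boundaries to inclusive tuples.

    Entries at even indices start a range; the following entry is the
    first code point *not* in that range. An unmatched final start means
    the range runs to MAX_CODE_POINT.
    """
    tuples = []
    for i in range(0, len(boundaries), 2):
        start = boundaries[i]
        if i + 1 < len(boundaries):
            end = boundaries[i + 1] - 1  # exclusive boundary -> inclusive end
        else:
            end = MAX_CODE_POINT
        tuples.append((start, end))
    return tuples


print(boundaries_to_tuples([0x41, 0x5B, 0x61, 0x7B]))  # [(65, 90), (97, 122)]
```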

asSingleCharacter(self)

If this set represents a single character, return it as its unicode string value. Otherwise return None.
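
In range terms, "a single character" means exactly one range whose start equals its end. The following is a minimal sketch under the assumed inclusive-tuple representation; single_character is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def single_character(ranges: List[Tuple[int, int]]) -> Optional[str]:
    """Hypothetical helper: return the lone character, or None."""
    if len(ranges) == 1 and ranges[0][0] == ranges[0][1]:
        return chr(ranges[0][0])
    return None


print(repr(single_character([(0x41, 0x41)])))  # 'A'
print(repr(single_character([(0x41, 0x42)])))  # None
```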


Class Variable Details

MaxCodePoint

The maximum value for a code point in the Unicode code point space. On narrow Python builds, which do not enable wide Unicode characters, this is 0xFFFF; on wide builds it is the full value 0x10FFFF. The value shown below, 1114111, equals 0x10FFFF, so it was recorded on a wide build.

Value:
1114111
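
Python exposes the build's limit as sys.maxunicode: 0xFFFF on narrow builds and 0x10FFFF on wide builds. Since Python 3.3 (PEP 393) there is no narrow/wide distinction, and the value is always 0x10FFFF. The snippet below sketches how a MaxCodePoint-style constant could be derived from it. Whether PyXB computes its value this way is an assumption, and MAX_CODE_POINT and WIDE_BUILD are hypothetical names.

```python
import sys

# sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on wide builds;
# on Python 3.3 and later it is always 0x10FFFF.
MAX_CODE_POINT = sys.maxunicode

# Code points above 0xFFFF are directly representable only on wide builds.
WIDE_BUILD = MAX_CODE_POINT > 0xFFFF

print(hex(MAX_CODE_POINT), WIDE_BUILD)  # on Python 3.3+: 0x10ffff True
```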

__XMLtoPythonREMap

Value:
{u'': u'\x00',
 u'-': u'\-',
 u'[': u'\[',
 u'\': u'\\',
 u']': u'\]',
 u'^': u'\^'}