Package tipy :: Module tknz :: Class Tokenizer

Class Tokenizer

object --+
         |
        Tokenizer

Abstract class for all tokenizers.

Nested Classes
  __metaclass__
Metaclass for defining Abstract Base Classes (ABCs).
Instance Methods
        __init__(self, stream, blankspaces=' \x0c\n\\c\r\t\x0b\xc2\x85', separators='`~!@#$%^&*()_-+=\\|]}[{";:/?.>,<\xe2\x80\xa0\xe2\x80\x9e\xe2\...)
            Constructor of the Tokenizer abstract class.
  bool  is_blankspace(self, char)
            Test if a character is a blankspace.
  bool  is_separator(self, char)
            Test if a character is a separator.
        count_chars(self)
        reset_stream(self)
        count_tokens(self)
        has_more_tokens(self)
        next_token(self)
        progress(self)

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  __abstractmethods__ = frozenset(['count_chars', 'count_tokens'...
  _abc_cache = <_weakrefset.WeakSet object at 0x7f2a42321710>
  _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a423...
  _abc_negative_cache_version = 39
  _abc_registry = <_weakrefset.WeakSet object at 0x7f2a42321690>
Properties

Inherited from object: __class__

Method Details

__init__(self, stream, blankspaces=' \x0c\n\\c\r\t\x0b\xc2\x85', separators='`~!@#$%^&*()_-+=\\|]}[{";:/?.>,<\xe2\x80\xa0\xe2\x80\x9e\xe2\...)
(Constructor)

Constructor of the Tokenizer abstract class.

Parameters:
  • stream (str or io.IOBase) - The stream to tokenize. Can be a filename or any open IO stream.
  • blankspaces (str) - The characters that represent empty spaces.
  • separators (str) - The characters that separate token units (e.g. word boundaries).
Overrides: object.__init__
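
The stream parameter accepts either a filename or an open IO stream. As a rough sketch of how the two cases could be told apart, assuming Python 3 and a hypothetical helper named open_stream (not part of tipy; the actual constructor may handle this differently):

```python
import io


def open_stream(stream):
    """Return an open text stream for a filename or an existing IO object.

    Hypothetical helper for illustration only; not part of tipy.
    """
    if isinstance(stream, io.IOBase):
        # Already an open stream: hand it back unchanged.
        return stream
    if isinstance(stream, str):
        # Assumption: a plain string is a filename, opened as UTF-8 text.
        return open(stream, "r", encoding="utf-8")
    raise TypeError("stream must be a filename or an open IO stream")
```

With this sketch, open_stream(io.StringIO("some text")) returns the same object it was given, while open_stream("corpus.txt") would open that file for reading.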

is_blankspace(self, char)

Test if a character is a blankspace.

Parameters:
  • char (str) - The character to test.
Returns: bool
True if the character is a blankspace, False otherwise.

is_separator(self, char)

Test if a character is a separator.

Parameters:
  • char (str) - The character to test.
Returns: bool
True if the character is a separator, False otherwise.
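
Both predicates plausibly reduce to a lookup in the character sets passed to the constructor. The sketch below assumes that behaviour and uses a hypothetical stand-in class, CharClassifier, with shortened character sets; it is not the tipy implementation.

```python
class CharClassifier:
    """Hypothetical stand-in for the two predicate methods of Tokenizer."""

    def __init__(self, blankspaces=" \n\r\t", separators=".,;:!?"):
        # Shortened character sets, for illustration only.
        self.blankspaces = blankspaces
        self.separators = separators

    def is_blankspace(self, char):
        # For a single character, `in` checks membership in the string.
        return char in self.blankspaces

    def is_separator(self, char):
        return char in self.separators
```

For example, CharClassifier().is_separator(",") returns True and CharClassifier().is_blankspace("a") returns False.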

count_chars(self)

Decorators:
  • @abstractmethod

reset_stream(self)

Decorators:
  • @abstractmethod

count_tokens(self)

Decorators:
  • @abstractmethod

has_more_tokens(self)

Decorators:
  • @abstractmethod

next_token(self)

Decorators:
  • @abstractmethod

progress(self)

Decorators:
  • @abstractmethod
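
The six abstract methods above carry no descriptions, so their exact contracts are not documented here. The following Python 3 sketch shows one way a concrete subclass could satisfy the interface. TokenizerBase is a stand-in that mirrors the documented method names, and StringTokenizer is a hypothetical subclass. The semantics chosen (separators and blankspaces are dropped, progress is a percentage of characters consumed) are assumptions, not confirmed tipy behaviour.

```python
from abc import ABC, abstractmethod


class TokenizerBase(ABC):
    """Stand-in mirroring the documented interface; not the tipy class."""

    def __init__(self, stream, blankspaces=" \n\r\t", separators=".,;:!?"):
        # Shortened default character sets, for illustration only.
        self.stream = stream
        self.blankspaces = blankspaces
        self.separators = separators

    def is_blankspace(self, char):
        return char in self.blankspaces

    def is_separator(self, char):
        return char in self.separators

    @abstractmethod
    def count_chars(self): ...

    @abstractmethod
    def reset_stream(self): ...

    @abstractmethod
    def count_tokens(self): ...

    @abstractmethod
    def has_more_tokens(self): ...

    @abstractmethod
    def next_token(self): ...

    @abstractmethod
    def progress(self): ...


class StringTokenizer(TokenizerBase):
    """Hypothetical subclass that tokenizes an in-memory string."""

    def __init__(self, text, **kwargs):
        super().__init__(text, **kwargs)
        self.offset = 0  # index of the next unread character

    def _is_delimiter(self, char):
        return self.is_blankspace(char) or self.is_separator(char)

    def _skip_delimiters(self):
        # Assumption: blankspaces and separators are dropped, not emitted.
        text = self.stream
        while self.offset < len(text) and self._is_delimiter(text[self.offset]):
            self.offset += 1

    def count_chars(self):
        return len(self.stream)

    def reset_stream(self):
        self.offset = 0

    def has_more_tokens(self):
        self._skip_delimiters()
        return self.offset < len(self.stream)

    def next_token(self):
        self._skip_delimiters()
        start = self.offset
        text = self.stream
        while self.offset < len(text) and not self._is_delimiter(text[self.offset]):
            self.offset += 1
        return text[start:self.offset]

    def count_tokens(self):
        # Scan from the start, then restore the reading position.
        saved = self.offset
        self.reset_stream()
        count = 0
        while self.has_more_tokens():
            self.next_token()
            count += 1
        self.offset = saved
        return count

    def progress(self):
        # Assumption: progress is the percentage of characters consumed.
        total = self.count_chars()
        return 100.0 * self.offset / total if total else 100.0
```

Under these assumptions, looping with has_more_tokens() and next_token() over StringTokenizer("Hello, world! Bye.") yields "Hello", "world" and "Bye".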

Class Variable Details

__abstractmethods__

Value:
frozenset(['count_chars',
           'count_tokens',
           'has_more_tokens',
           'next_token',
           'progress',
           'reset_stream'])
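
The abc machinery uses this set to refuse instantiation of any class that still has abstract methods left to override. A small Python 3 sketch of that mechanism, using stand-in classes (Base, Partial) and a hypothetical helper can_instantiate rather than the tipy classes:

```python
from abc import ABC, abstractmethod


class Base(ABC):
    """Stand-in with two abstract methods; not the tipy class."""

    @abstractmethod
    def next_token(self):
        """Return the next token."""

    @abstractmethod
    def has_more_tokens(self):
        """Return True while tokens remain."""


class Partial(Base):
    """Overrides only one of the two abstract methods."""

    def next_token(self):
        return ""


def can_instantiate(cls):
    # Instantiating a class with remaining abstract methods raises TypeError.
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here Partial.__abstractmethods__ still contains 'has_more_tokens', so can_instantiate(Partial) returns False. A Tokenizer subclass would likewise need to override all six listed methods before it can be instantiated.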

_abc_negative_cache

Value:
<_weakrefset.WeakSet object at 0x7f2a42321790>