object --+
         |
 Tokenizer --+
             |
            TextTokenizer
Tokenizer to tokenize a text file.
This tokenizer receives a text file and generates n-grams of a given size "n". It is useful to the text miner for generating n-grams to be inserted into a database.
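As an illustration of the database step mentioned above, here is a minimal sketch that stores an n-gram count dictionary in SQLite. The table name `ngrams`, its columns, and the choice of joining words with a space are assumptions made for this example; they are not taken from the documented class.

```python
import sqlite3

# Example n-gram counts, in the shape this tokenizer is documented to return.
ngram_counts = {
    ("in", "the", "second"): 4,
    ("subject", "to", "the"): 2,
}

# Hypothetical schema: one row per n-gram, words joined by a single space.
conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE ngrams (ngram TEXT PRIMARY KEY, occurrences INTEGER)"
)
conn.executemany(
    "INSERT INTO ngrams (ngram, occurrences) VALUES (?, ?)",
    [(" ".join(words), count) for words, count in ngram_counts.items()],
)
conn.commit()

rows = conn.execute(
    "SELECT ngram, occurrences FROM ngrams ORDER BY occurrences DESC"
).fetchall()
print(rows)  # [('in the second', 4), ('subject to the', 2)]
```

A real text miner would likely use a persistent database file and a schema suited to its queries; the parameterized `executemany` call is the part worth keeping.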
| Inherited from Tokenizer |
|---|
| __metaclass__: Metaclass for defining Abstract Base Classes (ABCs). |
| Inherited from Tokenizer |
|---|
| __abstractmethods__ = |
| _abc_cache = <_weakrefset.WeakSet object at 0x7f2a42321710> |
| _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a423 |
| _abc_negative_cache_version = 39 |
| _abc_registry = <_weakrefset.WeakSet object at 0x7f2a42321690> |
TextTokenizer constructor.

Tokenize a file and return a dictionary mapping each n-gram (a tuple of words) to the number of times it occurs in the file. The dictionary looks like:
{ ('in', 'the', 'second'): 4,
  ('right', 'hand', 'of'): 1,
  ('subject', 'to', 'the'): 2,
  ('serious', 'rebuff', 'in'): 1,
  ('spirit', 'is', 'the'): 1 }
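
A dictionary of this shape can be built with a sliding window over the words of the text. The sketch below is only an approximation: `count_ngrams` is a hypothetical helper, and its whitespace splitting and lowercasing are assumptions that may differ from what TextTokenizer actually does.

```python
from collections import Counter


def count_ngrams(text, n):
    """Return a dict mapping each n-gram (a tuple of n words) to its count.

    Hypothetical helper: tokenization here is a lowercase whitespace split.
    """
    words = text.lower().split()
    # Slide a window of n words over the text, advancing one word at a time.
    # If the text has fewer than n words, the range is empty and so is the result.
    windows = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return dict(Counter(windows))


sample = "in the second act in the second scene"
counts = count_ngrams(sample, 3)
print(counts[("in", "the", "second")])  # 2
```

To process a file rather than a string, the text could be read first, for example with `open(path).read()`, and passed to the same function.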
| Generated by Epydoc 3.0.1 on Tue Jul 14 21:07:50 2015 | http://epydoc.sourceforge.net |