| Home | Trees | Indices | Help |
|---|
Class hierarchy: object → Tokenizer → TextTokenizer
Tokenizer for text files.
This tokenizer receives a text file and generates n-grams of a given size n. It is useful to the text miner, which uses it to generate n-grams to be inserted into a database.
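Generating n-grams of size n amounts to sliding a window of n words across the text. The following is a minimal Python 3 sketch, assuming the text has already been split into words on whitespace; the function name `ngrams` is hypothetical and not part of this class's API, and TextTokenizer may split or normalize text differently.

```python
from typing import Iterator, List, Tuple


def ngrams(words: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive words as a tuple (sliding window)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # A window starting at index i covers words[i:i + n]; the last valid
    # start index is len(words) - n, so inputs shorter than n yield nothing.
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


words = "subject to the serious rebuff".split()
print(list(ngrams(words, 3)))
# [('subject', 'to', 'the'), ('to', 'the', 'serious'), ('the', 'serious', 'rebuff')]
```

Returning a generator keeps memory flat for long texts; the caller decides whether to collect the tuples into a list, count them, or stream them into storage.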
Class variable inherited from Tokenizer: `__metaclass__ = abc.ABCMeta`
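`__metaclass__ = abc.ABCMeta` is the Python 2 way of declaring an abstract base class; Python 3 ignores that attribute and uses the `metaclass=` keyword instead. A minimal Python 3 sketch of the pattern follows. The names `AbstractTokenizer`, `WordTokenizer` and `tokenize` are hypothetical stand-ins, since the abstract methods of Tokenizer are not listed on this page.

```python
import abc


class AbstractTokenizer(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for Tokenizer: an abstract base built on abc.ABCMeta."""

    @abc.abstractmethod
    def tokenize(self, path):
        """Concrete subclasses must override this (method name is hypothetical)."""


class WordTokenizer(AbstractTokenizer):
    """Hypothetical concrete subclass, playing the role of TextTokenizer."""

    def __init__(self, n):
        self.n = n  # n-gram size

    def tokenize(self, path):
        # Placeholder body: a real implementation would read `path`
        # and return the n-gram dictionary.
        return {}


try:
    AbstractTokenizer()  # abstract method not overridden, so this raises TypeError
except TypeError as exc:
    print("cannot instantiate:", exc)

print(WordTokenizer(3).tokenize("unused.txt"))  # prints {}
```

Because the base class uses ABCMeta, it cannot be instantiated until every method marked with `@abc.abstractmethod` is overridden, which is how a base class like Tokenizer can enforce a common interface on its subclasses.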
TextTokenizer constructor.

Tokenizes a file and returns a dictionary that maps each n-gram (a tuple of n consecutive words) to the number of times it occurs in the file. For n = 3, the dictionary looks like:
{ ('in', 'the', 'second'): 4,
('right', 'hand', 'of'): 1,
('subject', 'to', 'the'): 2,
('serious', 'rebuff', 'in'): 1,
('spirit', 'is', 'the'): 1 }
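One way to build a dictionary of that shape is to count every n-word window with `collections.Counter`. This is a minimal Python 3 sketch, not the class's actual implementation: it assumes a UTF-8 file and plain whitespace tokenization, and the function name `count_ngrams` is hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def count_ngrams(path: str, n: int) -> Dict[Tuple[str, ...], int]:
    """Map each n-gram in the file at `path` to the number of times it occurs."""
    with open(path, encoding="utf-8") as f:
        words = f.read().split()  # assumption: plain whitespace tokenization
    # zip over n staggered slices (words, words[1:], ..., words[n-1:])
    # yields every window of n consecutive words as a tuple.
    windows = zip(*(words[i:] for i in range(n)))
    return dict(Counter(windows))
```

For example, a file containing "in the second in the second place" yields `('in', 'the', 'second'): 2` for n = 3, since that trigram starts at two positions. Using tuples as keys keeps each n-gram hashable, which is what makes the dictionary form above possible.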
Generated by Epydoc 3.0.1 on Tue Jun 16 23:30:31 2015 (http://epydoc.sourceforge.net)