Class hierarchy: object --> Tokenizer --> TextTokenizer
Tokenizer for text files.
This tokenizer receives a text file and generates n-grams of a given size "n". It is useful to the text miner, which inserts the generated n-grams into a database.
Inherited from Tokenizer | |||
---|---|---|---|
__metaclass__ Metaclass for defining Abstract Base Classes (ABCs). |

Inherited from Tokenizer | |||
---|---|---|---|
__abstractmethods__ = |
_abc_cache = <_weakrefset.WeakSet object at 0x7f2a42321710> |
_abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a423 |
_abc_negative_cache_version = 39 |
_abc_registry = <_weakrefset.WeakSet object at 0x7f2a42321690> |
TextTokenizer constructor.

Tokenize a file and return a dictionary that maps each n-gram, stored as a tuple of words, to its number of occurrences in the file. With n = 3, for example, the dictionary looks like:

    {
        ('in', 'the', 'second'): 4,
        ('right', 'hand', 'of'): 1,
        ('subject', 'to', 'the'): 2,
        ('serious', 'rebuff', 'in'): 1,
        ('spirit', 'is', 'the'): 1
    }
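The method signatures did not survive on this page, so the following is only a minimal sketch of how a dictionary of that shape could be built, not the TextTokenizer implementation. The helper names `ngram_counts` and `ngram_counts_from_file`, the lowercasing step, and the word-splitting regular expression are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngram_counts(text, n=3):
    """Count the word n-grams of size n in a string (hypothetical helper)."""
    # Assumption: lowercase the text and treat runs of letters, digits
    # and apostrophes as words; the real tokenizer may split differently.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words over the list; each window is one n-gram.
    windows = zip(*(words[i:] for i in range(n)))
    # Counter tallies how often each tuple occurs; return a plain dict.
    return dict(Counter(windows))


def ngram_counts_from_file(path, n=3, encoding="utf-8"):
    """Read a whole text file and count its n-grams (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return ngram_counts(handle.read(), n)


sample = "in the second act, in the second scene"
counts = ngram_counts(sample, 3)
print(counts[("in", "the", "second")])
```

Using tuples as keys keeps the n-grams hashable, which matches the tuple keys in the example dictionary above. A text shorter than n words simply produces an empty dictionary.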
Generated by Epydoc 3.0.1 on Tue Jul 14 21:07:50 2015 | http://epydoc.sourceforge.net |