Package tipy :: Module minr :: Class TextMiner

Class TextMiner


object --+    
         |    
     Miner --+
             |
            TextMiner

The miner for text files.

This miner mines text files by extracting valid n-grams from them and inserting them into databases. Mining a text requires:


See Also: TextTokenizer, db.insert_ngrams

Class Hierarchy for TextMiner

Nested Classes
    Inherited from Miner
  __metaclass__
Metaclass for defining Abstract Base Classes (ABCs).
Instance Methods
 
__init__(self, config, minerName, callback=None)
Constructor of the TextMiner class.
 
update_db(self, textPath)
Mine a text file, updating the database.
 
crt_new_db(self, textPath)
Mine a text file.
dict
crt_ngram_map(self, textPath, n)
Create an n-gram dictionary from a file.
 
add_to_db(self, ngramMap, n, append=False)
Add n-grams of an n-gram dictionary to the database.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

    Inherited from Miner
 
mine(self)
 
rm_db(self)
Remove the database file (calls os.system).
Class Variables
    Inherited from Miner
  __abstractmethods__ = frozenset(['mine'])
  _abc_cache = <_weakrefset.WeakSet object at 0x7f2a42131ad0>
  _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a421...
  _abc_negative_cache_version = 44
  _abc_registry = <_weakrefset.WeakSet object at 0x7f2a42131a90>
Properties

Inherited from object: __class__

Method Details

__init__(self, config, minerName, callback=None)
(Constructor)

Constructor of the TextMiner class.

Parameters:
  • config (drvr.Configuration) - The configuration file. It is used to retrieve the miner parameters.
  • minerName (str) - The name of the miner.
  • callback (fun(float, ...)) - The callback used to report the progress percentage. In the GUI, a callback method is implemented to update a progress bar showing the n-gram insertion progress (cf. py).
Overrides: object.__init__
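
As a rough illustration of the callback parameter, the sketch below shows a function that accepts a float percentage plus optional extra arguments, matching the documented fun(float, ...) shape. The names make_progress_recorder and insert_with_progress are hypothetical and not part of tipy; this page does not show how TextMiner actually invokes its callback, so the call pattern here is an assumption.

```python
def make_progress_recorder():
    """Return a callback and the list it records percentages into."""
    seen = []

    def callback(percentage, *args):
        # Accept a float percentage plus any extra positional arguments,
        # mirroring the documented fun(float, ...) shape.
        seen.append(round(percentage, 1))

    return callback, seen


def insert_with_progress(items, callback=None):
    """Hypothetical stand-in for an insertion loop that reports progress."""
    total = len(items)
    for index, _item in enumerate(items, start=1):
        # ... the real work (e.g. a database insertion) would happen here ...
        if callback is not None:
            callback(100.0 * index / total)


callback, seen = make_progress_recorder()
insert_with_progress(["a b", "b c", "c d", "d e"], callback)
print(seen)
```

A GUI would typically replace the recording callback with one that sets a progress bar value.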

update_db(self, textPath)

Mine a text file, updating the database.

Parameters:
  • textPath (str) - The path to the text file to mine.

crt_new_db(self, textPath)

Mine a text file.

This method doesn't try to update the n-gram counts, so it will fail if it tries to add an n-gram that is already in the database. In return, it is a little faster than update_db().

Parameters:
  • textPath (str) - The path to the text file to mine.

Note: If you intend to create a new database but one already exists, consider calling rm_db() first.
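
The difference between a plain insertion and an updating insertion can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with a hypothetical single-table schema (ngrams, with columns ngram and occurrences), and the helper names insert_only and insert_or_update are made up here. The actual tipy database layout and queries are not shown in this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per n-gram, keyed by its text.
conn.execute(
    "CREATE TABLE ngrams (ngram TEXT PRIMARY KEY, occurrences INTEGER)"
)


def insert_only(conn, ngram, count):
    """Plain insertion: raises sqlite3.IntegrityError on a duplicate key."""
    conn.execute(
        "INSERT INTO ngrams (ngram, occurrences) VALUES (?, ?)", (ngram, count)
    )


def insert_or_update(conn, ngram, count):
    """Add to the existing count if the n-gram is present, else insert it."""
    cursor = conn.execute(
        "UPDATE ngrams SET occurrences = occurrences + ? WHERE ngram = ?",
        (count, ngram),
    )
    if cursor.rowcount == 0:
        insert_only(conn, ngram, count)


insert_only(conn, "hello world", 2)
try:
    insert_only(conn, "hello world", 1)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The plain insertion fails because the n-gram already exists.
    duplicate_rejected = True

insert_or_update(conn, "hello world", 3)
insert_or_update(conn, "good morning", 1)
rows = dict(conn.execute("SELECT ngram, occurrences FROM ngrams"))
print(duplicate_rejected, rows)
```

The plain path skips the existence check, which is one plausible reason a create-only method would be slightly faster than an updating one.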

crt_ngram_map(self, textPath, n)

Create an n-gram dictionary from a file.

Parameters:
  • textPath (str) - The path to the text file to mine.
  • n (int) - The n in n-gram. Specifies the maximum size of the n-grams to generate.
Returns: dict
The n-gram dictionary.
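
To make the idea of an n-gram dictionary concrete, here is a minimal sketch that counts n-grams in a string. It rests on several assumptions: tokens are lowercased and split on whitespace, only n-grams of exactly size n are counted, and the dictionary maps token tuples to counts. The function name count_ngrams is hypothetical; the real method reads from a file and relies on TextTokenizer, whose output format is not described in this page.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count n-grams of exactly size n in a string (whitespace tokens)."""
    tokens = text.lower().split()
    # Slide a window of n tokens over the token list; if the text has
    # fewer than n tokens the range is empty and no n-gram is produced.
    windows = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return dict(Counter(windows))


ngram_map = count_ngrams("the cat sat on the mat the cat", 2)
print(ngram_map[("the", "cat")])
```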

add_to_db(self, ngramMap, n, append=False)

Add n-grams of an n-gram dictionary to the database.

Parameters:
  • ngramMap (dict) - The n-gram dictionary returned by TextTokenizer.tknize_text(). See that method's docstring for more information.
  • n (int) - The n in n-gram. Specifies the maximum size of the n-grams to generate.
  • append (bool) - Indicates whether the n-grams should be appended to the database.