Package cjklib :: Module build :: Class EDICTFormatBuilder

Class EDICTFormatBuilder


Provides an abstract class for loading EDICT formatted dictionaries.

One column will be provided for the headword, one for the reading (in EDICT that is the Kana) and one for the translation.


To Do (Fix): Optimize insert: use a transaction that disables autocommit, and consider passing all data at once, which requires proper handling of row indices.
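
The optimization named in this To Do item could look roughly like the following. This is a minimal sketch using the standard sqlite3 module, not cjklib's own insert code; the function insert_entries, the table name EDICT and the batch size are assumptions made for illustration, and the handling of row indices mentioned above is left out.

```python
import sqlite3


def insert_entries(connection, entries, batch_size=1000):
    """Insert (headword, reading, translation) tuples in batches.

    All batches run inside a single transaction instead of committing
    after every row. The table name EDICT is a hypothetical example.
    """
    sql = "INSERT INTO EDICT (Headword, Reading, Translation) VALUES (?, ?, ?)"
    count = 0
    batch = []
    # 'with connection' commits on success and rolls back on an exception.
    with connection:
        for entry in entries:
            batch.append(entry)
            if len(batch) >= batch_size:
                connection.executemany(sql, batch)
                count += len(batch)
                batch = []
        if batch:
            # Flush the final, partially filled batch.
            connection.executemany(sql, batch)
            count += len(batch)
    return count


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE EDICT (Headword TEXT, Reading TEXT, Translation TEXT)")
sample_entries = [
    ("日本語", "にほんご", "/Japanese language/"),
    ("辞書", "じしょ", "/dictionary/"),
    ("ああ", None, "/ah!/"),
]
inserted = insert_entries(connection, sample_entries, batch_size=2)
row_count = connection.execute("SELECT COUNT(*) FROM EDICT").fetchone()[0]
```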

Nested Classes
  TableGenerator
Generates the dictionary entries.
Instance Methods
  __init__(self, dataPath, dbConnectInst, quiet=False)
Constructs the TableBuilder.
  getGenerator(self)
Returns the entry generator.
str getArchiveContentName(self, filePath)
Extracts the name of the contained file from the zipped archive, based on the file name.
file getFileHandle(self, filePath)
Returns a handle to the given file.
str buildFTS3CreateTableStatement(self, table)
Returns a SQL statement for creating a virtual table using FTS3 for SQLite.
  buildFTS3Tables(self, tableName, columns, columnTypeMap={}, primaryKeys=[], fullTextColumns=[])
Builds an FTS3 table construct for supporting full text search under SQLite.
  insertFTS3Tables(self, tableName, generator, columns=[], fullTextColumns=[])
bool testFTS3(self)
Tests if the SQLite FTS3 extension is supported on the build system.
  build(self)
Builds the table provided by the TableBuilder.
  remove(self)
Removes the table provided by the TableBuilder from the database.

Inherited from EntryGeneratorBuilder: getEntryDict

Inherited from TableBuilder: buildIndexObjects, buildTableObject, findFile

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Variables
  COLUMNS = ['Headword', 'Reading', 'Translation']
Columns that will be built
  PRIMARY_KEYS = []
Primary keys of the created table
  INDEX_KEYS = [['Headword'], ['Reading']]
Index keys (not unique) of the created table
  COLUMN_TYPES = {'Headword': String(length=255, convert_unicode...
Column types for created table
  FULLTEXT_COLUMNS = ['Translation']
Column names which shall be fulltext searchable.
  FILE_NAMES = None
Names of the files containing the EDICT-formatted dictionary.
  ENCODING = 'utf-8'
Encoding of the dictionary file.
  ENTRY_REGEX = None
Regular Expression matching a dictionary entry.
  IGNORE_LINES = 0
Number of starting lines to ignore.
  FILTER = None
Filter to apply to the read entry before writing to table.

Inherited from TableBuilder: DEPENDS, PROVIDES

Properties

Inherited from object: __class__

Method Details

__init__(self, dataPath, dbConnectInst, quiet=False)
(Constructor)

Constructs the TableBuilder.

Parameters:
  • dataPath - optional list of paths to the data file(s)
  • dbConnectInst - instance of a DatabaseConnector. If not given, all SQL code will be printed to stdout.
  • quiet - if true, no status information will be printed to stderr
Overrides: object.__init__
(inherited documentation)

getGenerator(self)

Returns the entry generator. Needs to be implemented by child classes.

Overrides: EntryGeneratorBuilder.getGenerator
(inherited documentation)
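
To illustrate how an entry generator might combine ENTRY_REGEX, IGNORE_LINES and FILTER, here is a minimal sketch. It is not the TableGenerator implementation: the function generate_entries, the pattern SIMPLE_ENTRY and the filter reading_or_headword are hypothetical, and the assumption that a filter receives and returns an entry dict is not confirmed by this page.

```python
import re

COLUMNS = ("Headword", "Reading", "Translation")


def generate_entries(lines, entry_regex, ignore_lines=0, entry_filter=None):
    """Yield one dict per matching line, keyed by the column names."""
    for index, line in enumerate(lines):
        if index < ignore_lines:
            continue  # skip leading header lines
        match = entry_regex.match(line.rstrip("\n"))
        if match is None:
            continue  # a real builder might emit a warning here
        entry = dict(zip(COLUMNS, match.groups()))
        if entry_filter is not None:
            entry = entry_filter(entry)
        yield entry


# Simplified stand-in pattern: headword, optional [reading], /translation/.
SIMPLE_ENTRY = re.compile(r"(\S+)\s+(?:\[([^\]]*)\]\s+)?(/.*/)\s*$")


def reading_or_headword(entry):
    """Hypothetical filter: reuse the headword when no reading is given."""
    if entry["Reading"] is None:
        entry["Reading"] = entry["Headword"]
    return entry


sample = [
    "# header line to skip\n",
    "日本語 [にほんご] /Japanese language/\n",
    "ああ /ah!/\n",
]
entries = list(generate_entries(sample, SIMPLE_ENTRY, ignore_lines=1,
                                entry_filter=reading_or_headword))
```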

getArchiveContentName(self, filePath)

Extracts the name of the contained file from the zipped archive, based on the file name. Reimplement this method to adapt it to your own needs.

Parameters:
  • filePath (str) - path of file
Returns: str
name of file in archive
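
One plausible way to derive the content name is to strip the directory and the last extension from the archive path. The sketch below assumes exactly that; the function archive_content_name is hypothetical and may differ from what this method actually does.

```python
import os.path


def archive_content_name(file_path):
    """Return the archive's base name without its last extension."""
    file_name = os.path.basename(file_path)
    root, _extension = os.path.splitext(file_name)
    return root


content_name = archive_content_name("/data/edict.zip")
# Only the last extension is removed, so a double extension keeps ".tar".
tar_name = archive_content_name("/data/edict.tar.gz")
```

A rule like this only works when the archive is named after the file it contains, which is why the method is meant to be reimplemented for other layouts.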

getFileHandle(self, filePath)

Returns a handle to the given file.

The file can be plain content or a zip, tar, tar.gz, or tar.bz2 archive.

Parameters:
  • filePath (str) - path of file
Returns: file
handle to file's content
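
The format handling described above can be sketched with the standard zipfile and tarfile modules. This is an illustration under assumptions, not the method's source: the function open_dictionary and its content_name parameter are hypothetical, and the whole member is read into memory for simplicity.

```python
import io
import os
import tarfile
import tempfile
import zipfile


def open_dictionary(file_path, content_name, encoding="utf-8"):
    """Return a text handle for a plain file or an archive member."""
    if zipfile.is_zipfile(file_path):
        with zipfile.ZipFile(file_path) as archive:
            data = archive.read(content_name)
        return io.StringIO(data.decode(encoding))
    if tarfile.is_tarfile(file_path):
        # Mode "r:*" detects tar, tar.gz and tar.bz2 transparently.
        with tarfile.open(file_path, "r:*") as archive:
            member = archive.extractfile(content_name)
            if member is None:
                raise ValueError("%r is not a regular file" % content_name)
            data = member.read()
        return io.StringIO(data.decode(encoding))
    # Anything else is treated as plain content.
    return open(file_path, encoding=encoding)


with tempfile.TemporaryDirectory() as directory:
    zip_path = os.path.join(directory, "edict.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("edict", "日本語 [にほんご] /Japanese language/\n")
    with open_dictionary(zip_path, "edict") as handle:
        first_line = handle.readline()
```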

buildFTS3CreateTableStatement(self, table)

Returns a SQL statement for creating a virtual table using FTS3 for SQLite.

Parameters:
  • table (object) - SQLAlchemy table object representing the FTS3 table
Returns: str
Create table statement
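
For orientation, an FTS3 create statement in SQLite has the form CREATE VIRTUAL TABLE ... USING FTS3(...). The sketch below builds such a string from plain names rather than from an SQLAlchemy table object, so it is a simplification of this method; the function fts3_create_statement and the table name EDICT_Text are hypothetical.

```python
def fts3_create_statement(table_name, columns):
    """Build a CREATE VIRTUAL TABLE statement for SQLite FTS3."""
    for name in [table_name] + list(columns):
        # Accept only plain identifiers to avoid any quoting issues.
        if not name.isidentifier():
            raise ValueError("unsupported identifier: %r" % name)
    return "CREATE VIRTUAL TABLE %s USING FTS3(%s);" % (
        table_name, ", ".join(columns))


statement = fts3_create_statement("EDICT_Text", ["Translation"])
```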

buildFTS3Tables(self, tableName, columns, columnTypeMap={}, primaryKeys=[], fullTextColumns=[])

Builds an FTS3 table construct for supporting full text search under SQLite.

Parameters:
  • tableName (str) - name of table
  • columns (list of str) - column names
  • columnTypeMap (dict of str and object) - mapping of column name to SQLAlchemy Column
  • primaryKeys (list of str) - list of primary key columns
  • fullTextColumns (list of str) - list of fulltext columns

testFTS3(self)

Tests if the SQLite FTS3 extension is supported on the build system.

Returns: bool
True if the FTS3 extension exists, False otherwise.
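
A check of this kind can be sketched by trying to create a throwaway FTS3 table in an in-memory database. The function fts3_available and the probe table name are assumptions for illustration; the actual test in this method may work differently.

```python
import sqlite3


def fts3_available():
    """Return True if this SQLite build can create an FTS3 table."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE VIRTUAL TABLE fts3_probe USING FTS3(content)")
        return True
    except sqlite3.OperationalError:
        # Raised when the fts3 module is not compiled into SQLite.
        return False
    finally:
        connection.close()


supported = fts3_available()
```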

build(self)

Builds the table provided by the TableBuilder.

A search index is created to allow for fulltext searching.

Overrides: TableBuilder.build

remove(self)

Removes the table provided by the TableBuilder from the database.

Overrides: TableBuilder.remove
(inherited documentation)

Class Variable Details

COLUMN_TYPES

Column types for created table

Value:
{'Headword': String(length=255, convert_unicode=False, assert_unicode=None),
 'Reading': String(length=255, convert_unicode=False, assert_unicode=None),
 'Translation': Text(length=None, convert_unicode=False, assert_unicode=None)}

ENTRY_REGEX

Regular expression matching a dictionary entry. Must be overridden if the dictionary does not strictly follow the EDICT format.

Value:
None
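
Since the value here is None, a subclass has to supply its own pattern. As a rough illustration, an EDICT line has the shape "headword [reading] /gloss/gloss/", with the bracketed reading omitted for kana-only entries. The pattern EDICT_ENTRY and the helpers below are a sketch under that assumption, not the expression used by any cjklib subclass.

```python
import re

# Hypothetical pattern for lines such as:
#   日本語 [にほんご] /(n) Japanese (language)/
EDICT_ENTRY = re.compile(
    r"^(?P<headword>\S+)\s+"
    r"(?:\[(?P<reading>[^\]]*)\]\s+)?"
    r"(?P<translation>/.*/)\s*$")


def parse_edict_line(line):
    """Return (headword, reading, translation), or None if no match."""
    match = EDICT_ENTRY.match(line)
    if match is None:
        return None
    return (match.group("headword"), match.group("reading"),
            match.group("translation"))


def split_glosses(translation):
    """Split a slash-delimited translation field into single glosses."""
    return [gloss for gloss in translation.split("/") if gloss]


full = parse_edict_line("日本語 [にほんご] /(n) Japanese (language)/")
kana_only = parse_edict_line("ああ /ah!/")
glosses = split_glosses("/dictionary/lexicon/")
```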