dsegmenter.bparseg.bparsegmenter

class dsegmenter.bparseg.bparsegmenter.BparSegmenter(a_featgen=<function featgen>, a_classify=<function classify>, a_model=u'/home/sidorenko/Projects/DiscourseSegmenter/dsegmenter/bparseg/data/bpar.model')[source]

Class for performing discourse segmentation on constituency trees.

DEFAULT_CLASSIFIER = LinearSVC(C=0.3, class_weight=None, dual=True, fit_intercept=True, intercept_scaling=1, loss='squared_hinge', max_iter=1000, multi_class=u'crammer_singer', penalty='l2', random_state=None, tol=0.0001, verbose=0)

classifier object – default classification method

DEFAULT_MODEL = u'/home/sidorenko/Projects/DiscourseSegmenter/dsegmenter/bparseg/data/bpar.model'

str – path to default model to use in classification

DEFAULT_PIPELINE = Pipeline(steps=[(u'vectorizer', DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True, sparse=True)), (u'var_filter', VarianceThreshold(threshold=0.0)), (u'LinearSVC', LinearSVC(C=0.3, class_weight=None, dual=True, fit_intercept=True, intercept_scaling=1, loss='squared_hinge', max_iter=1000, multi_class=u'crammer_singer', penalty='l2', random_state=None, tol=0.0001, verbose=0))])

pipeline object – default pipeline object used for classification
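
As a rough illustration of what the default pipeline does, the sketch below rebuilds a pipeline with the same three steps using scikit-learn directly and fits it on a few feature dictionaries. The feature names, class labels, and the fixed random_state are invented for this example and are not taken from dsegmenter; the real features come from the feature generation function.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Same step names and main settings as DEFAULT_PIPELINE above.
# random_state is fixed here only to make the toy run repeatable.
pipeline = Pipeline([
    ("vectorizer", DictVectorizer()),             # feature dicts -> sparse matrix
    ("var_filter", VarianceThreshold()),          # drop constant features
    ("LinearSVC", LinearSVC(C=0.3, multi_class="crammer_singer",
                            random_state=0)),
])

# Hypothetical toy features: one dict per tree node.
features = [
    {"label": "S", "n_children": 3.0},
    {"label": "NP", "n_children": 1.0},
    {"label": "S", "n_children": 4.0},
    {"label": "NP", "n_children": 2.0},
]
# Hypothetical class labels, one per feature dict.
classes = ["clause", "none", "clause", "none"]

pipeline.fit(features, classes)
predictions = pipeline.predict(features)
print(list(predictions))
```

String-valued features such as "label" are one-hot encoded by DictVectorizer as "label=S" and "label=NP", which is why the separator '=' appears in the repr above.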

__init__(a_featgen=<function featgen>, a_classify=<function classify>, a_model=u'/home/sidorenko/Projects/DiscourseSegmenter/dsegmenter/bparseg/data/bpar.model')[source]

Class constructor.

Parameters:
  • a_featgen (method) – function to be used for feature generation
  • a_classify (method) – two-argument function that predicts the segment class of a BitPar tree from the model and the features generated for that tree
  • a_model (str) – path to a pre-trained model (previously dumped by joblib), a valid classification object, or None
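
The division of labour between the two callbacks can be sketched without dsegmenter at all. Everything below is hypothetical: the tuple-based tree, the names toy_featgen, toy_classify, and ThresholdModel, and in particular the argument order (model first, features second) are assumptions made for illustration, not the library's actual signatures.

```python
def toy_featgen(tree):
    """Turn a (label, children) tuple into a feature dictionary."""
    label, children = tree
    return {"label": label, "n_children": len(children)}


class ThresholdModel:
    """Stand-in for a trained classifier (hypothetical)."""

    def predict(self, features):
        # Nodes with at least two children are treated as segments.
        return "segment" if features["n_children"] >= 2 else "none"


def toy_classify(model, features):
    """Two-argument classification function: model plus features."""
    return model.predict(features)


# A tiny stand-in for a constituency tree: S -> NP VP.
tree = ("S", [("NP", []), ("VP", [])])
predicted = toy_classify(ThresholdModel(), toy_featgen(tree))
print(predicted)
```

Keeping feature generation and classification as separate callables means either one can be swapped out independently when constructing the segmenter.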

segment(a_trees)[source]

Create discourse segments based on the BitPar trees.

Parameters:
  • a_trees (list) – list of sentence trees to be segmented
Returns: constructed segment trees
Return type: list

test(a_trees, a_segments)[source]

Estimate performance of segmenter model.

Parameters:
  • a_trees (list) – BitPar trees
  • a_segments (list) – corresponding gold segments for trees
Returns: macro- and micro-averaged F-scores
Return type: 2-tuple
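
To make the two returned numbers concrete, here is a minimal pure-Python sketch of the standard macro- and micro-averaged F1 definitions over single-label predictions. The helper f_scores and the toy labels are hypothetical, and the exact averaging performed inside test() may differ in detail.

```python
from collections import Counter


def f_scores(gold, predicted):
    """Return (macro F1, micro F1) for two equally long label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    labels = sorted(set(gold) | set(predicted))

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-class F1 values, each class weighted equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


gold = ["seg", "seg", "none", "none", "none"]
pred = ["seg", "none", "none", "none", "seg"]
macro, micro = f_scores(gold, pred)
print(round(macro, 4), round(micro, 4))  # prints 0.5833 0.6
```

With one label per item, the micro-averaged score equals plain accuracy (3 of 5 correct here), while the macro average is pulled down by the weaker "seg" class.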

train(a_trees, a_segs, a_path)[source]

Train segmenter model.

Parameters:
  • a_trees (list) – BitPar trees
  • a_segs (list) – discourse segments
  • a_path (str) – path to file in which the trained model should be stored
Returns: nothing; the trained model is stored in a_path
Return type: None
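
Since the constructor accepts a model path "previously dumped by joblib", the stored file is presumably a joblib dump. The round trip below shows that mechanism in isolation; the dictionary is only a placeholder for a trained model, and the file name toy.model is invented for this example.

```python
import os
import tempfile

import joblib

# Placeholder standing in for a trained classifier or pipeline.
model = {"C": 0.3, "labels": ["segment", "none"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.model")
    joblib.dump(model, path)      # what storing a model at a path amounts to
    restored = joblib.load(path)  # what loading from a model path amounts to

print(restored == model)
```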