dsegmenter.mateseg
Package providing discourse segmenter for Mate dependency graphs.
dsegmenter.mateseg.__all__
    List[str] – list of sub-modules exported by this package

dsegmenter.mateseg.__author__
    str – package’s author

dsegmenter.mateseg.__email__
    str – email of the package’s author

dsegmenter.mateseg.__name__
    str – package’s name

dsegmenter.mateseg.__version__
    str – package version
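These attributes can be inspected at runtime; a minimal sketch (the printed values depend on the installed release):

    import dsegmenter.mateseg as mateseg

    # package metadata exposed by the attributes documented above
    print(mateseg.__version__)
    print(mateseg.__author__)
    print(mateseg.__all__)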
class dsegmenter.mateseg.DependencyGraph(tree_str=None, cell_extractor=None, zero_based=False, cell_separator=None, top_relation_label=u'ROOT')

    address_span(start_address)
        return the addresses of all nodes that directly or indirectly depend on the node at the given start address in the dependency graph, excluding the root node

    annotate(iterable, field_name)
        annotate the nodes (excluding the artificial root) with an additional, non-standard field; the values are taken from an iterable whose linear order corresponds to the node order

    get_dependencies_simple(address)
        return a sorted list of the addresses of all dependencies of the node at the specified address

    is_valid_parse_tree()
        check the structural integrity of the parse; for the moment this only checks that the root is unique

    length()
        return the length in tokens, i.e. the number of nodes excluding the artificial root
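The constructor signature mirrors nltk.parse.dependencygraph.DependencyGraph, so a graph can presumably be built from CoNLL-style lines in the same way. A minimal sketch; the sample sentence, the four-column layout, and the "seg" field are illustrative assumptions, not data shipped with the package:

    from dsegmenter.mateseg import DependencyGraph

    # four whitespace-separated columns per token:
    # word, POS tag, head address (node 0 is the artificial root), relation
    conll = (
        "Das\tART\t2\tNK\n"
        "Haus\tNN\t3\tSB\n"
        "brennt\tVVFIN\t0\tROOT\n"
    )
    graph = DependencyGraph(conll)

    print(graph.length())                    # 3 tokens, artificial root excluded
    print(graph.is_valid_parse_tree())       # True iff the root is unique
    print(graph.get_dependencies_simple(3))  # sorted dependency addresses of node 3
    print(graph.address_span(3))             # nodes depending on node 3, root excluded
    # attach a hypothetical extra field ("seg") to the non-root nodes,
    # one value per token in linear order
    graph.annotate(["B-SEG", "I-SEG", "I-SEG"], "seg")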
class dsegmenter.mateseg.MateSegmenter(featgen=<function gen_features_for_segment>, model=u'/home/sidorenko/Projects/DiscourseSegmenter/dsegmenter/mateseg/data/mate.model')

    Class for performing discourse segmentation on dependency trees.
    DEFAULT_CLASSIFIER = LinearSVC(C=1.0, class_weight=u'balanced', dual=True, fit_intercept=True, intercept_scaling=1, loss='squared_hinge', max_iter=1000, multi_class=u'ovr', penalty='l2', random_state=None, tol=0.0001, verbose=0)

    DEFAULT_MODEL = u'/home/sidorenko/Projects/DiscourseSegmenter/dsegmenter/mateseg/data/mate.model'

    DEFAULT_PIPELINE = Pipeline(steps=[(u'vectorizer', DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True, sparse=True)), (u'var_filter', VarianceThreshold(threshold=0.0)), (u'classifier', LinearSVC(C=1.0, class_weight=u'balanced', dual=True, fit_intercept=True, intercept_scaling=1, loss='squared_hinge', max_iter=1000, multi_class=u'ovr', penalty='l2', random_state=None, tol=0.0001, verbose=0))])
    extract_features_from_text(dep_forest, seg_forest=None)
        Extract features from dependency trees.

        Parameters:
            - dep_forest (list or None) – list of sentence trees to be parsed
            - seg_forest – list of discourse segments
        Returns: list of features and list of labels
        Return type: 2-tuple[list, list]
    segment(a_trees)
        Create discourse segments based on the Mate trees.

        Parameters: a_trees (list) – list of sentence trees to be parsed
        Returns: constructed segment trees
        Return type: list
    segment_text(dep_forest)
        Segment all sentences of a text.

        Parameters: dep_forest (list[dsegmenter.mateseg.dependency_graph]) – list of sentence trees to be parsed
        Returns: constructed segment trees
        Return type: list
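DEFAULT_PIPELINE chains a DictVectorizer, a VarianceThreshold filter, and a LinearSVC. For orientation, roughly the same pipeline can be rebuilt directly with scikit-learn; this is a sketch of the components listed above, not code taken from the package:

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # feature dicts -> sparse matrix, drop constant features, linear SVM
    pipeline = Pipeline([
        ("vectorizer", DictVectorizer()),
        ("var_filter", VarianceThreshold(threshold=0.0)),
        ("classifier", LinearSVC(C=1.0, class_weight="balanced",
                                 penalty="l2", loss="squared_hinge",
                                 multi_class="ovr", tol=0.0001, max_iter=1000)),
    ])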
dsegmenter.mateseg.read_trees(a_lines)
    Read a file and yield DependencyGraphs.

    Parameters: a_lines (list[str]) – iterable over decoded lines of the input file
    Yields: nltk.parse.dependencygraph.DependencyGraph
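Putting the pieces together, a typical flow would read the Mate parser’s output, build DependencyGraphs with read_trees(), and segment them with MateSegmenter. A hedged sketch; the file name is a placeholder, and it assumes the bundled default model (DEFAULT_MODEL) resolves correctly on the local installation:

    import io

    from dsegmenter.mateseg import MateSegmenter, read_trees

    # read_trees() expects already decoded lines of the parsed input file
    with io.open("text.parsed.conll", encoding="utf-8") as ifile:
        dep_graphs = list(read_trees(ifile.readlines()))

    segmenter = MateSegmenter()              # uses the bundled default model
    segment_trees = segmenter.segment_text(dep_graphs)
    for tree in segment_trees:
        print(tree)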