dsegmenter.edseg
Package providing a rule-based discourse segmenter for CONLL trees.
- dsegmenter.edseg.chunking module – routines for internal clause segmentation
- dsegmenter.edseg.clause_segmentation module – utilities and classes for rule-based clause segmenter
- dsegmenter.edseg.conll module – interface for dealing with CONLL data
- dsegmenter.edseg.data module – data definitions and data reading routines
- dsegmenter.edseg.edssegmenter module – definition of rule-based discourse segmenter
- dsegmenter.edseg.finitestateparsing module – parsing routines based on finite-state mechanisms
- dsegmenter.edseg.util module – auxiliary match routines needed for rule matching
- class dsegmenter.edseg.CONLL(istring='')
Class for storing and manipulating CONLL parse forest information.
An instance of this class comprises information about one or multiple parsed sentences in CONLL format.
- __getitem__(i)
Return reference to the i-th sentence in the forest.
Parameters: i (int) – integer index of the sentence in the forest
Returns: i-th CONLL sentence in the forest
Return type: CONLLSentence
Raises: IndexError – raised if i is outside the forest boundaries
- __iter__()
Return iterator object over sentences.
- __setitem__(i, value)
Set the i-th sentence in the forest to the specified value.
Parameters:
- i (int) – integer index of the sentence in the forest
- value (CONLLSentence) – value to which the i-th sentence should be set
Returns: new value of the i-th sentence
Return type: CONLLSentence
Raises: IndexError – raised if i is outside the forest boundaries
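The indexing and iteration behaviour documented for `__getitem__`, `__setitem__` and `__iter__` can be illustrated with a small stand-in container. This is only a sketch: `SentenceForest` is a hypothetical class written for this example, sentences are represented as plain lists of strings rather than `CONLLSentence` objects, and nothing here is taken from the actual `CONLL` implementation.

```python
class SentenceForest:
    """Hypothetical stand-in for a CONLL-like sentence container."""

    def __init__(self, sentences=None):
        # Sentences are kept in a plain list (an assumption of this sketch).
        self._sentences = list(sentences or [])

    def __getitem__(self, i):
        # List indexing raises IndexError for an out-of-range index.
        return self._sentences[i]

    def __setitem__(self, i, value):
        # List item assignment also raises IndexError when i is out of range.
        self._sentences[i] = value
        return value

    def __iter__(self):
        # Iterate over the stored sentences in order.
        return iter(self._sentences)


forest = SentenceForest([["Es", "regnet"], ["Wir", "bleiben"]])
forest[1] = ["Wir", "gehen"]   # replace the second sentence
sentences = list(forest)       # iterate over all sentences
```

Note that Python discards the return value of `__setitem__` in an assignment statement such as `forest[1] = ...`; the value is only visible when the method is called explicitly.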
- __str__()
Return string representation of this object encoded in UTF-8.
- __unicode__()
Return unicode representation of current object.
- add_line(iline=u'')
Parse line and add it as CONLL word.
Parameters: iline (basestring) – input line(s) to parse
Return type: void
- clear()
Remove all stored information.
- get_words()
Return list of all words with indices from all sentences.
Return a list of all words from all sentences in consecutive order as tuples with three elements (word, sentence_idx, word_idx), where the first element is a word, the second is the index of its sentence in the list of sentences, and the third is the word’s index within that sentence.
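The tuple layout described for `get_words` can be made concrete with a short sketch. It assumes zero-based indices and represents sentences as plain lists of strings; `flatten_words` is a hypothetical helper that mimics the documented return shape, not the library's own code.

```python
def flatten_words(sentences):
    """Return (word, sentence_idx, word_idx) tuples in consecutive order."""
    return [
        (word, s_idx, w_idx)
        for s_idx, sentence in enumerate(sentences)   # position of the sentence
        for w_idx, word in enumerate(sentence)        # position within the sentence
    ]


words = flatten_words([["Es", "regnet"], ["Wir", "bleiben", "hier"]])
for word, sentence_idx, word_idx in words:
    print(word, sentence_idx, word_idx)
```

Keeping both indices alongside each word allows a flat word list to be mapped back to its sentence and position without a second lookup.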
- is_empty()
Check whether any sentences are stored.
Returns: True if no sentences are stored
Return type: bool
- class dsegmenter.edseg.EDSSegmenter(a_clause_segmenter=None)
Class for performing discourse segmentation on CONLL dependency trees.
- _clause_segmenter
internal worker for doing discourse segmentation
- _clause_discarder
internal automaton which decides if sentence shouldn’t be processed
- _sent
internal reference to the sentence being processed
- _tokens
internal reference to the list of processed tokens
- segment(sent)
Segment CONLL trees.
Parameters: sent (CONLLSentence) – CONLL sentence to segment
Returns: sentence-level discourse segment
Return type: Segment
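Since `segment` takes a single sentence and `CONLL` iterates over its sentences, segmenting a whole forest amounts to a loop over that iterator. The sketch below shows that pattern without importing the package: `segment_forest` and `EchoSegmenter` are hypothetical names introduced for this example, and the stand-in segmenter only imitates the call shape of `EDSSegmenter.segment`, not its rules or its `Segment` return type.

```python
def segment_forest(forest, segmenter):
    """Apply segmenter.segment() to every sentence yielded by forest."""
    return [segmenter.segment(sent) for sent in forest]


class EchoSegmenter:
    """Hypothetical stand-in; a real run would pass an EDSSegmenter instance."""

    def segment(self, sent):
        # Pretend the whole sentence forms one sentence-level segment.
        return ("SEGMENT", tuple(sent))


# Any iterable of sentences works here; a CONLL instance is assumed to
# behave the same way because it documents __iter__ over its sentences.
segments = segment_forest([["Es", "regnet"], ["Wir", "gehen"]], EchoSegmenter())
```

With the real package, `forest` would presumably be a `CONLL` object built from parser output and `segmenter` an `EDSSegmenter()`; that wiring is an assumption based on the signatures listed above.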
-