dsegmenter.edseg

Package providing a rule-based discourse segmenter for CONLL trees.

dsegmenter.edseg.chunking

module – routines for internal clause segmentation

dsegmenter.edseg.clause_segmentation

module – utilities and classes for rule-based clause segmenter

dsegmenter.edseg.conll

module – interface for dealing with CONLL data

dsegmenter.edseg.data

module – data definitions and data reading routines

dsegmenter.edseg.edssegmenter

module – definition of rule-based discourse segmenter

dsegmenter.edseg.finitestateparsing

module – parsing routines based on finite-state mechanisms

dsegmenter.edseg.util

module – auxiliary match routines needed for rule matching

class dsegmenter.edseg.CONLL(istring='')

Class for storing and manipulating CONLL parse forest information.

An instance of this class comprises information about one or multiple parsed sentences in CONLL format.

__getitem__(i)

Return reference to i-th sentence in forest.

Parameters:i (int) – integer index of sentence in forest
Returns:i-th CONLL sentence in forest.
Return type:CONLLSentence
Raises:IndexError – raised if i is outside of forest boundaries.
__iter__()

Return iterator object over sentences.

__setitem__(i, value)

Set i-th sentence in forest to specified value.

Parameters:
  • i (int) – integer index of sentence in forest
  • value (CONLLSentence) – new value to which the i-th sentence should be set
Returns:

new value of i-th sentence

Return type:

CONLLSentence

Raises:

IndexError – raised if i is outside of forest boundaries.
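
The indexing behaviour described for __getitem__ and __setitem__ can be illustrated with a minimal sketch. ToyForest below is a hypothetical stand-in backed by a plain list; it is not the package's implementation and only mirrors the documented contract (index access, assignment, iteration, IndexError on out-of-range indices).

```python
class ToyForest:
    """Hypothetical stand-in for a sentence container (not the real CONLL class)."""

    def __init__(self, sentences=None):
        # Sentences are kept in a plain list; real sentences would be
        # CONLLSentence objects, here any value will do.
        self._sentences = list(sentences or [])

    def __getitem__(self, i):
        # List indexing raises IndexError when i is out of range.
        return self._sentences[i]

    def __setitem__(self, i, value):
        # List assignment also raises IndexError when i is out of range.
        self._sentences[i] = value

    def __iter__(self):
        return iter(self._sentences)


forest = ToyForest(["sentence 0", "sentence 1"])
forest[1] = "replaced"

try:
    forest[5]
    out_of_range = False
except IndexError:
    out_of_range = True
```

Note that a plain list also accepts negative indices; whether the real class does is not stated here.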

__str__()

Return string representation of this object encoded in UTF-8.

__unicode__()

Return unicode representation of current object.

add_line(iline=u'')

Parse line and add it as CONLL word.

Parameters:iline (basestring) – input line(s) to parse
Return type:None
clear()

Remove all stored information.

get_words()

Return list of all words with indices from all sentences.

Return a list of all words from all sentences in consecutive order. Each element is a three-element tuple (word, sentence_idx, word_idx), where word is the word itself, sentence_idx is the index of its sentence in the list of sentences, and word_idx is the word’s index within that sentence.
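
The tuple layout described above can be sketched with plain Python lists. The function flatten_words and the sample data are illustrative assumptions; the real method works on CONLLSentence objects, and whether its indices start at 0 is not stated here.

```python
# Stand-in data: each sentence is a plain list of word strings.
sentences = [["Das", "ist", "gut"], ["Ja"]]


def flatten_words(sentences):
    """Flatten sentences into (word, sentence_idx, word_idx) tuples."""
    return [
        (word, s_idx, w_idx)
        for s_idx, sent in enumerate(sentences)  # position of the sentence
        for w_idx, word in enumerate(sent)       # position inside the sentence
    ]


words = flatten_words(sentences)
```

Under these assumptions, the last element of words is ("Ja", 1, 0): the first word of the second sentence.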

is_empty()

Check whether any sentences are stored.

Returns:True if no sentences are stored.
Return type:bool
class dsegmenter.edseg.EDSSegmenter(a_clause_segmenter=None)

Class for performing discourse segmentation on CONLL dependency trees.

_clause_segmenter

internal worker for doing discourse segmentation

_clause_discarder

internal automaton which decides whether a sentence should be skipped

_sent

internal reference to the sentence being processed

_tokens

internal reference to the list of processed tokens

segment

perform discourse segmentation of the CONLL sentence

segment(sent)

Segment CONLL trees.

Parameters:sent (CONLLSentence) – CONLL sentence to segment
Returns:sentence-level discourse segment
Return type:Segment
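
Since segment(sent) works on one sentence at a time and a CONLL forest is iterable, segmenting a whole forest can be sketched as a loop. The helper segment_forest and the stub class below are hypothetical; they rely only on the documented iteration and segment(sent) interfaces, and the stub stands in for EDSSegmenter so the sketch runs without the package installed.

```python
def segment_forest(forest, segmenter):
    """Return one sentence-level segment per sentence in `forest`.

    `forest` is any iterable of sentences; `segmenter` is any object
    with a segment(sent) method.
    """
    return [segmenter.segment(sent) for sent in forest]


class StubSegmenter:
    """Hypothetical stand-in for EDSSegmenter, used for illustration only."""

    def segment(self, sent):
        # The real method returns a Segment; the stub just tags its input.
        return ("SEGMENT", sent)


segments = segment_forest(["sent-a", "sent-b"], StubSegmenter())
```

With the package itself, the presumed call would be segment_forest(CONLL(text), EDSSegmenter()), where text holds CONLL-formatted input; this is untested here.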