Graphical representation of a corpus of text files.
Two types of graphs can be created. SentenceGraph: nodes are the sentences in the corpus, and an edge is defined between two sentences based on their similarity.
Compute the documents containing a given word.
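A minimal sketch of one way to answer "which documents contain this word": build an inverted index once, then look words up in it. It assumes the corpus is a dict mapping file names to their text and uses a hypothetical lowercase regex tokenizer; `tokenize`, `build_word_index` and `documents_containing` are illustrative names, not the module's actual API.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_word_index(documents):
    """Map each word to the set of document names that contain it.

    `documents` is assumed to be a dict {name: text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def documents_containing(index, word):
    # Normalize the query the same way tokenize() normalizes the text.
    return sorted(index.get(word.lower(), set()))


docs = {
    "a.txt": "Graphs model relations.",
    "b.txt": "Sentences become graph nodes.",
    "c.txt": "Graphs and sentences.",
}
index = build_word_index(docs)
print(documents_containing(index, "graphs"))  # ['a.txt', 'c.txt']
```

Building the index once makes each later lookup a single dictionary access instead of a rescan of every file.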
Build the keyword graph.
Build the sentence graph.
similarityThreshold : If the similarity of two sentences is above similarityThreshold, an edge is added between the nodes representing those sentences. Default value is 0.1.
stemming : If True, words are stemmed with the Porter stemmer. Stemming requires the nltk package.
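The construction described above can be sketched as follows, under stated assumptions: nodes are sentence indices, the similarity is cosine similarity over raw word counts, an edge is added when the similarity is strictly above the threshold, and the graph is a plain adjacency dict rather than any particular graph library. `similarity_threshold` stands in for `similarityThreshold`; stemming is omitted for brevity, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(s1, s2):
    # Cosine of the word-count vectors of the two sentences.
    v1, v2 = Counter(tokenize(s1)), Counter(tokenize(s2))
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(
        sum(c * c for c in v2.values())
    )
    return dot / norm if norm else 0.0


def build_sentence_graph(sentences, similarity_threshold=0.1):
    """Adjacency map: node i is sentences[i]; an edge (i, j) exists when
    the similarity of the two sentences is above similarity_threshold."""
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if cosine_similarity(sentences[i], sentences[j]) > similarity_threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


sentences = ["The cat sat on the mat.", "A cat sat there.", "Stocks fell sharply."]
print(build_sentence_graph(sentences))  # {0: {1}, 1: {0}, 2: set()}
```

Every pair of sentences is compared, so building the graph costs O(n²) similarity computations for n sentences.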
Compute the similarity of two sentences, defined as the cosine similarity of the vectors representing them.
s1 : sentence 1
s2 : sentence 2
stemming : If True, words are stemmed with the Porter stemmer. Stemming requires the nltk package.
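A minimal sketch of the similarity computation, assuming each sentence is represented by its word-count vector and using a hypothetical regex tokenizer. When `stemming` is True it uses nltk's `PorterStemmer`, imported lazily so nltk is only needed in that case; the function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_vector(sentence, stemming=False):
    words = tokenize(sentence)
    if stemming:
        from nltk.stem import PorterStemmer  # only needed when stemming
        stemmer = PorterStemmer()
        words = [stemmer.stem(w) for w in words]
    return Counter(words)  # sparse vector: word -> count


def cosine_similarity(s1, s2, stemming=False):
    v1 = sentence_vector(s1, stemming)
    v2 = sentence_vector(s2, stemming)
    # Only words present in both sentences contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty sentence is treated as dissimilar to everything
    return dot / (norm1 * norm2)


print(round(cosine_similarity("the cat sat", "the cat ran"), 3))  # 0.667
```

Cosine similarity does not change when a vector is rescaled, so raw counts and normalized word frequencies give the same result.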
Compute all the unique words in the corpus.
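A small sketch of vocabulary extraction, assuming the corpus has already been split into sentences and using the same hypothetical tokenizer as above. Returning a sorted list gives each word a fixed position, which is convenient if sentences are later turned into dense vectors.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def unique_words(sentences):
    """Return the sorted set of distinct words across all sentences."""
    vocabulary = set()
    for sentence in sentences:
        vocabulary.update(tokenize(sentence))
    return sorted(vocabulary)


print(unique_words(["The cat sat.", "The dog sat down."]))
# ['cat', 'dog', 'down', 'sat', 'the']
```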
Compute the normalized frequency of occurrence of each word in the sentence.
sentence : A sentence
stemming : If True, words are stemmed with the Porter stemmer. Stemming requires the nltk package.
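A sketch of the frequency computation, assuming "normalized" means each word's count divided by the total number of words in the sentence, so the frequencies sum to 1. That interpretation is an assumption, since the text does not define the normalization. The tokenizer is hypothetical, and nltk's `PorterStemmer` is imported only when `stemming` is True.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequencies(sentence, stemming=False):
    """Map each word to (its count) / (number of words in the sentence)."""
    words = tokenize(sentence)
    if stemming:
        from nltk.stem import PorterStemmer  # Porter stemming needs nltk
        stemmer = PorterStemmer()
        words = [stemmer.stem(w) for w in words]
    if not words:
        return {}  # avoid dividing by zero for an empty sentence
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}


print(word_frequencies("the cat saw the dog"))
# {'the': 0.4, 'cat': 0.2, 'saw': 0.2, 'dog': 0.2}
```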