Package concurrent_tree_crawler :: Package html_multipage_navigator :: Module tree_navigator :: Class HTMLMultipageNavigator

Class HTMLMultipageNavigator


A web site tree navigator.

It is assumed that all web pages corresponding to nodes on a given level of the tree share the same basic characteristics and are analyzed in the same way, namely by the same object inheriting from AbstractPageAnalyzer. In particular, all leaf web pages are on the same level of the tree. Some parts of the tree might be missing, in which case the affected nodes are marked as ERROR.
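The navigation protocol implied by the public methods of this class can be illustrated with a depth-first walk. The following is a minimal sketch, not the package's own crawler: `InMemoryNavigator` is a hypothetical stand-in that mimics the documented method names over a nested dict, the root name `"root"` is an assumption, and the real class downloads and analyzes web pages instead.

```python
class InMemoryNavigator:
    """Hypothetical stand-in mimicking the documented navigator methods.

    The tree is a nested dict; an empty dict is a leaf. The real
    HTMLMultipageNavigator fetches and analyzes web pages instead.
    """

    ROOT_NAME = "root"  # assumed name; the real root name is not documented here

    def __init__(self, tree):
        self._tree = tree
        self._path = [self.ROOT_NAME]

    def _current(self):
        # Follow the stored child names down from the root.
        node = self._tree
        for name in self._path[1:]:
            node = node[name]
        return node

    def start_in_root(self):
        self._path = [self.ROOT_NAME]

    def get_path(self):
        return list(self._path)

    def get_children(self):
        return list(self._current().keys())

    def move_to_child(self, child_name):
        if child_name not in self._current():
            raise KeyError(child_name)
        self._path.append(child_name)

    def move_to_parent(self):
        if len(self._path) == 1:
            raise ValueError("already at the root")
        self._path.pop()

    def process_node_and_check_if_is_leaf(self):
        return not self._current()


def collect_leaf_paths(navigator):
    """Depth-first walk that returns the path of every leaf node."""
    leaves = []

    def visit():
        if navigator.process_node_and_check_if_is_leaf():
            leaves.append(navigator.get_path())
            return
        for child in navigator.get_children():
            navigator.move_to_child(child)
            visit()
            navigator.move_to_parent()

    navigator.start_in_root()
    visit()
    return leaves


# All leaves sit on the same level, matching the assumption stated above.
site = {"2011": {"jan": {}, "feb": {}}, "2012": {"jan": {}}}
leaf_paths = collect_leaf_paths(InMemoryNavigator(site))
```

Because every `move_to_child` is paired with a `move_to_parent`, the navigator is back in the root node when the walk finishes.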

Instance Methods

__init__(self, address, levels, browser_creator=None)

start_in_root(self)
Start in the root node of the domain tree.

get_path(self)
Returns: path to the tree node the navigator is currently in, i.e. subsequent node names from the tree root to the current node

get_children(self)
Returns (list of strings): names of children of the current node of the domain tree

__get_current_children(self)

__get_current_level(self)

__is_on_leafs_level(self)

move_to_child(self, child_name)
Move to the child of the current node of the domain tree.

move_to_parent(self)
Move to the parent of the current node of the domain tree.

process_node_and_check_if_is_leaf(self)
Returns: True if the current node is a leaf, False if it is an internal node of the domain tree.
Static Methods

__generate_new_name(original_name, children_dict)

Class Variables
  __repetition_suffix_template = '-repetition_{}'
  __generate_new_name_max_repetitions = 100000

Instance Variables
  __current_children
Info about children on the current level of the tree structure.

Method Details

__init__(self, address, levels, browser_creator=None)
(Constructor)

Parameters:
  • address - URL string
  • levels - list of Level objects. The first element is the level corresponding to the root node; the last one corresponds to the leaf level.
  • browser_creator (AbstractWebBrowserCreator) - a creator of the browsers that will be used while crawling the web site. The default browser used here is MechanizeBrowser.

start_in_root(self)

Start in the root node of the domain tree

Raises:
  • NavigationException - see the class description for details on the ramifications of raising such an exception.
Overrides: abstract_tree_navigator.AbstractTreeNavigator.start_in_root
(inherited documentation)

get_path(self)

Returns:
path to the tree node the navigator is currently in, i.e. the successive node names from the tree root to the current node

get_children(self)

Returns: list of strings
names of children of the current node of the domain tree
Raises:
  • NavigationException - see the class description for details on the ramifications of raising such an exception.
Overrides: abstract_tree_navigator.AbstractTreeNavigator.get_children
(inherited documentation)

move_to_child(self, child_name)

Move to the child of the current node of the domain tree.

Parameters:
  • child_name - name of the child to move to
Raises:
  • NavigationException - see the class description for details on the ramifications of raising such an exception.
Overrides: abstract_tree_navigator.AbstractTreeNavigator.move_to_child
(inherited documentation)

move_to_parent(self)

Move to the parent of the current node of the domain tree.

Raises:
  • NavigationException - see the class description for details on the ramifications of raising such an exception.
Overrides: abstract_tree_navigator.AbstractTreeNavigator.move_to_parent
(inherited documentation)

process_node_and_check_if_is_leaf(self)

Returns:
True if the current node is a leaf, False if it is an internal node of the domain tree.
Raises:
  • NavigationException - see the class description for details on the ramifications of raising such an exception.
Overrides: abstract_tree_navigator.AbstractTreeNavigator.process_node_and_check_if_is_leaf
(inherited documentation)
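
Since each navigation method may raise NavigationException, a caller needs some policy for failed moves. The sketch below shows one possible policy, recording an ERROR state for a child that cannot be entered. It is an illustration only: `NavigationException` is redefined locally as a stand-in for the package's class, `FlakyNavigator` and `probe_children` are hypothetical helpers, and the "OK" state name is invented for this example. The actual consequences of the exception are those described in the class description referred to above.

```python
class NavigationException(Exception):
    """Local stand-in; the real class is defined in the package."""


class FlakyNavigator:
    """Hypothetical navigator whose move fails for the listed child names."""

    def __init__(self, broken):
        self._broken = set(broken)
        self.path = ["root"]

    def move_to_child(self, child_name):
        if child_name in self._broken:
            raise NavigationException("cannot open " + child_name)
        self.path.append(child_name)

    def move_to_parent(self):
        self.path.pop()


def probe_children(navigator, child_names):
    """Try to enter each child; return a dict of name -> 'OK' or 'ERROR'."""
    states = {}
    for name in child_names:
        try:
            navigator.move_to_child(name)
        except NavigationException:
            # In this stand-in a failed move leaves the navigator at the parent.
            states[name] = "ERROR"
            continue
        states[name] = "OK"
        navigator.move_to_parent()
    return states


states = probe_children(FlakyNavigator(broken=["feb"]), ["jan", "feb", "mar"])
```

Here the failing child is skipped and its siblings are still visited, which mirrors the idea that missing parts of the tree mark only certain nodes as ERROR.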

Instance Variable Details

__current_children

Info about children on the current level of the tree structure: an ordered dictionary that maps each child name to the link to that child's web page.
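
Because child names serve as dictionary keys, two links with the same name would collide. The class variables `__repetition_suffix_template` and `__generate_new_name_max_repetitions`, together with the static method `__generate_new_name(original_name, children_dict)`, suggest that duplicates are renamed with a numbered suffix. The following is a rough sketch of how such a scheme might work, not the package's actual code: the counter starting at 1, the `RuntimeError`, and the `add_child` helper are all assumptions.

```python
from collections import OrderedDict

# Values copied from the class variables listed in this document.
REPETITION_SUFFIX_TEMPLATE = "-repetition_{}"
MAX_REPETITIONS = 100000


def generate_new_name(original_name, children):
    """Return original_name plus the first unused repetition suffix."""
    for i in range(1, MAX_REPETITIONS + 1):  # starting at 1 is an assumption
        candidate = original_name + REPETITION_SUFFIX_TEMPLATE.format(i)
        if candidate not in children:
            return candidate
    raise RuntimeError("too many children named " + repr(original_name))


def add_child(children, name, link):
    """Hypothetical helper: insert a child, renaming it if the name is taken."""
    if name in children:
        name = generate_new_name(name, children)
    children[name] = link
    return name


children = OrderedDict()
add_child(children, "news", "/news?page=1")
add_child(children, "news", "/news?page=2")
add_child(children, "about", "/about")
```

An ordered dictionary keeps the children in the order the links were encountered, so the renamed entry stays next to the original one.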