Package concurrent_tree_crawler :: Package html_multipage_navigator :: Module sample_page_analyzer :: Class IssuePageAnalyzer

Class IssuePageAnalyzer

A class that parses issue-level pages.

Instance Methods

__init__(self, dst_dir_path)

process(self, tree_path, page_file)
Process the node (normally, this method is called once for every node).

get_links(self, page_file, child_links_retrieved_so_far)
Returns: PageLinks - information about links on the given page.

Method Details

process(self, tree_path, page_file)

Process the node (normally, this method is called once for every node).

Parameters:

tree_path - path to the tree node the navigator is currently in, i.e. the successive node names from the tree root to the current node. For example, ["root"] is the path to the root node, and ["root", "magazine-2011-09-18", "article_23"] is the path to some other node inside the tree hierarchy.
page_file - file-like object to be processed

Overrides: abstract_page_analyzer.AbstractPageAnalyzer.process

(inherited documentation)
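
The way tree_path and page_file might be used together can be illustrated with a minimal sketch. This page does not show what IssuePageAnalyzer.process actually does, so everything below is an assumption: SketchPageAnalyzer is a hypothetical stand-in class, the file-naming scheme is invented for illustration, and the return value exists only so the result can be inspected.

```python
import io
import os
import tempfile


class SketchPageAnalyzer:
    """Hypothetical stand-in; NOT the real IssuePageAnalyzer."""

    def __init__(self, dst_dir_path):
        # Directory where processed pages are stored (assumed purpose).
        self._dst_dir_path = dst_dir_path

    def process(self, tree_path, page_file):
        # Assumed naming scheme: drop the root name and join the remaining
        # node names; fall back to "root" for the root node itself.
        name = "_".join(tree_path[1:]) or "root"
        dst_path = os.path.join(self._dst_dir_path, name + ".html")
        # page_file is file-like, so read() returns the page content.
        with open(dst_path, "w") as dst:
            dst.write(page_file.read())
        # Returned only for demonstration purposes.
        return dst_path


dst_dir = tempfile.mkdtemp()
analyzer = SketchPageAnalyzer(dst_dir)
saved_path = analyzer.process(
    ["root", "magazine-2011-09-18", "article_23"],
    io.StringIO("<html><body>article</body></html>"))
```

Because process is normally called once per node, deriving the output location from tree_path keeps the results of different nodes from colliding, provided node names are unique among siblings.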

get_links(self, page_file, child_links_retrieved_so_far)

Parameters:

page_file - file-like object to be analyzed
child_links_retrieved_so_far - number of child links retrieved so far in the current node (from previous pages)

Returns: PageLinks

Information about links on the given page. The default implementation is written for a leaf node (a page with no children).

Overrides: abstract_page_analyzer.AbstractPageAnalyzer.get_links

(inherited documentation)
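
A rough sketch of what a get_links override could look like for a page that does have children is given below. It rests on several assumptions not confirmed by this page: PageLinksSketch is a hypothetical stand-in for the library's PageLinks (whose real fields are not documented here), child links are taken to be plain anchors, the next-page link is taken to be an anchor with rel="next", and the "child_N" naming is invented for illustration.

```python
import io
from collections import namedtuple
from html.parser import HTMLParser

# Hypothetical stand-in for PageLinks; the real class may differ.
PageLinksSketch = namedtuple("PageLinksSketch", ["children", "next_page_link"])


class _AnchorCollector(HTMLParser):
    """Collects href values of <a> tags, separating out rel="next"."""

    def __init__(self):
        super().__init__()
        self.child_hrefs = []
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        if attrs.get("rel") == "next":
            self.next_href = href
        else:
            self.child_hrefs.append(href)


def get_links_sketch(page_file, child_links_retrieved_so_far):
    collector = _AnchorCollector()
    collector.feed(page_file.read())
    # Continue numbering from the children found on previous pages,
    # so names stay unique across a multi-page node.
    children = [
        ("child_%d" % (child_links_retrieved_so_far + i), href)
        for i, href in enumerate(collector.child_hrefs)
    ]
    return PageLinksSketch(children, collector.next_href)


page = io.StringIO(
    '<a href="issue_3.html">Issue 3</a>'
    '<a href="issue_4.html">Issue 4</a>'
    '<a rel="next" href="page_2.html">Next</a>')
links = get_links_sketch(page, 2)

# A page without anchors behaves like a leaf node.
leaf_links = get_links_sketch(io.StringIO("<p>no links</p>"), 0)
```

In this sketch the second argument is used only as a numbering offset, which matches its documented meaning as a count of child links already retrieved from earlier pages of the same node.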