concurrent_tree_crawler.html_multipage_navigator.sample_page

Class ArticlePageAnalyzer

A class that downloads article pages.

Instance Methods

__init__(self, dst_dir_path)

process(self, tree_path, page_file)
Process the node (normally, this method is called once for every node).

__download_page(self, page_file, dst_file)

process(self, tree_path, page_file)

Process the node (normally, this method is called once for every node).

Parameters:

tree_path - path to the tree node the navigator is currently in, i.e. the sequence of node names from the tree root to the current node. For example, ["root"] is the path to the root node, and ["root", "magazine-2011-09-18", "article_23"] is the path to a node deeper in the tree hierarchy.
page_file - file-like object to be processed

(inherited documentation)
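
The following is a minimal sketch of how a class with this interface might behave, not the library's actual implementation. SketchPageAnalyzer is a hypothetical stand-in. The file naming scheme, the ".html" suffix, the return value of process, and the use of shutil.copyfileobj are all assumptions made for illustration. Only the method signatures and the meaning of tree_path and page_file come from the documentation above.

```python
import io
import os
import shutil
import tempfile


class SketchPageAnalyzer:
    """Hypothetical stand-in for ArticlePageAnalyzer (illustration only)."""

    def __init__(self, dst_dir_path):
        # Directory where processed pages are stored.
        self._dst_dir_path = dst_dir_path

    def process(self, tree_path, page_file):
        # tree_path is assumed to be non-empty, e.g. ["root"] or
        # ["root", "magazine-2011-09-18", "article_23"].
        # Assumed naming scheme: join the node names below the root with "-",
        # falling back to the root name for the root node itself.
        name = "-".join(tree_path[1:]) or tree_path[0]
        dst_path = os.path.join(self._dst_dir_path, name + ".html")
        with open(dst_path, "wb") as dst_file:
            self._download_page(page_file, dst_file)
        # Returning the path is an assumption, added to make the sketch
        # easy to check.
        return dst_path

    def _download_page(self, page_file, dst_file):
        # Copy the file-like object to the destination in chunks.
        shutil.copyfileobj(page_file, dst_file)


# Usage: an in-memory bytes buffer plays the role of the downloaded page.
dst_dir = tempfile.mkdtemp()
analyzer = SketchPageAnalyzer(dst_dir)
saved_path = analyzer.process(
    ["root", "magazine-2011-09-18", "article_23"],
    io.BytesIO(b"<html>article</html>"),
)
```

Because page_file only needs to behave like a file opened for reading, the same call works with an HTTP response object or a file opened in binary mode.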