Package concurrent_tree_crawler :: Package html_multipage_navigator :: Module sample_page_analyzer :: Class IssuePageAnalyzer

Class IssuePageAnalyzer

A class that parses issue-level pages.

Instance Methods
 
__init__(self, dst_dir_path)
 
process(self, tree_path, page_file)
Process the node (normally, this method is called once for every node).
PageLinks
get_links(self, page_file, child_links_retrieved_so_far)
Returns: information about links on the given page.
Method Details

process(self, tree_path, page_file)

Process the node (normally, this method is called once for every node).

Parameters:
  • tree_path - path to the tree node the navigator is currently in, i.e. the sequence of node names from the tree root to the current node. For example, ["root"] is the path to the root node, and ["root", "magazine-2011-09-18", "article_23"] is the path to a node deeper in the tree hierarchy.
  • page_file - file-like structure to be processed
Overrides: abstract_page_analyzer.AbstractPageAnalyzer.process
(inherited documentation)
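
The following is a minimal sketch of how a process method with this signature might use its two parameters. It is not the library's implementation: the class name SketchPageAnalyzer, the file-naming scheme, and the ".html" suffix are all assumptions made for illustration. It only shows that tree_path identifies the node and page_file supplies its content.

```python
import io
import os
import tempfile


class SketchPageAnalyzer:
    """Hypothetical stand-in; not the library's IssuePageAnalyzer."""

    def __init__(self, dst_dir_path):
        self._dst_dir_path = dst_dir_path

    def process(self, tree_path, page_file):
        # Assumption: the page is saved under dst_dir_path in a file named
        # after the node names below the root, joined with "-". The root
        # node itself falls back to its own name.
        name = "-".join(tree_path[1:]) or tree_path[0]
        dst_path = os.path.join(self._dst_dir_path, name + ".html")
        with open(dst_path, "wb") as dst:
            dst.write(page_file.read())


# Usage: process one node and list what was written.
dst_dir = tempfile.mkdtemp()
analyzer = SketchPageAnalyzer(dst_dir)
analyzer.process(
    ["root", "magazine-2011-09-18", "article_23"],
    io.BytesIO(b"<html></html>"),
)
saved = sorted(os.listdir(dst_dir))
```

Deriving the file name from tree_path keeps the output unique per node, which fits the note that process is normally called once for every node.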

get_links(self, page_file, child_links_retrieved_so_far)

Parameters:
  • page_file - file-like structure to be analyzed
  • child_links_retrieved_so_far - number of child links retrieved so far in the current node (from previous pages)
Returns: PageLinks
Information about links on the given page. The default implementation inherited from AbstractPageAnalyzer is written for a leaf node (a page with no children).
Overrides: abstract_page_analyzer.AbstractPageAnalyzer.get_links
(inherited documentation)
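
As an illustration of how the count parameter could be used, the sketch below collects anchor links from a page and numbers the resulting child names starting from the number of links already retrieved on previous pages. Everything here is an assumption: SketchPageLinks is a hypothetical stand-in for the library's PageLinks (whose fields are not documented on this page), the "article_N" naming is invented, and the page is assumed to be a text-mode file-like object.

```python
import io
from collections import namedtuple
from html.parser import HTMLParser

# Hypothetical stand-in for PageLinks; the real class may differ.
SketchPageLinks = namedtuple("SketchPageLinks", ["children", "next_page_link"])


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def get_links_sketch(page_file, child_links_retrieved_so_far):
    collector = _HrefCollector()
    collector.feed(page_file.read())
    children = []
    for offset, href in enumerate(collector.hrefs):
        # Continue numbering where the previous pages of this node stopped.
        index = child_links_retrieved_so_far + offset
        children.append(("article_%d" % index, href))
    # Assumption: this sketch never reports a next page.
    return SketchPageLinks(children=children, next_page_link=None)


# Usage: a page with two links, after 23 links from earlier pages.
page = io.StringIO('<a href="/a/23">A</a><a href="/a/24">B</a>')
links = get_links_sketch(page, 23)

# A leaf page (no links) yields no children.
leaf_links = get_links_sketch(io.StringIO("<p>no links</p>"), 0)
```

Offsetting by the running count keeps child names unique when one node's links are spread over several pages.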