Package concurrent_tree_crawler :: Module multithreaded_crawler :: Class MultithreadedCrawler

Class MultithreadedCrawler

Runs several threads to crawl the tree.

It also handles the ancillary tasks: it periodically saves the state of the tree to disk, sets up the logging level, etc.
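The threading pattern described above can be illustrated with a minimal sketch: one worker thread per navigator, plus a background thread that saves the state every `save_period` seconds until the workers finish. The function `crawl_with_threads` and its callable parameters are hypothetical stand-ins for the navigators and for writing the tree state to disk. Treating `save_period` as seconds is an assumption, and this is not the package's actual implementation.

```python
import threading
from typing import Callable, List


def crawl_with_threads(
    navigators: List[Callable[[], None]],
    save_state: Callable[[], None],
    save_period: float,
) -> None:
    """Hypothetical sketch: one worker thread per navigator plus a periodic saver.

    `navigators` are plain callables standing in for AbstractTreeNavigator
    objects; `save_state` stands in for writing the tree state to disk.
    """
    stop_saving = threading.Event()

    def saver() -> None:
        # Event.wait returns False on timeout and True once stop_saving is set,
        # so the state is saved once per period until crawling ends.
        while not stop_saving.wait(save_period):
            save_state()

    # Daemon thread: it will not keep the process alive if a worker fails.
    saver_thread = threading.Thread(target=saver, daemon=True)
    saver_thread.start()

    # The number of worker threads equals the number of navigators.
    workers = [threading.Thread(target=nav) for nav in navigators]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # Crawling is done: stop the saver and write the final state once more.
    stop_saving.set()
    saver_thread.join()
    save_state()
```

Using an `Event` rather than `time.sleep` in the saver lets the final shutdown interrupt the wait immediately instead of waiting out the rest of a period.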

Instance Methods

  • __init__(self, navigators, sentinel, activity_schedule=None, log_file_path=None, state_file_path=None, save_period=None, logging_level=40)
  • run(self)
    Returns: AbstractNode, the sentinel node
  • _create_crawlers_manager(self, tree, navigators)
  • __start_tree_saver_thread(self)
  • __sleep_until_activity_period(self)
    Sleep (stop program execution) until it is time to wake up.
    Returns: number of seconds

Static Methods

  • __load_state_file(file_path, sentinel)
  • __change_state_from_PROCESSING_to_OPEN(node)
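The name of the static method __change_state_from_PROCESSING_to_OPEN suggests that, after a saved state is loaded, nodes that were still being processed when the program stopped are reopened so that another thread can pick them up. A minimal sketch of such a reset, assuming a hypothetical `Node` class with `state` and `children` attributes and a hypothetical `NodeState` enum (including an assumed CLOSED state). The real AbstractNode API may differ.

```python
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    # Hypothetical states; only OPEN and PROCESSING appear in the method name.
    OPEN = "OPEN"
    PROCESSING = "PROCESSING"
    CLOSED = "CLOSED"


class Node:
    """Hypothetical stand-in for AbstractNode."""

    def __init__(self, name: str, state: NodeState,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.state = state
        self.children = children if children is not None else []


def reset_processing_to_open(node: Node) -> None:
    """Hypothetical helper: set every PROCESSING node in the subtree to OPEN.

    After a restart no thread is working on those nodes anymore, so they
    must become available for crawling again.
    """
    # Iterative depth-first traversal avoids recursion limits on deep trees.
    stack = [node]
    while stack:
        current = stack.pop()
        if current.state is NodeState.PROCESSING:
            current.state = NodeState.OPEN
        stack.extend(current.children)
```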
Method Details

__init__(self, navigators, sentinel, activity_schedule=None, log_file_path=None, state_file_path=None, save_period=None, logging_level=40)
(Constructor)

Parameters:
  • navigators (list of AbstractTreeNavigators) - navigators to be used by the crawler. Each navigator runs in a separate thread, so the number of threads equals the number of navigators.
  • sentinel (AbstractNode) - a technical node that is made the parent of the root node.
  • activity_schedule (AbstractActivitySchedule) - schedule of activity and sleep periods. If None, no schedule is used and the program runs until it finishes crawling.
  • log_file_path - path to the log file. If None, no log file will be used.
  • state_file_path - path to the file where the state of the program will be saved. If None, the state will not be saved.
  • save_period - time between consecutive saves of the tree state. Ignored if state_file_path is None.
  • logging_level - one of the logging level constants from the standard logging module (the default, 40, is logging.ERROR).
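How log_file_path and logging_level could be applied can be sketched with the standard logging module. The helper `configure_logging` and the logger name are hypothetical, and the behavior when no log file is given (no file handler is attached) is an assumption based on the parameter description above.

```python
import logging
from typing import Optional


def configure_logging(log_file_path: Optional[str] = None,
                      logging_level: int = logging.ERROR) -> logging.Logger:
    """Hypothetical helper mirroring the log_file_path/logging_level parameters."""
    # Hypothetical logger name, chosen only for this sketch.
    logger = logging.getLogger("concurrent_tree_crawler_sketch")
    logger.setLevel(logging_level)
    # Remove handlers from a previous call so output is not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    if log_file_path is not None:
        file_handler = logging.FileHandler(log_file_path)
        # Include the thread name: several crawler threads share one log.
        file_handler.setFormatter(logging.Formatter(
            "%(asctime)s %(threadName)s %(levelname)s: %(message)s"))
        logger.addHandler(file_handler)
    return logger
```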

run(self)

Returns: AbstractNode
sentinel node

__sleep_until_activity_period(self)

Sleep (stop program execution) until it is time to wake up.

Returns: number of seconds
the activity time, i.e., the time until the start of the next sleep period, or None if that point in time cannot be determined (e.g., when the activity period never ends).
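The sleep-then-report behavior can be sketched as follows, assuming a hypothetical schedule interface with two methods: one for the seconds until activity resumes and one for the seconds until the next sleep period (None if it never comes). `ActivitySchedule`, `FixedCycleSchedule`, `AlwaysActiveSchedule` and `sleep_until_activity_period` are illustrative names only, not the package's AbstractActivitySchedule API.

```python
import time
from typing import Callable, Optional, Protocol


class ActivitySchedule(Protocol):
    """Hypothetical interface; the real AbstractActivitySchedule may differ."""

    def seconds_until_activity(self, now: float) -> float:
        """Seconds until the next activity period starts (0 if active now)."""
        ...

    def seconds_until_sleep(self, now: float) -> Optional[float]:
        """Seconds until the next sleep period starts, or None if never."""
        ...


class FixedCycleSchedule:
    """Hypothetical schedule: repeating cycles of `active` seconds of work
    followed by `idle` seconds of sleep, the first cycle starting at t=0."""

    def __init__(self, active: float, idle: float) -> None:
        self.active = active
        self.cycle = active + idle

    def seconds_until_activity(self, now: float) -> float:
        offset = now % self.cycle
        return 0.0 if offset < self.active else self.cycle - offset

    def seconds_until_sleep(self, now: float) -> Optional[float]:
        offset = now % self.cycle
        if offset < self.active:
            return self.active - offset
        # Currently sleeping: wait for the next activity period, then run through it.
        return (self.cycle - offset) + self.active


class AlwaysActiveSchedule:
    """Hypothetical schedule whose activity period never ends."""

    def seconds_until_activity(self, now: float) -> float:
        return 0.0

    def seconds_until_sleep(self, now: float) -> Optional[float]:
        return None


def sleep_until_activity_period(
    schedule: ActivitySchedule,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Block until the schedule allows activity, then return the remaining
    activity time in seconds (None if the activity period never ends)."""
    wait = schedule.seconds_until_activity(clock())
    if wait > 0:
        sleep(wait)
    return schedule.seconds_until_sleep(clock())
```

Injecting `clock` and `sleep` keeps the function testable with a fake clock. In normal use the defaults `time.time` and `time.sleep` apply.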