dautil.web

Utilities for web mining and HTML processing.

dautil.web.find_feeds(url, html)

Finds RSS/Atom feeds in HTML content.

Parameters:	url – A URL used as the base of the feed.
	html – A string containing HTML to parse.
Returns:	A list of the feed URLs found, if any.
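
The following is a minimal sketch of feed discovery, not dautil's implementation. It assumes that feeds are advertised through `<link>` tags with an RSS or Atom MIME type and that relative links are resolved against `url`. It uses only the standard library; the names `FeedLinkCollector` and `discover_feeds` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: these two MIME types are how a page advertises its feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkCollector(HTMLParser):
    """Collects the href of every <link> tag whose type is a feed MIME type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        mime = (attr.get("type") or "").lower()
        if mime in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the base URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(url, html):
    """Hypothetical stand-in that mirrors the find_feeds(url, html) signature."""
    collector = FeedLinkCollector(url)
    collector.feed(html)
    collector.close()
    return collector.feeds


page = (
    '<html><head>'
    '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
    '<link rel="stylesheet" type="text/css" href="/style.css">'
    '<link rel="alternate" type="application/atom+xml" href="https://example.com/atom.xml">'
    '</head></html>'
)
feeds = discover_feeds("https://example.com/blog/", page)
print(feeds)
```

The stylesheet link is skipped because its type is not a feed MIME type, and the relative `/feed.xml` is resolved to an absolute URL.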

dautil.web.find_hrefs(content)

Finds href links in an HTML string.

Parameters:	content – An HTML string.
Returns:	A list of href links found by BeautifulSoup.
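
To illustrate the idea, here is a rough standard-library sketch that collects `href` values from anchor tags. It is not dautil's code: the documented function relies on BeautifulSoup, whereas this version uses `html.parser` so that it runs without third-party packages. Restricting the search to `<a>` tags is an assumption, and `HrefCollector` and `collect_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Records the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def collect_hrefs(content):
    """Hypothetical stand-in that mirrors the find_hrefs(content) signature."""
    parser = HrefCollector()
    parser.feed(content)
    parser.close()
    return parser.hrefs


sample = '<p><a href="/docs">Docs</a> and <a href="https://example.com/">home</a></p>'
hrefs = collect_hrefs(sample)
print(hrefs)
```

The links are returned as written in the markup, so relative links such as `/docs` are not resolved.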

dautil.web.path2url(path)

Transforms file paths to URLs starting with file:

Parameters:	path – The file path.
Returns:	The corresponding URL.

>>> from dautil import web
>>> web.path2url('/home/dautil')
'file:///home/dautil'

dautil.web.wait_browser(browser, selector, secs=10, by='xpath')

Waits for an HTML element to become available.

Parameters:	browser – An instance of a Selenium browser.
	selector – An expression used to select the web element.
	secs – The maximum number of seconds to wait (default 10).
	by – The selection method such as XPath or tag name.
Returns:	The web element you are waiting for.
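
The waiting behaviour can be sketched as a polling loop. This is an illustration under stated assumptions, not dautil's implementation: it assumes the browser object exposes `find_element(by, selector)` as Selenium drivers do, and it uses a hypothetical `FakeBrowser` so the example runs without Selenium or a real browser. With Selenium installed, an explicit wait is typically written as `WebDriverWait(browser, secs).until(expected_conditions.presence_of_element_located((by, selector)))`.

```python
import time


def wait_for_element(browser, selector, secs=10, by="xpath", poll=0.5):
    """Retry browser.find_element(by, selector) until it succeeds or secs elapse."""
    deadline = time.monotonic() + secs
    while True:
        try:
            return browser.find_element(by, selector)
        except Exception:
            # A real Selenium driver raises NoSuchElementException here;
            # the broad catch keeps this sketch free of Selenium imports.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no element for {by}={selector!r} after {secs}s"
                )
            time.sleep(poll)


class FakeBrowser:
    """Hypothetical stand-in for a driver: the element appears on the third lookup."""

    def __init__(self):
        self.calls = 0

    def find_element(self, by, selector):
        self.calls += 1
        if self.calls < 3:
            raise LookupError("element not rendered yet")
        return f"<element {by}={selector}>"


browser = FakeBrowser()
element = wait_for_element(browser, "//div[@id='main']", secs=2, poll=0.01)
print(element, browser.calls)
```

The `poll` parameter is an addition for this sketch and is not part of the documented signature.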