dautil.web

Utilities for web mining and HTML processing.

dautil.web.find_feeds(url, html)

Finds RSS/Atom feeds in HTML content.

Parameters:
  • url – The page URL, used as the base for the feed URLs.
  • html – A string containing HTML to parse.
Returns:

A list of the feed URLs found, if any.
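
Feed discovery of this kind can be sketched with the standard library alone. This is a minimal illustration, not dautil's implementation. It assumes that feeds are advertised through <link rel="alternate"> tags with an RSS or Atom MIME type, and that the url parameter is used to resolve relative href values via urllib.parse.urljoin. The names find_feeds_sketch, _FeedLinkParser and FEED_TYPES are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that mark a <link rel="alternate"> tag as an RSS or Atom feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class _FeedLinkParser(HTMLParser):
    """Collects the href of every <link rel="alternate"> with a feed MIME type."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)  # HTMLParser lowercases attribute names
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            self.hrefs.append(href)


def find_feeds_sketch(url, html):
    """Hypothetical stand-in for dautil.web.find_feeds(url, html)."""
    parser = _FeedLinkParser()
    parser.feed(html)
    parser.close()
    # Assumption: url is the page address used to resolve relative hrefs.
    return [urljoin(url, href) for href in parser.hrefs]


page = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/rss.xml">
<link rel="alternate" type="application/atom+xml" href="https://example.com/atom">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""

print(find_feeds_sketch("https://example.com/blog/", page))
# ['https://example.com/rss.xml', 'https://example.com/atom']
```

The stylesheet link is skipped because it is neither rel="alternate" nor typed as a feed, and the relative /rss.xml is resolved against the page URL.
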

dautil.web.find_hrefs(content)

Finds href links in an HTML string.

Parameters:
  • content – An HTML string.
Returns:

A list of the href links found by BeautifulSoup.

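
find_hrefs is documented as relying on BeautifulSoup. The sketch below swaps in the standard library's html.parser so that it runs without third-party packages; it illustrates the idea and is not dautil's code. It assumes that only href attributes on <a> tags count as links, which the description above does not specify. find_hrefs_sketch and _HrefParser are hypothetical names.

```python
from html.parser import HTMLParser


class _HrefParser(HTMLParser):
    """Collects href attribute values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Ignore <a name="..."> anchors and href attributes without a value.
            if name == "href" and value is not None:
                self.hrefs.append(value)


def find_hrefs_sketch(content):
    """Hypothetical stdlib stand-in for dautil.web.find_hrefs(content)."""
    parser = _HrefParser()
    parser.feed(content)
    parser.close()
    return parser.hrefs


html = '<a href="/a">A</a> <a name="top">T</a> <a href="https://example.org/b">B</a>'
print(find_hrefs_sketch(html))
# ['/a', 'https://example.org/b']
```

With BeautifulSoup installed, a comparable expression is [a["href"] for a in BeautifulSoup(content, "html.parser").find_all("a", href=True)].
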
dautil.web.path2url(path)

Transforms a file path into a URL that uses the file: scheme.

Parameters:
  • path – The file path.
Returns:

The corresponding URL.

>>> from dautil import web
>>> web.path2url('/home/dautil')
'file:///home/dautil'

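
The same conversion is available in the standard library through pathlib. The sketch below is an alternative under that assumption, not necessarily how dautil implements path2url; path2url_sketch is a hypothetical name. It reproduces the doctest result above on POSIX systems and also percent-encodes characters such as spaces.

```python
from pathlib import Path


def path2url_sketch(path):
    """Hypothetical stdlib stand-in for dautil.web.path2url(path)."""
    # absolute() leaves absolute paths unchanged and anchors relative ones at
    # the current working directory; as_uri() requires an absolute path and
    # percent-encodes unsafe characters.
    return Path(path).absolute().as_uri()


print(path2url_sketch("/home/dautil"))  # file:///home/dautil (on POSIX)
print(path2url_sketch("/tmp/my file"))  # file:///tmp/my%20file (on POSIX)
```
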
dautil.web.wait_browser(browser, selector, secs=10, by='xpath')

Waits for an HTML element to become available.

Parameters:
  • browser – An instance of a Selenium browser.
  • selector – An expression used to select the web element.
  • secs – The maximum number of seconds to wait (default 10).
  • by – The selection method, such as XPath (the default, 'xpath') or tag name.
Returns:

The web element you are waiting for.
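
Waiting for an element generally follows the explicit-wait pattern: retry the lookup until it succeeds or a deadline passes. The sketch below shows that loop in plain Python. It is not dautil's code; the hypothetical wait_for helper takes a zero-argument finder callable instead of a browser so that it runs without Selenium, and it treats LookupError as "not available yet" by default.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_for(
    find: Callable[[], T],
    secs: float = 10,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (LookupError,),
) -> T:
    """Hypothetical explicit-wait helper: retry find() until it succeeds.

    Exceptions listed in `ignored` mean "not available yet"; any other
    exception propagates immediately. Raises TimeoutError after `secs`.
    """
    deadline = time.monotonic() + secs
    while True:
        try:
            return find()
        except ignored as exc:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"element not available after {secs} s") from exc
            time.sleep(poll)


# Demo with a fake lookup that succeeds on the third attempt.
attempts = []


def flaky_lookup():
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError("not rendered yet")
    return "<element>"


print(wait_for(flaky_lookup, secs=2, poll=0.01), len(attempts))  # <element> 3
```

With Selenium 4 one could pass find=lambda: browser.find_element(By.XPATH, selector) and ignored=(NoSuchElementException,). Selenium also ships this pattern ready-made as WebDriverWait(browser, secs).until(expected_conditions.presence_of_element_located((By.XPATH, selector))).
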
