Utilities for web mining and HTML processing.
Finds RSS/Atom feeds in HTML content.
Parameters: | |
---|---|
Returns: | A list of feed URLs, if any. |
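
The page does not show how feeds are detected. A minimal sketch of one common approach, feed autodiscovery via `<link rel="alternate">` tags, is given below. It uses only the standard library's `html.parser`; the names `FeedLinkParser` and `find_feed_links` are hypothetical and are not part of dautil.

```python
from html.parser import HTMLParser

# MIME types conventionally used for RSS/Atom autodiscovery.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags with a feed type."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append(href)


def find_feed_links(html):
    """Hypothetical helper: return feed URLs advertised in an HTML string."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feeds
```

The returned URLs are whatever the page declares, so relative values such as `/atom.xml` would still need to be resolved against the page URL.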
Finds href links in an HTML string.
Parameters: | content – An HTML string. |
---|---|
Returns: | A list of href links found by BeautifulSoup. |
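
To illustrate the kind of result this returns, here is a rough standard-library sketch. The documented function relies on BeautifulSoup, whereas this stand-in uses `html.parser` so that it runs without extra packages. The names `HrefParser` and `find_hrefs` are hypothetical, and restricting the search to `<a>` tags is an assumption.

```python
from html.parser import HTMLParser


class HrefParser(HTMLParser):
    """Collects the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_hrefs(content):
    """Hypothetical helper: return href values found in an HTML string."""
    parser = HrefParser()
    parser.feed(content)
    return parser.hrefs
```

With BeautifulSoup installed, a comparable one-liner would be `[a['href'] for a in BeautifulSoup(content, 'html.parser').find_all('a', href=True)]`.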
Transforms a file path into a URL that uses the file: scheme.
Parameters: | path – The file path. |
---|---|
Returns: | The corresponding URL. |
>>> from dautil import web
>>> web.path2url('/home/dautil')
'file:///home/dautil'
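
For readers curious how such a conversion can work, the following is a minimal sketch and not dautil's implementation. It handles absolute POSIX paths only, and the name `path_to_file_url` is hypothetical.

```python
from urllib.parse import quote


def path_to_file_url(path):
    """Hypothetical stand-in: build a file: URL from an absolute POSIX path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # quote() percent-encodes unsafe characters but leaves "/" intact by default.
    return "file://" + quote(path)
```

Under these assumptions, `path_to_file_url('/home/dautil')` gives the same string as the example above, and characters such as spaces are percent-encoded.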
Waits for an HTML element to become available.
Parameters: | |
---|---|
Returns: | The web element you are waiting for. |
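
The parameters are not listed here, so the sketch below only illustrates the general idea of an explicit wait: poll a lookup until it succeeds or a timeout expires. It is a plain-Python approximation and not dautil's code. The helper `wait_for` and its `locate`, `timeout` and `poll` arguments are hypothetical.

```python
import time


def wait_for(locate, timeout=10.0, poll=0.5):
    """Hypothetical helper: poll locate() until it returns something truthy.

    locate is any zero-argument callable, for example a function that
    looks up an element and returns None while it is still missing.
    """
    deadline = time.monotonic() + timeout
    while True:
        element = locate()
        if element:
            return element
        if time.monotonic() >= deadline:
            raise TimeoutError("element not available after %s s" % timeout)
        time.sleep(poll)
```

If the underlying driver is Selenium, which the phrase "web element" suggests but the page does not confirm, the comparable built-in is `WebDriverWait(driver, timeout).until(...)` with an expected condition.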