Utilities for web mining and HTML processing.
Finds RSS/Atom feeds in HTML content.
| Parameters: | |
|---|---|
| Returns: | A list of feed URLs if any. |
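As an illustration of feed discovery, the sketch below scans HTML for `<link rel="alternate">` tags that declare an RSS or Atom MIME type. It is not the module's implementation, which is not shown here. The names `FeedLinkParser` and `find_feed_links` are hypothetical, and the standard-library `html.parser` is used only to keep the example dependency-free.

```python
from html.parser import HTMLParser

# MIME types conventionally used to advertise feeds in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Hypothetical helper: collects hrefs of feed-advertising <link> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        # Keep only links marked as an alternate representation with a feed type.
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append(href)


def find_feed_links(html):
    """Return the feed URLs advertised in an HTML string (possibly empty)."""
    parser = FeedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.feeds
```

Relative URLs such as `/atom.xml` are returned unchanged in this sketch; resolving them against the page URL would require `urllib.parse.urljoin`.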
Finds href links in an HTML string.
| Parameters: | content – An HTML string. |
|---|---|
| Returns: | A list of href links found by BeautifulSoup. |
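The documented function relies on BeautifulSoup. The sketch below swaps in the standard-library `html.parser` so that it runs without extra dependencies, and it assumes that only `<a>` tags are of interest. `HrefParser` and `find_href_links` are hypothetical names, not part of the module.

```python
from html.parser import HTMLParser


class HrefParser(HTMLParser):
    """Hypothetical helper: collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, or with an empty one.
            if name == "href" and value:
                self.hrefs.append(value)


def find_href_links(content):
    """Return the href values found in an HTML string, in document order."""
    parser = HrefParser()
    parser.feed(content)
    parser.close()
    return parser.hrefs
```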
Transforms a file path into a URL starting with file://.
| Parameters: | path – The file path. |
|---|---|
| Returns: | The corresponding URL. |
>>> from dautil import web
>>> web.path2url('/home/dautil')
'file:///home/dautil'
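A conversion of this kind can be sketched with the standard library alone. The version below is an assumption about the behavior, not the module's code: it accepts only absolute POSIX-style paths and percent-encodes characters that are not URL-safe. The name `path_to_file_url` is hypothetical.

```python
import posixpath
from urllib.parse import quote


def path_to_file_url(path):
    """Convert an absolute POSIX path into a file:// URL (illustrative only)."""
    if not posixpath.isabs(path):
        raise ValueError("expected an absolute path, got %r" % (path,))
    # quote() leaves "/" intact and percent-encodes spaces and similar characters.
    return "file://" + quote(path)


print(path_to_file_url("/home/dautil"))   # file:///home/dautil
print(path_to_file_url("/home/my docs"))  # file:///home/my%20docs
```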
Waits for an HTML element to become available.
| Parameters: | |
|---|---|
| Returns: | The web element you are waiting for. |
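The parameters of this function are not listed here, and the phrase "web element" suggests a browser-automation driver, though that is not confirmed. The sketch below therefore shows only the general polling pattern behind such a wait: call a lookup repeatedly until it yields a result or a timeout expires. The names `wait_until`, `probe`, `timeout`, and `interval` are hypothetical.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.5):
    """Call probe() until it returns a truthy value or the timeout expires.

    probe is any zero-argument callable, for example a function that looks
    up an element and returns None while the element is still missing.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline unaffected by system clock adjustments.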