HTML Parser Module
Basic parser module for parsing dragline.http.Response
HtmlParser Function
dragline.parser.HtmlParser(response, absolute_links=True)
Parameters: response (dragline.http.Response) – This function takes a response object as its argument and returns the lxml etree object.
The HtmlParser function returns an lxml object of type HtmlElement, which provides a few useful methods. The details of this object are discussed in the section lxml.html.HtmlElement.
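The source does not say what absolute_links=True does. A minimal sketch, assuming it means that relative links in the page are resolved against the URL of the response, could look like the following. It uses only the Python standard library; the helper name make_absolute is hypothetical and is not part of dragline.

```python
from urllib.parse import urljoin


def make_absolute(base_url, hrefs):
    """Hypothetical helper: resolve each href against the page URL.

    Assumption: this mirrors what absolute_links=True is meant to do;
    the dragline documentation does not confirm it.
    """
    return [urljoin(base_url, href) for href in hrefs]


links = make_absolute(
    "http://www.example.org/docs/",
    ["intro.html", "/about", "http://www.iana.org/domains/example"],
)
```

Links that are already absolute pass through unchanged, so applying the resolution to every link is harmless.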
class dragline.parser.html.HTMLElement
An HtmlElement object is returned by the HtmlParser function:
>>> response = Request('http://www.example.org/').send()
>>> parser = HtmlParser(response)
cssselect(expr)
Select elements from this element and its children, using a CSS selector expression. (Note that .xpath(expr) is also available, as on all lxml elements.)
extract_text()
Returns the text content of the element, including the text content of its children, with no markup.
extract_urls(xpath=None, domains=None)
Returns a list of all the links with the given domains.
>>> list(parser.extract_urls())
['http://www.iana.org/domains/example']
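To make the behaviour of the two extraction methods concrete, here is a rough standard-library sketch of the same ideas: gathering text with the markup removed, and collecting links filtered by domain. It is not dragline's implementation. PageCollector, text_content and iter_urls are hypothetical names, the xpath argument is left out, and matching domains against the exact host name is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageCollector(HTMLParser):
    """Hypothetical helper: records text nodes and <a href> values."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # text found between tags
        self.hrefs = []   # raw href values, possibly relative

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.chunks.append(data)


def collect(markup):
    """Parse the markup once and return the filled collector."""
    collector = PageCollector()
    collector.feed(markup)
    collector.close()
    return collector


def text_content(markup):
    """Text of the fragment, children included, with all tags removed."""
    return "".join(collect(markup).chunks)


def iter_urls(markup, base_url, domains=None):
    """Yield absolute link URLs, optionally limited to the given hosts."""
    for href in collect(markup).hrefs:
        url = urljoin(base_url, href)
        # Assumption: a link matches when its host is listed in domains.
        if domains is None or urlparse(url).hostname in domains:
            yield url


page = ('<p>More <a href="http://www.iana.org/domains/example">information</a>'
        ' or <a href="/about">about us</a>.</p>')

text = text_content(page)
all_urls = list(iter_urls(page, "http://www.example.org/"))
iana_urls = list(iter_urls(page, "http://www.example.org/",
                           domains={"www.iana.org"}))
```

With no domains given, every link is returned. Passing a set of host names keeps only the links that point to those hosts.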