HTML Parser Module

Basic parser module for parsing dragline.http.Response

HtmlParser Function

dragline.parser.HtmlParser(response, absolute_links=True)
Parameters: response (dragline.http.Response)

This function takes a response object as its argument and returns an lxml etree object.

The returned object is of type HtmlElement, which provides a few useful methods. The details of this object are discussed in the section lxml.html.HtmlElement.

class dragline.parser.html.HTMLElement

An HtmlElement object is returned by the HtmlParser function:

>>> from dragline.http import Request
>>> from dragline.parser import HtmlParser
>>> response = Request('http://www.example.org/').send()
>>> parser = HtmlParser(response)

cssselect(expr)

Select elements from this element and its children, using a CSS selector expression. (Note that .xpath(expr) is also available, as on all lxml elements.)

extract_text()

Returns the text content of the element, including the text content of its children, with no markup.
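To picture what "text content with no markup" means, here is a minimal sketch that uses only the Python standard library. It is not dragline's implementation; the names TextCollector and text_without_markup are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, at any nesting depth.
        self.chunks.append(data)


def text_without_markup(markup):
    """Return the text of an HTML fragment, including nested children."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


snippet = "<div>Hello <b>brave <i>new</i></b> world</div>"
print(text_without_markup(snippet))  # Hello brave new world
```

The text of the nested b and i elements is kept while the tags themselves are dropped, which is the behaviour the description above attributes to extract_text().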

extract_urls(xpath=None, domains=None)

Returns all the links that match the given domains:

>>> list(parser.extract_urls())
['http://www.iana.org/domains/example']
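The idea of restricting links to a set of domains can be sketched with the standard library alone. This is an illustration under stated assumptions, not dragline's code: filter_links is a hypothetical helper, and it assumes that a link matches when its host is exactly one of the entries in domains. The matching rule dragline actually applies may differ.

```python
from urllib.parse import urljoin, urlparse


def filter_links(base_url, hrefs, domains=None):
    """Hypothetical helper, not part of dragline.

    Resolves each href against base_url and yields only the links whose
    host is listed in domains. With domains=None every link is yielded.
    """
    for href in hrefs:
        absolute = urljoin(base_url, href)
        host = urlparse(absolute).netloc
        # Assumption: exact host comparison, no subdomain matching.
        if domains is None or host in domains:
            yield absolute


hrefs = ["/about", "http://www.iana.org/domains/example", "contact.html"]
links = filter_links("http://www.example.org/", hrefs, domains=["www.iana.org"])
print(list(links))  # ['http://www.iana.org/domains/example']
```

Like the doctest above, the sketch wraps the result in list() because the helper is a generator.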