Tips¶

Making links absolute¶

You can make links absolute which can be usefull for screen scrapping:

>>> d = pq(url='http://duckduckgo.com/', parser='html')
>>> d('a[tabindex="-1"]').attr('href')
'/about.html'
>>> d.make_links_absolute()
[<html>]
>>> d('a[tabindex="-1"]').attr('href')
'http://duckduckgo.com/about.html'

Using different parsers¶

By default pyquery uses the lxml xml parser and then if it doesn’t work goes on to try the html parser from lxml.html. The xml parser can sometimes be problematic when parsing xhtml pages because the parser will not raise an error but give an unusable tree (on w3c.org for example).

You can also choose which parser to use explicitly:

>>> pq('<html><body><p>toto</p></body></html>', parser='xml')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
[<p>]

The html and html_fragments parser are the ones from lxml.html.

Table Of Contents

Tips
- Making links absolute
- Using different parsers

Previous topic

pyquery.ajax – PyQuery AJAX extension

Next topic

This Page