This module used to live inside lxml as lxml.cssselect before it was extracted as a stand-alone project.
Use HTMLTranslator for HTML documents and GenericTranslator for “generic” XML documents. (The former translates some selectors more usefully, based on HTML-specific element types or attributes.)
>>> from cssselect import GenericTranslator, SelectorError
>>> try:
...     expression = GenericTranslator().css_to_xpath('div.content')
... except SelectorError:
...     print('Invalid selector.')
...
>>> print(expression)
descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]
The resulting expression can be used with lxml’s XPath engine:
>>> from lxml.etree import fromstring
>>> document = fromstring('''
... <div id="outer">
...   <div id="inner" class="content body">text</div>
... </div>
... ''')
>>> [e.get('id') for e in document.xpath(expression)]
['inner']
In CSS3 Selectors terms, the top-level object is a group of selectors, a sequence of comma-separated selectors. For example, div, h1.title + p is a group of two selectors.
cssselect.parse(css): parse a CSS group of selectors.
If you don’t care about pseudo-elements or selector specificity, you can skip this and use css_to_xpath().
Parameters: css – A group of selectors as a Unicode string.
Raises: SelectorSyntaxError on invalid selectors.
Returns: A list of parsed Selector objects, one for each selector in the comma-separated group.
Selector: represents a parsed selector.
pseudo_element: a FunctionalPseudoElement, or the identifier for the pseudo-element as a string, or None if the selector has no pseudo-element.
For example, li::marker (from the Lists3 draft, not in Selectors3) has the pseudo-element 'marker'.
FunctionalPseudoElement.name: the name (identifier) of the pseudo-element, as a string.
FunctionalPseudoElement.arguments: the arguments of the pseudo-element, as a list of tokens.
Note: tokens are not part of the public API and may change between cssselect versions. Use them at your own risk.
GenericTranslator: translator for “generic” XML documents.
Everything is case-sensitive; no assumption is made about the meaning of element names and attribute names.
css_to_xpath(): translate a group of selectors to XPath.
Pseudo-elements are not supported here, since XPath only knows about “real” elements.
Returns: the equivalent XPath 1.0 expression as a Unicode string.
selector_to_xpath(): translate a parsed selector to XPath.
Raises: ExpressionError on unknown or unsupported selectors.
Returns: the equivalent XPath 1.0 expression as a Unicode string.
HTMLTranslator: translator for (X)HTML documents.
It has a more useful implementation of some pseudo-classes, based on HTML-specific element names and attribute names as described in the HTML5 specification. It assumes no-quirks mode. The API is the same as that of GenericTranslator.
Parameters: xhtml – If false (the default), element names and attribute names are case-insensitive.
SelectorError is the common parent of SelectorSyntaxError and ExpressionError, so a single except SelectorError: clause around css_to_xpath() handles both exception types.
SelectorSyntaxError: raised when parsing a selector that does not match the grammar.
ExpressionError: raised for an unknown or unsupported selector (e.g. a pseudo-class).
This library implements CSS3 selectors as described in the W3C specification. In this context, however, there is no interactivity or history of visited links. Therefore, these pseudo-classes are accepted but never match anything:
These applicable pseudo-classes are not yet implemented:
On the other hand, cssselect supports some selectors that are not in the Level 3 specification:
Just like HTMLTranslator is a subclass of GenericTranslator, you can make new sub-classes of either of them and override some methods. This enables you, for example, to customize how some pseudo-class is implemented without forking or monkey-patching cssselect.
The “customization API” is the set of methods in translation classes and their signatures. You can look at the source code to see how it works. However, be aware that this API is not very stable yet. It might change and break your sub-class.
In CSS you can use namespace-prefix|element, similar to namespace-prefix:element in an XPath expression. In fact, it maps one-to-one. How prefixes are mapped to namespace URIs depends on the XPath implementation.
Released on 2013-10-17.
Released on 2013-10-11.
Add parser support for functional pseudo-elements.
Update: This version accidentally introduced a backward incompatible change: selector_to_xpath() defaults to rejecting pseudo-elements instead of ignoring them.
Released on 2013-03-15.
Released on 2012-06-14. Code name remember-to-test-with-tox.
0.7 broke the parser in Python 2.4 and 2.5, and broke the tests in Python 2.x. Now all is well again.
Also, pseudo-elements are now correctly made lower-case. (They are supposed to be case-insensitive.)
Released on 2012-06-14.
Bug fix release: see #2, #7 and #10 on GitHub.
Released on 2012-04-25.
Make sure that internal token objects do not “leak” into the public API and that Selector.pseudo_element is a Unicode string.
Released on 2012-04-24.
Released on 2012-04-20.
Released on 2012-04-18.
Released on 2012-04-17.
Discussion is open if anyone is interested in implementing e.g. :target or :visited differently, but they can always do it in a Translator subclass.
Released on 2012-04-16.
These changes allow cssselect to be used without lxml. (Hey, this was the whole point of this project.) The tests still require lxml, though. The removed parts are expected to stay in lxml for backward-compatibility.
:contains() only existed in an early draft of the Selectors specification, and was removed before Level 3 stabilized. Internally, it used a custom XPath extension function which can be difficult to express outside of lxml.
Subclasses of Translator can be made to change the way that some selector (e.g. a pseudo-class) is implemented.
Released on 2012-04-13.
Extract lxml.cssselect from the rest of lxml and make it a stand-alone project.
Commit ea53ceaf7e44ba4fbb5c818ae31370932f47774e was taken on 2012-04-11 from the ‘master’ branch of lxml’s git repository. This is somewhere between versions 2.3.4 and 2.4.
The commit history has been rewritten to:
This project has its own import name, tests and documentation. But the code itself is unchanged and still depends on lxml.