Bozo Detection
Universal Feed Parser can parse feeds whether or not they are well-formed XML. However, since some applications may wish to reject non-well-formed feeds or warn users about them, Universal Feed Parser sets the bozo bit (bozo in the returned results) when it detects that a feed is not well-formed, and stores the error it encountered in bozo_exception. Thanks to Tim Bray for suggesting this terminology.
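As a minimal sketch of how an application might act on the bozo bit, the hypothetical helper below maps a parse result to one of three made-up actions ("ok", "warn", "reject"). It assumes only that the result behaves like the dictionary returned by feedparser.parse() and reads just the bozo and bozo_exception keys; the function name, its strict flag, and the action names are illustrative and not part of Universal Feed Parser.

```python
from typing import Any, Mapping, Tuple


def bozo_policy(result: Mapping[str, Any], strict: bool = False) -> Tuple[str, str]:
    """Hypothetical helper: decide how to treat a parsed feed.

    Assumes `result` is dict-like (as the object returned by
    feedparser.parse() is) and reads only 'bozo' and 'bozo_exception'.
    Returns an (action, detail) pair; the action names are made up.
    """
    # The bit may be reported as 0/1 or False/True; any falsy value means well-formed.
    if not result.get("bozo"):
        return "ok", ""
    exc = result.get("bozo_exception")
    detail = f"{type(exc).__name__}: {exc}" if exc is not None else "unknown error"
    # Strict applications refuse non-well-formed feeds; lenient ones only warn.
    return ("reject" if strict else "warn"), detail
```

A lenient reader could call bozo_policy(d) and show the detail string next to the feed, while a validator would pass strict=True and refuse the feed instead.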
Detecting a non-well-formed feed
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d.bozo
0
>>> d = feedparser.parse('http://feedparser.org/tests/illformed/rss/aaa_illformed.xml')
>>> d.bozo
1
>>> d.bozo_exception
<xml.sax._exceptions.SAXParseException instance at 0x00BAAA08>
>>> exc = d.bozo_exception
>>> exc.getMessage()
"expected '>'\n"
>>> exc.getLineNumber()
6
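The bozo_exception in the session above is an xml.sax SAXParseException. As a minimal sketch that uses only the Python standard library (not Universal Feed Parser), the same kind of exception can be reproduced by running a strict SAX parse over a small ill-formed document. The ILLFORMED document and the well_formedness_error helper are made up for illustration; they are not the aaa_illformed.xml test file. The message text comes from whichever XML parser is in use (here the standard library's expat-based SAX parser), so it need not match the one shown above.

```python
import xml.sax
from typing import Optional, Tuple
from xml.sax.handler import ContentHandler

# A made-up, non-well-formed feed: the </title end tag on line 4 is missing its '>'.
ILLFORMED = b"""<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Example</title
</channel>
</rss>
"""


def well_formedness_error(data: bytes) -> Optional[Tuple[str, int, int]]:
    """Return (message, line, column) for the first XML error, or None if well-formed."""
    try:
        # ContentHandler() ignores all events; we only care whether parsing fails.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage(), exc.getLineNumber(), exc.getColumnNumber()
    return None


print(well_formedness_error(ILLFORMED))
print(well_formedness_error(b"<rss><channel/></rss>"))
```

The first call reports the parser's message and position; the second returns None because the document is well-formed.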
An XML document can be non-well-formed for many reasons besides the one in this example, an incomplete end tag. See Character Encoding Detection for some other ways to trip the bozo bit.
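Because the bozo bit can be tripped by problems other than XML syntax errors, bozo_exception may not always be a SAXParseException offering getMessage() and getLineNumber(). A defensive formatting sketch follows; it relies only on duck typing, and the describe_bozo name is hypothetical rather than part of Universal Feed Parser.

```python
def describe_bozo(exc: BaseException) -> str:
    """Hypothetical helper: format a bozo exception for a log line or warning.

    SAX parse errors expose getMessage() and getLineNumber(); other
    exception types may not, so fall back to the class name and str(exc).
    """
    get_message = getattr(exc, "getMessage", None)
    get_line = getattr(exc, "getLineNumber", None)
    if callable(get_message) and callable(get_line):
        # Position-aware errors, such as xml.sax.SAXParseException.
        return f"line {get_line()}: {get_message()}"
    # Anything else: report the exception type and its string form.
    return f"{type(exc).__name__}: {exc}"
```

An application would call describe_bozo(d.bozo_exception) only when d.bozo is set; for the ill-formed feed in the session above, the result would include line 6 and the parser's message.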