Universal Feed Parser can parse feeds whether they are well-formed XML or not. However, since some applications may wish to reject or warn users about non-well-formed feeds, Universal Feed Parser sets the bozo bit when it detects that a feed is not well-formed. Thanks to Tim Bray for suggesting this terminology.
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d.bozo
0
>>> d = feedparser.parse('http://feedparser.org/tests/illformed/rss/aaa_illformed.xml')
>>> d.bozo
1
>>> d.bozo_exception
<xml.sax._exceptions.SAXParseException instance at 0x00BAAA08>
>>> exc = d.bozo_exception
>>> exc.getMessage()
"expected '>'\\n"
>>> exc.getLineNumber()
6
There are many reasons an XML document could be non-well-formed besides the one in this example (incomplete end tags). See Character Encoding Detection for some other ways to trip the bozo bit.
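An application that wants to warn about or reject non-well-formed feeds could wrap the bozo check in a small helper. The following is a minimal sketch, not part of Universal Feed Parser: the function name check_feed and its strict parameter are hypothetical, and the helper only assumes that the object it receives exposes bozo and bozo_exception as shown above. Stand-in objects are used so the sketch runs without network access.

```python
import warnings
from types import SimpleNamespace


def check_feed(result, strict=False):
    """Return True if the feed was well-formed, False otherwise.

    `result` is any object exposing `bozo` and, when the bozo bit is set,
    `bozo_exception`, as the result of feedparser.parse() does.
    `strict` is a hypothetical policy switch, not a feedparser option.
    """
    if not getattr(result, "bozo", 0):
        return True
    exc = getattr(result, "bozo_exception", None)
    message = f"feed is not well-formed: {exc!r}"
    if strict:
        # Reject the feed outright.
        raise ValueError(message)
    # Otherwise accept the parsed data but tell the user about the problem.
    warnings.warn(message)
    return False


if __name__ == "__main__":
    # Stand-ins for parsed results; in real use, pass the object
    # returned by feedparser.parse(url) instead.
    good = SimpleNamespace(bozo=0)
    bad = SimpleNamespace(bozo=1, bozo_exception=Exception("expected '>'"))
    print(check_feed(good))
    print(check_feed(bad))
```

With strict left at its default, the parsed data stays usable and the caller only receives a warning; passing strict=True turns the same condition into an error, which suits applications that refuse non-well-formed feeds.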