Changes in version 3.0
Universal Feed Parser 3.0 was released on June 21, 2004.
- don’t try iso-8859-1 (can’t distinguish between iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are windows-1252)
- fixed regression that could cause the same encoding to be tried twice (even if it failed the first time)
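The two notes above can be illustrated with a minimal sketch. It is not the parser's actual code: the function name `decode_with_fallback` and the default candidate list are assumptions made for illustration. It tries the declared encoding first, then each candidate, and never retries an encoding that has already failed. The candidate list uses windows-1252 and leaves out iso-8859-1.

```python
import codecs


def decode_with_fallback(data, declared_encoding, candidates=("utf-8", "windows-1252")):
    """Decode bytes, trying each encoding at most once.

    Hypothetical helper: the name and the default candidate list are
    illustrative assumptions, not taken from the parser itself.
    """
    tried = set()
    for encoding in (declared_encoding,) + tuple(candidates):
        if not encoding:
            continue
        try:
            # Normalize aliases so "utf8" and "UTF-8" count as the same encoding.
            canonical = codecs.lookup(encoding).name
        except LookupError:
            continue  # unknown encoding name, skip it
        if canonical in tried:
            continue  # this encoding already failed once, do not retry it
        tried.add(canonical)
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode data with any candidate encoding")


# A feed declared as UTF-8 whose bytes are really windows-1252.
text, used = decode_with_fallback("café".encode("windows-1252"), "utf-8")
```

Leaving iso-8859-1 out of the candidates matters here because iso-8859-1 can decode any byte sequence, so it would always succeed and windows-1252 would never be reached.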
Universal Feed Parser 3.0fc3 was released on June 18, 2004.
- fixed bug in _changeEncodingDeclaration that failed to parse UTF-16 encoded feeds
- made source into a FeedParserDict
- duplicate admin:generatorAgent/@rdf:resource in generator_detail.url
- added support for image
- refactored parse() fallback logic to try other encodings if SAX parsing fails (previously it would only try other encodings if re-encoding failed)
- remove unichr madness in normalize_attrs now that we’re properly tracking encoding in and out of BaseHTMLProcessor
- set feed.language from root-level xml:lang
- set entry.id from rdf:about
- send Accept header
Universal Feed Parser 3.0fc2 was released on May 10, 2004.
- added and passed Sam’s amp tests
- added and passed my blink tag tests
Universal Feed Parser 3.0fc1 was released on April 23, 2004.
- made results.entries.links and results.entries.enclosures into FeedParserDict
- fixed typo that could cause the same encoding to be tried twice (even if it failed the first time)
- fixed DOCTYPE stripping when DOCTYPE contained entity declarations
- better textinput and image tracking in illformed RSS 1.0 feeds
Universal Feed Parser 3.0b23 was released on April 21, 2004.
- fixed UnicodeDecodeError for feeds that contain high-bit characters in attributes in embedded HTML in description (thanks Thijs van de Vossen)
- moved guid, date, and date_parsed to mapped keys in FeedParserDict
- tweaked FeedParserDict.has_key to return True if asking about a mapped key
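The mapped-key behaviour described above could look roughly like the following sketch. The class name `MappedDict` and the target names in `keymap` are assumptions made for illustration. The changelog only says that guid, date, and date_parsed became mapped keys. `has_key` is a Python 2 idiom, so this sketch uses `__contains__`, its modern equivalent.

```python
class MappedDict(dict):
    """Dictionary that resolves legacy key names to current ones.

    Hypothetical sketch: the class name and the right-hand side of
    ``keymap`` are illustrative assumptions.
    """

    keymap = {"guid": "id", "date": "modified", "date_parsed": "modified_parsed"}

    def _resolve(self, key):
        # Translate a legacy name to its current name; other keys pass through.
        return self.keymap.get(key, key)

    def __getitem__(self, key):
        return dict.__getitem__(self, self._resolve(key))

    def __contains__(self, key):
        # Membership tests succeed for a mapped key if its target is present.
        return dict.__contains__(self, self._resolve(key))

    def get(self, key, default=None):
        # dict.get bypasses __getitem__, so route it through the mapping too.
        return self[key] if key in self else default


entry = MappedDict(id="tag:example.org,2004:1")
legacy_value = entry["guid"]  # same value as entry["id"]
```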
Universal Feed Parser 3.0b22 was released on April 19, 2004.
- changed channel to feed, item to entries in results dict
- changed results dict to allow getting values with results.key as well as results[key]
- work around embedded illformed HTML with half a DOCTYPE
- work around malformed Content-Type header
- if character encoding is wrong, try several common ones before falling back to regexes (if this works, bozo_exception is set to CharacterEncodingOverride)
- fixed character encoding issues in BaseHTMLProcessor by tracking encoding and converting from Unicode to raw strings before feeding data to sgmllib.SGMLParser
- convert each value in results to Unicode (if possible), even if using regex-based parsing
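Access by `results.key` as well as `results[key]`, as noted in the list above, can be sketched with a small dict subclass. The class name `AttrDict` is an assumption made for illustration, and this is not the parser's own implementation.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes.

    Hypothetical sketch; the class name is an illustrative assumption.
    """

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails, so
        # real dict methods such as .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


results = AttrDict(feed=AttrDict(title="Example feed"), entries=[])
title = results.feed.title  # equivalent to results["feed"]["title"]
```

Raising `AttributeError` for a missing key keeps `hasattr()` and `getattr()` with a default behaving as callers expect.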
Universal Feed Parser 3.0b21 was released on April 14, 2004.
Universal Feed Parser 3.0b20 was released on April 7, 2004.
Universal Feed Parser 3.0b19 was released on March 15, 2004.
- fixed bug exploding author information when author name was in parentheses
- removed ultra-problematic mxTidy support
- patch to work around crash in PyXML/expat when encountering invalid entities (MarkMoraes)
- support for textinput/textInput
Universal Feed Parser 3.0b18 was released on February 17, 2004.
- always map description to summary_detail (Andrei)
- use libxml2 (if available)
Universal Feed Parser 3.0b17 was released on February 13, 2004.
- determine character encoding as per RFC 3023
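As a rough illustration of the RFC 3023 precedence rules, the sketch below picks an encoding from the HTTP Content-Type header and the encoding named in the XML declaration. The function name `rfc3023_encoding` is an assumption made for illustration. Byte-order-mark sniffing is omitted, and the fallback for non-XML media types is a guess of this sketch, not something the RFC defines.

```python
def rfc3023_encoding(content_type, xml_encoding=None):
    """Pick a character encoding following RFC 3023 precedence.

    Hypothetical helper: name and signature are illustrative assumptions.
    ``xml_encoding`` is the encoding from the XML declaration, if any.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Extract the charset parameter from the header, if present.
    http_charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            http_charset = value.strip().strip("'\"") or None

    app_types = ("application/xml", "application/xml-dtd",
                 "application/xml-external-parsed-entity")
    text_types = ("text/xml", "text/xml-external-parsed-entity")
    is_app_xml = media_type in app_types or (
        media_type.startswith("application/") and media_type.endswith("+xml"))
    is_text_xml = media_type in text_types or (
        media_type.startswith("text/") and media_type.endswith("+xml"))

    if is_app_xml:
        # HTTP charset wins, then the XML declaration, then UTF-8.
        return http_charset or xml_encoding or "utf-8"
    if is_text_xml:
        # The XML declaration is ignored; the default is us-ascii.
        return http_charset or "us-ascii"
    # Assumption of this sketch for media types the RFC does not cover.
    return xml_encoding or "utf-8"


encoding = rfc3023_encoding("text/xml", xml_encoding="utf-8")
```

The `text/xml` case is the surprising one: without a charset parameter the result is us-ascii, whatever the XML declaration says.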
Universal Feed Parser 3.0b16 was released on February 12, 2004.
- fixed support for RSS 0.90 (broken in b15)
Universal Feed Parser 3.0b15 was released on February 11, 2004.
- fixed bug resolving relative links in wfw:commentRSS
- fixed bug capturing author and contributor URI
- fixed bug resolving relative links in author and contributor URI
- fixed bug resolving relative links in generator URI
- added support for recognizing RSS 1.0
- passed Simon Fell’s namespace tests, and included them permanently in the test suite with his permission
- fixed namespace handling under Python 2.1
Universal Feed Parser 3.0b14 was released on February 8, 2004.
- fixed CDATA handling in non-wellformed feeds under Python 2.1
Universal Feed Parser 3.0b13 was released on February 8, 2004.
- better handling of empty HTML tags (br, hr, img, etc.) in embedded markup, in either HTML or XHTML form (<br>, <br/>, <br />)
Universal Feed Parser 3.0b12 was released on February 6, 2004.
- fiddled with decodeEntities (still not right)
- added support for Atom 0.2 subtitle
- added support for Atom content model in copyright
- better sanitizing of dangerous HTML elements with end tags (script, frameset)
Universal Feed Parser 3.0b11 was released on February 2, 2004.
- added rights to list of elements that can contain dangerous markup
- fiddled with decodeEntities (not right)
- liberalized date parsing even further
Universal Feed Parser 3.0b10 was released on January 31, 2004.
- incorporated ISO-8601 date parsing routines from xml.util.iso8601
Universal Feed Parser 3.0b9 was released on January 29, 2004.
- fixed check for presence of dict function
- added support for summary
Universal Feed Parser 3.0b8 was released on January 28, 2004.
- added support for contributor
Universal Feed Parser 3.0b7 was released on January 28, 2004.
- support Atom-style author element in author_detail (dictionary of name, url, email)
- map author to author_detail if author contains name + email address
Universal Feed Parser 3.0b6 was released on January 27, 2004.
- added feed type and version detection; result['version'] will be one of SUPPORTED_VERSIONS.keys(), or an empty string if unrecognized
- added support for creativeCommons:license and cc:license
- added support for full Atom content model in title, tagline, info, copyright, summary
- fixed bug with gzip encoding (not always telling server we support it when we do)
Universal Feed Parser 3.0b5 was released on January 26, 2004.
- fixed bug parsing multiple links at feed level
Universal Feed Parser 3.0b4 was released on January 26, 2004.
- fixed xml:lang inheritance
- fixed multiple bugs tracking xml:base URI: one for documents that don’t define one explicitly, and one for documents that define an outer and an inner xml:base that goes out of scope before the end of the document
Universal Feed Parser 3.0b3 was released on January 23, 2004.
- parse entire feed with real XML parser (if available)
- added several new supported namespaces
- fixed bug tracking naked markup in description
- added support for enclosure
- added support for source
- re-added support for cloud which got dropped somehow
- added support for expirationDate
Universal Feed Parser 3.0b2 and 3.0b1 have been lost in the mists of time.