Changes in earlier versions
Universal Feed Parser began as an “ultra-liberal RSS parser” named rssparser.py. It was written as a weapon for battles that no one remembers, to work around problems that no longer exist.
Ultra-liberal Feed Parser 2.5.3 was released on August 3, 2003.
- track whether we’re inside an image or textInput (TvdV)
- return the character encoding, if specified
Ultra-liberal Feed Parser 2.5.2 was released on July 28, 2003.
- entity-decode inline XML properly
- added support for inline <xhtml:body> and <xhtml:div> as used in some RSS 2.0 feeds
Ultra-liberal Feed Parser 2.5.1 was released on July 26, 2003.
- clear opener.addheaders so we only send our custom User-Agent (otherwise urllib2 sends two, which confuses some servers) (RMK)
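The fix above can be sketched in modern Python, where urllib2 has become urllib.request. This is an illustration only, not the parser's original code: the USER_AGENT value and both helper names are hypothetical. Current versions of urllib.request already skip an opener default when the request sets the same header, so clearing the list here mainly mirrors the historical workaround.

```python
import urllib.request

# Hypothetical agent string, used only for this illustration.
USER_AGENT = "ExampleFeedFetcher/0.1"


def build_clean_opener():
    """Return an opener with no default headers attached."""
    opener = urllib.request.build_opener()
    # A fresh opener carries a default ('User-agent', 'Python-urllib/x.y')
    # entry; clearing the list leaves only the headers set on the request.
    opener.addheaders = []
    return opener


def build_request(url, agent=USER_AGENT):
    """Return a request that carries exactly one User-Agent header."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", agent)
    return request
```

Calling build_clean_opener().open(build_request(url)) would then send only the custom agent string.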
Ultra-liberal Feed Parser 2.5 was released on July 25, 2003.
- changed to Python license (all contributors agree)
- removed unnecessary urllib code; urllib2 should always be available anyway
- return actual url, status, and full HTTP headers (as result['url'], result['status'], and result['headers']) if parsing a remote feed over HTTP. This should pass all the Aggregator client HTTP tests (http://diveintomark.org/tests/client/http/).
- added the latest namespace-of-the-week for RSS 2.0
Ultra-liberal Feed Parser 2.4 was released on July 9, 2003.
- added preliminary Pie/Atom/Echo support based on Sam Ruby’s snapshot of July 1
- changed project name
Ultra-liberal RSS Parser 2.3.1 was released on June 12, 2003.
- if item has both link and guid, return both as-is
Ultra-liberal RSS Parser 2.3 was released on June 11, 2003.
- added USER_AGENT for default (if caller doesn’t specify)
- make sure we send the User-Agent even if urllib2 isn’t available
- match any variation of backend.userland.com/rss namespace
Ultra-liberal RSS Parser 2.2 was released on January 27, 2003.
- added attribute support and admin:generatorAgent. start_admingeneratoragent is an example of how to handle elements with only attributes, no content.
Ultra-liberal RSS Parser 2.1 was released on November 14, 2002.
- added gzip support
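As a rough sketch of what gzip support involves on the client side: the server is told that gzip is acceptable, and the body is decompressed when the response says it was compressed. The function name and the lowercase-keyed headers dictionary below are assumptions made for this example, not the parser's actual interface.

```python
import gzip

# Request header a client would send to advertise gzip support.
ACCEPT_ENCODING_HEADER = ("Accept-encoding", "gzip")


def decode_body(body, headers):
    """Return the raw feed bytes, decompressing gzip-encoded responses.

    `headers` is assumed to be a dict of response headers with
    lowercase keys.
    """
    encoding = headers.get("content-encoding", "")
    if encoding.lower() == "gzip":
        return gzip.decompress(body)
    # Anything else is passed through untouched.
    return body
```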
Ultra-liberal RSS Parser 2.0.2 was released on October 21, 2002.
- added the inchannel to the if statement, otherwise it’s useless. Fixes the problem JD was addressing by adding it. (JB)
Ultra-liberal RSS Parser 2.0.1 was released on October 21, 2002.
- changed parse() so that if we don’t get anything because of etag/modified, return the old etag/modified to the caller to indicate why nothing is being returned
Ultra-liberal RSS Parser 2.0 was released on October 19, 2002.
- use inchannel to watch out for image and textinput elements which can also contain title, link, and description elements (JD)
- check for isPermaLink='false' attribute on guid elements (JD)
- replaced openAnything with open_resource supporting ETag and If-Modified-Since request headers (JD)
- parse now accepts etag, modified, agent, and referrer optional arguments (JD)
- modified parse to return a dictionary instead of a tuple so that any etag or modified information can be returned and cached by the caller
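The optional arguments listed above map onto HTTP request headers roughly as follows. This is a minimal sketch written against modern urllib.request, not the original open_resource code: the function name is hypothetical, modified is assumed to be an already formatted HTTP date string, and a cached ETag value is assumed to travel back to the server in the If-None-Match header.

```python
import urllib.request


def build_feed_request(url, etag=None, modified=None, agent=None, referrer=None):
    """Build a request carrying optional caching and identification headers."""
    request = urllib.request.Request(url)
    if etag:
        # The ETag cached from an earlier response is sent as If-None-Match.
        request.add_header("If-None-Match", etag)
    if modified:
        # Assumed to be a preformatted HTTP date string.
        request.add_header("If-Modified-Since", modified)
    if agent:
        request.add_header("User-Agent", agent)
    if referrer:
        # The HTTP header name keeps its historical misspelling.
        request.add_header("Referer", referrer)
    return request
```

A caller that cached etag and modified from an earlier result can pass them back in. A server that honors them can then answer 304 Not Modified with an empty body instead of resending the feed.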
Ultra-liberal RSS Parser 1.1 was released on September 27, 2002.
- fixed infinite loop on incomplete CDATA sections
Ultra-liberal RSS Parser 1.0 was released on September 27, 2002.
- fixed namespace processing on prefixed RSS 2.0 elements
- added Simon Fell’s namespace test suite
Ultra-liberal RSS Parser was first released on August 13, 2002.
Aaron Swartz has been looking for an ultra-liberal RSS parser. Now that I’m experimenting with a homegrown RSS-to-email news aggregator, so am I. You see, most RSS feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register’s feed most days). Or just a bastardized mix of RSS 0.9x elements with RSS 1.0 elements (Movable Type feeds).
Then there are feeds, like Aaron’s feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid RSS 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in RSS 0.94 (see Dave Winer’s feed for an example). And then there’s Jon Udell’s feed, with the fullitem element that he just sort of made up.
rssparser.py. GPL-licensed. Tested on 5000 active feeds.