Changes in earlier versionsΒΆ
Universal Feed Parser began as an “ultra-liberal RSS parser” named rssparser.py
. It was written as a weapon for battles that no one remembers, to work around problems that no longer exist.
Ultra-liberal Feed Parser 2.5.3 was released on August 3, 2003.
- track whether we’re inside an image or textInput (TvdV)
- return the character encoding, if specified
Ultra-liberal Feed Parser 2.5.2 was released on July 28, 2003.
- entity-decode inline XML properly
- added support for inline <xhtml:body> and <xhtml:div> as used in some RSS 2.0 feeds
Ultra-liberal Feed Parser 2.5.1 was released on July 26, 2003.
- clear
opener.addheaders
so we only send our customUser-Agent
(otherwiseurllib2
sends two, which confuses some servers) (RMK)
Ultra-liberal Feed Parser 2.5 was released on July 25, 2003.
- changed to Python license (all contributors agree)
- removed unnecessary
>urllib
code –urllib2
should always be available anyway - return actual
url
,status
, and full HTTP headers (asresult['url']
,result['status']
, andresult['headers']
) if parsing a remote feed over HTTP. This should pass all the Aggregator client :abbr:`HTTP (Hypertext Transfer Protocol) tests <http://diveintomark.org/tests/client/http/>`_. - added the latest namespace-of-the-week for RSS 2.0
Ultra-liberal Feed Parser 2.4 was released on July 9, 2003.
- added preliminary Pie/Atom/Echo support based on Sam Ruby’s snapshot of July 1
- changed project name
Ultra-liberal RSS Parser 2.3.1 was released on June 12, 2003.
- if item has both link and guid, return both as-is
Ultra-liberal RSS Parser 2.3 was released on June 11, 2003.
- added
USER_AGENT
for default (if caller doesn’t specify) - make sure we send the
User-Agent
even ifurllib2
isn’t available - Match any variation of
backend.userland.com/rss
namespace
Ultra-liberal RSS Parser 2.2 was released on January 27, 2003.
- added attribute support and admin:generatorAgent. start_admingeneratoragent is an example of how to handle elements with only attributes, no content.
Ultra-liberal RSS Parser 2.1 was released on November 14, 2002.
- added gzip support
Ultra-liberal RSS Parser 2.0.2 was released on October 21, 2002.
- added the
inchannel
to theif
statement, otherwise it’s useless. Fixes the problem JD was addressing by adding it. (JB)
Ultra-liberal RSS Parser 2.0.1 was released on October 21, 2002.
- changed
parse()
so that if we don’t get anything because ofetag
/modified
, return the oldetag
/modified
to the caller to indicate why nothing is being returned
Ultra-liberal RSS Parser 2.0 was released on October 19, 2002.
- use
inchannel
to watch out for image and textinput elements which can also contain title, link, and description elements (JD) - check for isPermaLink=’false’ attribute on guid elements (JD)
- replaced
openAnything
withopen_resource
supportingETag
andIf-Modified-Since
request headers (JD) parse
now acceptsetag
,modified
,agent
, andreferrer
optional arguments (JD)- modified
parse
to return a dictionary instead of a tuple so that anyetag
ormodified
information can be returned and cached by the caller
Ultra-liberal RSS Parser 1.1 was released on September 27, 2002.
- fixed infinite loop on incomplete CDATA sections
Ultra-liberal RSS Parser 1.0 was released on September 27, 2002.
- fixed namespace processing on prefixed RSS 2.0 elements
- added Simon Fell’s namespace test suite
Ultra-liberal RSS Parser was first released on August 13, 2002.
Aaron Swartz has been looking for an ultra-liberal RSS parser. Now that I’m experimenting with a homegrown RSS-to-email news aggregator, so am I. You see, most RSS feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register’s feed most days). Or just a bastardized mix of RSS 0.9x elements with RSS 1.0 elements (Movable Type feeds).
Then there are feeds, like Aaron’s feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid RSS 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in RSS 0.94 (see Dave Winer’s feed for an example). And then there’s Jon Udell’s feed, with the fullitem element that he just sort of made up.
rssparser.py
. GPL-licensed. Tested on 5000 active feeds.