Universal Feed Parser supports downloading and parsing feeds that are protected by HTTP authentication. Both basic and digest authentication are supported.
The easiest way is to embed the username and password in the feed URL itself.
In this example, the username is test and the password is basic.
    >>> import feedparser
    >>> d = feedparser.parse('http://test:basic@feedparser.org/docs/examples/basic_auth.xml')
    >>> d.feed.title
    u'Sample Feed'
The same technique works for digest authentication. (Technically, Universal Feed Parser will attempt basic authentication first; if that fails and the server indicates that it requires digest authentication, Universal Feed Parser will automatically re-request the feed with the appropriate digest authentication headers. This means that this technique will send your password to the server in an easily decoded form: basic authentication only Base64-encodes the credentials, it does not encrypt them.)
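To see why basic credentials are easy to recover, here is a minimal sketch using only the standard library. It assumes the standard Basic scheme, in which the Authorization header carries the Base64 encoding of user:password; the variable names are illustrative and are not part of Universal Feed Parser.

```python
import base64

# Username and password from the basic authentication example.
credentials = "test:basic"

# Basic authentication Base64-encodes "user:password"; nothing is encrypted.
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
header_value = "Basic " + token

# Anyone who can read the header can reverse the encoding without a key.
encoded_part = header_value.split(" ", 1)[1]
recovered = base64.b64decode(encoded_part).decode("utf-8")
user, password = recovered.split(":", 1)
```

After this runs, user and password hold the original values again, which is why the URL technique should only be used over connections you trust.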
In this example, the username is test and the password is digest.
    >>> import feedparser
    >>> d = feedparser.parse('http://test:digest@feedparser.org/docs/examples/digest_auth.xml')
    >>> d.feed.title
    u'Sample Feed'
You can also construct an HTTPBasicAuthHandler that contains the password information, then pass that as a handler to the parse function. HTTPBasicAuthHandler is part of the standard urllib2 module.
    import urllib2, feedparser

    # Construct the authentication handler
    auth = urllib2.HTTPBasicAuthHandler()

    # Add password information: realm, host, user, password.
    # A single handler can contain passwords for multiple sites;
    # urllib2 will sort out which passwords get sent to which sites
    # based on the realm and host of the URL you're retrieving
    auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic')

    # Pass the authentication handler to the feed parser.
    # handlers is a list because there might be more than one
    # type of handler (urllib2 defines lots of different ones,
    # and you can build your own)
    d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml', handlers=[auth])
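The point about one handler holding passwords for several sites can be checked without touching the network. The sketch below assumes Python 3, where these classes live in urllib.request rather than urllib2, and it builds the password manager explicitly so that lookups can be inspected. The second site (example.com), its realm, and its credentials are invented for illustration.

```python
import urllib.request

# An explicit password manager, shared with the handler, so we can query it.
passwords = urllib.request.HTTPPasswordMgr()
auth = urllib.request.HTTPBasicAuthHandler(passwords)

# Arguments are realm, host, user, password, as in the example above.
passwords.add_password('BasicTest', 'feedparser.org', 'test', 'basic')
# Hypothetical second site, stored in the same manager.
passwords.add_password('OtherRealm', 'example.com', 'alice', 'secret')

# The lookup is keyed on both the realm and the URL being retrieved.
first = passwords.find_user_password(
    'BasicTest', 'http://feedparser.org/docs/examples/basic_auth.xml')
second = passwords.find_user_password(
    'OtherRealm', 'http://example.com/feed.xml')
# A realm/host pair that was never registered yields (None, None).
missing = passwords.find_user_password(
    'BasicTest', 'http://example.com/feed.xml')
```

Because the realm and host must both match, credentials registered for one site are not offered to another.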
Digest authentication is handled in much the same way, by constructing an HTTPDigestAuthHandler and populating it with the necessary realm, host, user, and password information. This is more secure than stuffing the username and password in the URL, since the password itself is never sent to the server; only a hash computed from it is sent.
    import urllib2, feedparser

    auth = urllib2.HTTPDigestAuthHandler()
    auth.add_password('DigestTest', 'feedparser.org', 'test', 'digest')
    d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml', handlers=[auth])
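For readers curious about what the digest handler actually sends, the following is a rough sketch of the response calculation. It assumes the MD5 algorithm with qop="auth" as described in RFC 2617. The function names, the request counter, and the client nonce are hypothetical values chosen for illustration; in practice the handler computes all of this for you.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce, nc, cnonce):
    """Compute a digest 'response' value for qop="auth" (illustrative only)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")   # secret part, never sent
    ha2 = md5_hex(f"{method}:{uri}")              # describes the request
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")


response = digest_response(
    user="test",
    realm="DigestTest",
    password="digest",
    method="GET",
    uri="/docs/examples/digest_auth.xml",
    nonce="+LV/uLLdAwA=5d77397291261b9ef256b034e19bcb94f5b7992a",  # sample server nonce
    nc="00000001",       # hypothetical request counter
    cnonce="0a4f113b",   # hypothetical client nonce
)
```

Only the resulting 32-character hash travels over the wire, and it changes whenever the server issues a new nonce.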
The examples so far have assumed that you know in advance that the feed is password-protected. But what if you don’t know?
If you try to download a password-protected feed without sending all the proper password information, the server will return an HTTP status code 401. Universal Feed Parser makes this status code available in d.status.
Details on the authentication scheme are in d.headers['www-authenticate']. Universal Feed Parser does not do any further parsing on this field; you will need to parse it yourself. Everything before the first space is the type of authentication (probably Basic or Digest), which controls which type of handler you'll need to construct. The realm name is given as realm="foo", so foo would be your first argument to auth.add_password. Other information in the www-authenticate header is probably safe to ignore; the urllib2 module will handle it for you.
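The parsing described above might look something like the sketch below. It assumes Python 3 (urllib.request in place of urllib2) and a header that contains a single challenge with a quoted realm; parse_challenge and make_auth_handler are hypothetical helper names, not part of Universal Feed Parser.

```python
import re
import urllib.request


def parse_challenge(header):
    """Split a www-authenticate value into (scheme, realm)."""
    # Everything before the first space is the authentication type.
    scheme, _, params = header.partition(" ")
    match = re.search(r'realm="([^"]*)"', params, re.IGNORECASE)
    realm = match.group(1) if match else None
    return scheme, realm


def make_auth_handler(header, host, user, password):
    """Build the handler that matches the challenge (illustrative only)."""
    scheme, realm = parse_challenge(header)
    handler_classes = {
        "basic": urllib.request.HTTPBasicAuthHandler,
        "digest": urllib.request.HTTPDigestAuthHandler,
    }
    handler_class = handler_classes.get(scheme.lower())
    if handler_class is None or realm is None:
        raise ValueError("unsupported or malformed challenge: %r" % header)
    auth = handler_class()
    # The realm from the header is the first argument to add_password.
    auth.add_password(realm, host, user, password)
    return auth


scheme, realm = parse_challenge('Basic realm="Use test/basic"')
auth = make_auth_handler(
    'Digest realm="DigestTest", algorithm=MD5, qop="auth"',
    'feedparser.org', 'test', 'digest')
```

The returned handler can then be passed to the parse function through the handlers argument, in the same way as a handler constructed by hand.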
    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml')
    >>> d.status
    401
    >>> d.headers['www-authenticate']
    'Basic realm="Use test/basic"'
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml')
    >>> d.status
    401
    >>> d.headers['www-authenticate']
    'Digest realm="DigestTest", nonce="+LV/uLLdAwA=5d77397291261b9ef256b034e19bcb94f5b7992a", algorithm=MD5, qop="auth"'