ETag and Last-Modified Headers
ETags and Last-Modified headers are two ways that feed publishers can save bandwidth, but they only work if clients take advantage of them. Universal Feed Parser supports both, but your application must store the values it receives and send them back on the next request.
The basic concept is that, when it serves a feed, a publisher may include a special HTTP header, called an ETag, that identifies the current version of the feed. You should send this ETag back to the server on subsequent requests (Universal Feed Parser sends it in an If-None-Match header). If the feed has not changed since the last time you requested it, the server returns HTTP status code 304 (Not Modified) and no feed data.
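The exchange can be modelled in a few lines of plain Python. This is a toy: ToyFeedServer is a hypothetical stand-in for a real web server, and it compares ETags by exact string match, whereas real servers must also handle lists of ETags and weak validators (prefixed W/) in If-None-Match.

```python
from __future__ import annotations


class ToyFeedServer:
    """Hypothetical in-memory origin server that honours If-None-Match."""

    def __init__(self, body: bytes, etag: str) -> None:
        self.body = body
        self.etag = etag  # opaque validator; the quotes are part of the value

    def get(self, request_headers: dict) -> tuple[int, dict, bytes]:
        # A matching If-None-Match means the client's copy is current,
        # so answer 304 Not Modified with an empty body.
        if request_headers.get("If-None-Match") == self.etag:
            return 304, {"ETag": self.etag}, b""
        # Otherwise send the whole feed along with its current ETag.
        return 200, {"ETag": self.etag}, self.body


server = ToyFeedServer(b"<feed>...</feed>", '"6c132-941-ad7e3080"')

# First request: the client has no ETag yet, so it receives the full feed.
status, headers, body = server.get({})
saved_etag = headers["ETag"]

# Later request: echo the saved ETag back in If-None-Match.
status2, _, body2 = server.get({"If-None-Match": saved_etag})
print(status, status2, len(body2))  # prints: 200 304 0
```

The client never interprets the ETag; it only stores it and returns it unchanged.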
Using ETags to reduce bandwidth
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d.etag
'"6c132-941-ad7e3080"'
>>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', etag=d.etag)
>>> d2.status
304
>>> d2.feed
{}
>>> d2.entries
[]
>>> d2.debug_message
'The feed has not changed since you last checked, so the server sent no data. This is a feature, not a bug!'
A related mechanism accomplishes the same thing in a slightly different way. Here the server publishes the feed's last-modified date in the Last-Modified HTTP header. You can send this date back to the server on subsequent requests (Universal Feed Parser sends it in an If-Modified-Since header), and if the feed has not changed, the server returns HTTP status code 304 and no feed data.
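On the server side, the decision reduces to a date comparison. This sketch uses only the standard library; last_modified_header and is_not_modified are hypothetical helpers, and they assume the feed's modification time is available as a timezone-aware datetime.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(mtime: datetime) -> str:
    """Format a timezone-aware modification time as an HTTP-date (always GMT)."""
    return format_datetime(mtime.astimezone(timezone.utc), usegmt=True)


def is_not_modified(mtime: datetime, if_modified_since: str | None) -> bool:
    """Toy server-side check: True means "reply 304 with no body"."""
    if not if_modified_since:
        return False  # unconditional request: send the full feed
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: ignore it and send the full feed
    if since.tzinfo is None:
        # A zone-less result (e.g. from "-0000") is treated as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP-dates have one-second resolution, so drop microseconds first.
    return mtime.replace(microsecond=0) <= since


mtime = datetime(2004, 6, 11, 23, 0, 34, tzinfo=timezone.utc)
header = last_modified_header(mtime)  # 'Fri, 11 Jun 2004 23:00:34 GMT'
print(is_not_modified(mtime, header))  # True
```

Because the client simply echoes the server's own string, it never needs to parse or reformat the date itself.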
Using Last-Modified headers to reduce bandwidth
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d.modified
'Fri, 11 Jun 2004 23:00:34 GMT'
>>> d.modified_parsed
(2004, 6, 11, 23, 0, 34, 4, 163, 0)
>>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', modified=d.modified)
>>> d2.status
304
>>> d2.feed
{}
>>> d2.entries
[]
>>> d2.debug_message
'The feed has not changed since you last checked, so the server sent no data. This is a feature, not a bug!'
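The modified_parsed value is a UTC time.struct_time 9-tuple (year, month, day, hour, minute, second, weekday, day of year, DST flag). If you store one form and need the other, the standard library can convert between them, as in this sketch; http_date_to_struct and struct_to_http_date are hypothetical helper names.

```python
import calendar
import time
from email.utils import formatdate, parsedate


def http_date_to_struct(value: str) -> time.struct_time:
    """Convert an HTTP-date string into a UTC time.struct_time."""
    parts = parsedate(value)  # weekday and day-of-year come back as placeholders
    if parts is None:
        raise ValueError(f"not an HTTP-date: {value!r}")
    # Round-trip through a Unix timestamp so every field is filled in correctly.
    return time.gmtime(calendar.timegm(parts))


def struct_to_http_date(t) -> str:
    """Convert a UTC 9-tuple back into the HTTP-date form used in headers."""
    return formatdate(calendar.timegm(t), usegmt=True)


t = http_date_to_struct("Fri, 11 Jun 2004 23:00:34 GMT")
print(tuple(t))                # (2004, 6, 11, 23, 0, 34, 4, 163, 0)
print(struct_to_http_date(t))  # Fri, 11 Jun 2004 23:00:34 GMT
```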
Clients should support both ETag and Last-Modified headers, as some servers support one but not the other.
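A polling routine that remembers both validators might look like the following sketch. fetch_if_changed and the layout of its cache dictionary are hypothetical; the sketch assumes feedparser.parse accepts the etag and modified keyword arguments shown in the examples above, and it takes an optional parse callable so a stub can stand in for the network when testing.

```python
from __future__ import annotations

from typing import Any, Callable


def fetch_if_changed(url: str, cache: dict,
                     parse: Callable[..., Any] | None = None) -> list:
    """Fetch url, sending any stored ETag/Last-Modified; return new entries.

    `cache` maps url -> {"etag": ..., "modified": ...} and is updated in place.
    """
    if parse is None:
        import feedparser  # imported lazily so a stub can be injected instead
        parse = feedparser.parse

    saved = cache.get(url, {})
    d = parse(url, etag=saved.get("etag"), modified=saved.get("modified"))

    if d.get("status") == 304:
        return []  # unchanged: keep using the copy you already have

    # Remember whichever validators the server sent: one, both, or neither.
    cache[url] = {"etag": d.get("etag"), "modified": d.get("modified")}
    return d.get("entries", [])
```

In a real application you would persist the cache (for example, to disk) between runs, so that a restart does not trigger a full download of every feed.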
Important
If you do not support ETag and Last-Modified headers, you will repeatedly download feeds that have not changed. This wastes your bandwidth and the publisher’s bandwidth, and the publisher may ban you from accessing their server.
Note
You can control the behaviour of HTTP caches between your application and the origin server by using the request_headers parameter. For example, you may want to send Cache-Control: max-age=60 to make the caches revalidate against the origin server unless their cached copy is less than a minute old. Use this with consideration: forcing caches to revalidate more often shifts load back onto the origin server.
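A sketch of sending Cache-Control alongside the validators. It assumes the keyword is request_headers, as in recent feedparser releases; check the signature of the version you have installed. fetch_with_max_age is a hypothetical helper, and the optional parse argument exists only so a stub can replace feedparser.parse in tests.

```python
from __future__ import annotations

from typing import Any, Callable


def fetch_with_max_age(url: str, etag: str | None = None,
                       modified: str | None = None, max_age: int = 60,
                       parse: Callable[..., Any] | None = None) -> Any:
    """Conditional fetch that also limits how stale a cached copy may be."""
    if max_age < 0:
        raise ValueError("max_age must be non-negative")
    if parse is None:
        import feedparser  # assumed to accept request_headers (see above)
        parse = feedparser.parse
    # In a request, max-age=N tells caches the client will not accept a
    # stored response older than N seconds without revalidating it.
    headers = {"Cache-Control": f"max-age={max_age}"}
    return parse(url, etag=etag, modified=modified, request_headers=headers)
```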