Universal Feed Formatter 0.3
The Universal Feed Formatter (or just "feedformatter") is a simple Python module for transforming a dictionary-based structure of feed and item information into a number of valid feed formats. Currently supported formats are RSS 1.0, RSS 2.0 and Atom 1.0. You can think of it as the opposite of the well known and excellent Universal Feed Parser (in terms of what it does, not of how high quality it is - yet!).
feedformatter is in an "alpha" state - it does not currently support all of the features for each of the formats it can produce output for. It has undergone minimal testing and there is room for substantial code clean up. It has been released in this form in line with the "release early, release often" philosophy of free software development. This said, I do use it to generate the feeds for this website, and the feeds it generates do survive the relevant W3C feed validators. In short: it's not fantastic, but it's not absolutely terrible either.
- Requirements
- License
- Download
- Detailed Description
- Complete instructions
- Old Versions / History
- Contribute
Requirements
feedformatter uses the ElementTree library to produce well-formed XML output. ElementTree has been part of the Python standard library since Python 2.5, and it can be downloaded separately for earlier versions. Supposedly ElementTree works with any Python version after 1.5.2, which should cover any Python installation you can find today.
If you use a version of Python earlier than 2.0, you will not be able to "pretty print" your feeds, i.e. they will be devoid of line breaks and indentation.
License
feedformatter is distributed under a standard 3-clause BSD license. It's as free as software gets.
Download
Download the latest version of feedformatter, feedformatter 0.3, which was released on March 4, 2008.
You can also download older versions of feedformatter, and see what has changed between versions, in the Old Versions / History section.
Detailed Description
When I decided that it was high time I added some feeds to my website (in the hopes of actually attracting some regular visitors!), I began searching for free Python tools related to the RSS and Atom feed formats. I found that while parsing RSS and Atom feeds with Python was a closed problem - the Universal Feed Parser being a thoroughly tested and loved solution that was everybody's first port of call - generating these feeds had not received as thorough attention. I found a few libraries which would generate just RSS 2.0 feeds or just Atom feeds, but there was no universal solution. I was envisaging a library with which I could organise the content of a feed into a dictionary or two, and then transform that dictionary into a feed in whichever format I liked, with the appropriate dictionary elements for each format picked out and renamed and reformatted as appropriate.
I could find no such thing on the web, so I wrote this feedformatter module. It is meant to be the hypothetical tool described above.
Feedformatter is supposed to be easy to use and forgiving. It recognises lots of different names for the same thing, the idea being that if you already have a home brewed solution to generating RSS feeds or Atom feeds, you can easily replace the feed formatting part with feedformatter; it won't complain if things are called "items" (RSS terminology) instead of "entries" (Atom terminology), or vice versa, or if publication times are passed in as 9 part tuples instead of seconds since the epoch. Feedformatter should "just work".
Complete Instructions
The feedformatter module contains just one class definition that you need to worry about in typical usage - the Feed class. A Feed object represents, funnily enough, a complete feed. It has two attributes, a dictionary and a list. The feed dictionary attribute contains values specific to the feed overall. It may contain values for the following keys:
- title - The title of the feed
- desc - A description of the feed
- link - A URL for either the feed itself or the relevant website's homepage
- author - The name of the author of the items in the feed
- published - The date and time at which the feed was last updated. This may be in the form of seconds since the Unix Epoch (as a float or a string representation of a float) or as a 9-part time tuple.
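The three accepted date forms all describe the same instant. The sketch below shows how they relate using only the standard library. The helper name to_time_tuple is hypothetical and is not part of feedformatter, and treating epoch seconds as UTC is an assumption made for the illustration.

```python
import calendar
import time


def to_time_tuple(value):
    """Hypothetical helper (not part of feedformatter).

    Accepts epoch seconds (float or string) or a 9-part time tuple and
    returns a time.struct_time, assuming epoch values are UTC.
    """
    if isinstance(value, tuple):  # time.struct_time is a tuple subclass
        if len(value) != 9:
            raise ValueError("expected a 9-part time tuple")
        return time.struct_time(value)
    return time.gmtime(float(value))


# The same instant (midnight UTC on March 4, 2008) in all three forms.
as_float = 1204588800.0
as_string = "1204588800.0"
as_tuple = time.gmtime(1204588800)

normalised = [to_time_tuple(v) for v in (as_float, as_string, as_tuple)]

# calendar.timegm converts a UTC time tuple back to epoch seconds.
print([calendar.timegm(t) for t in normalised])
```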
The items list attribute should contain a list of dictionaries. Each dictionary in this list contains values specific to one single item in the feed. Each item dictionary may contain values for the following keys:
- title - The title of the feed item
- desc - A description of the feed item
- link - A URL for the page or file the feed item is about
- pubDate - The date and time at which this item was added to the feed, in one of the same formats as the published key of the feed dictionary.
The feed and items attributes should be passed in as the only two arguments to the Feed class' constructor to get a populated Feed object.
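As a rough sketch of what that looks like in practice, the snippet below builds a feed dictionary and an items list from the keys listed above and hands them to the constructor. The titles and example.com URLs are placeholders, and the import is guarded so the snippet still runs where feedformatter is not installed.

```python
import time

# Feed-level values, using the keys described above (placeholder content).
feed = {
    "title": "Example feed",
    "desc": "News from an example website",
    "link": "http://www.example.com/",
    "author": "A. N. Author",
    "published": 1204588800.0,  # seconds since the Unix Epoch
}

# One dictionary per item; pubDate here is a 9-part time tuple.
items = [
    {
        "title": "First post",
        "desc": "The first item in the feed",
        "link": "http://www.example.com/first",
        "pubDate": time.gmtime(1204588800),
    },
    {
        "title": "Second post",
        "desc": "The second item in the feed",
        "link": "http://www.example.com/second",
        "pubDate": time.gmtime(1204592400),
    },
]

try:
    from feedformatter import Feed
except ImportError:
    Feed = None  # feedformatter is not installed; only the data is built

if Feed is not None:
    # The two structures are the constructor's only arguments.
    my_feed = Feed(feed, items)
```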
There is also a factory function named fromUFP. The goal of this function is to take as input a dictionary which is the output of the excellent Universal Feed Parser and return a properly populated Feed object. At the moment, this half works: the factory function should certainly return a Feed object, but it may not be as completely populated as it could be. As a test, using this feature I have combined feedparser and feedformatter to translate the BBC's RSS 2.0 feeds into (valid) RSS 1.0 feeds. I can't translate them to Atom because some required metadata (like author information) is missing.
Once a Feed object has been populated, you may call any of the following methods:
format_rss1(filename)
format_rss2(filename)
format_atom(filename)
These methods do just what you would expect them to do - write the feed information to a file with the given filename in the appropriate format. Any one of these methods may raise an InvalidFeedException if the Feed object has not been populated with sufficient information to produce valid output in the format requested. For example, the Atom format specification states that there must be either at least one "author" element within the "feed" element OR at least one "author" element within each "entry" element. If your input dictionaries do not have the relevant keys to make this happen, the format_atom method will complain appropriately.
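The Atom author rule can be expressed as a small check over the input dictionaries. This is only an illustration of the rule, not feedformatter's actual validation code: the function name is hypothetical, and it assumes an item dictionary may carry an "author" key, which the key list above does not mention.

```python
def atom_author_rule_satisfied(feed, items):
    """Hypothetical check (not part of feedformatter).

    True if the feed dictionary names an author, or if every item
    dictionary names one. Assumes items may carry an "author" key.
    """
    if feed.get("author"):
        return True
    # With no feed-level author, every single item must supply its own.
    # all() is also True for an empty list, since no item lacks an author.
    return all(item.get("author") for item in items)


with_feed_author = atom_author_rule_satisfied(
    {"title": "Example feed", "author": "A. N. Author"},
    [{"title": "First post"}],
)
missing_author = atom_author_rule_satisfied(
    {"title": "Example feed"},
    [{"title": "First post", "author": "A. N. Author"}, {"title": "Second post"}],
)
print(with_feed_author, missing_author)
```

In the second case one item has no author and the feed has none either, which is the situation in which format_atom would be expected to raise InvalidFeedException.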
If you are an evil, depraved human being who is generating feeds with Python only while taking a break from torturing small children, you can call any of the formatting methods above with a validate=False keyword argument to produce output even if it cannot possibly be valid. Just don't be surprised when some program somewhere completely fails to understand your feed.
By default, formatted feeds contain no new lines or indentation. After all, the programs that parse them aren't really going to care. If you want your feeds to look pretty (useful for making sure they contain what you want them to contain), you can call any of the formatting methods above with a pretty=True keyword argument. Note that pretty printing uses the xml.dom module which is only present in Python 2.0 and above.
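To see the difference pretty printing makes, here is a minimal standalone sketch using ElementTree and xml.dom.minidom from the standard library. It shows the general technique on a tiny hand-built document and is not necessarily how feedformatter implements pretty=True internally.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Build a tiny RSS-like tree by hand (placeholder content).
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example feed"

# Default serialisation: everything on one line, no indentation.
compact = ET.tostring(rss, encoding="unicode")

# Re-parse with minidom and let it add line breaks and indentation.
pretty = minidom.parseString(compact).toprettyxml(indent="  ")

print(compact)
print(pretty)
```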
Old versions / History
The list below details all versions of feedformatter which have ever been released, as well as summarising the changes between versions. You can also see how this website looked at the time of each release, which is handy for finding documentation for old releases.
feedformatter 0.3 - March 4, 2008 (Download)
- Renamed render_* methods to format_* - makes a bit more sense.
- Use the standard library ElementTree if it's available, only use an external library if not. In either case, use cElementTree instead of the Python version if available - this should speed things up a bit.
- Pretty printing is now optional (restoring compatibility with Python < 2.0)
- Atom formatting is now handled in the same way as RSS formatting - no visible changes to the user, just tidier and more manageable code.
- Slightly more intelligent handling of some Atom components.
- General code tidy up and website tidy up.
feedformatter 0.2 - March 3, 2008 (Download) (Website)
- Feeds are now "pretty printed" to their output files, i.e. new lines and indentation are used where appropriate.
- The fromUFP factory function is now partially working.
- RSS 2.0 validation has been slightly improved.
feedformatter 0.1 - March 2, 2008 (Download) (Website)
The original release.
Contribute
Bug reports and suggestions for improving feedformatter are very welcome: just email them to me. Full credit will be given on this page for bug reports, fixes, etc. Feel free to email me even just to let me know you think feedformatter is neat, too.