Creating Atom feeds with XML::Atom::SimpleFeed

Syndicated content has become an expected part of any regular news source. Users don't want the bother of checking sites every day, and mailing lists don't allow the user enough flexibility when dealing with a large amount of news. Regular publishing formats are also notoriously unfriendly to robots, providing barriers to indexing and searching.

XML-based feed formats such as RSS and Atom have grown popular as ways to present news in a consistent, machine-readable fashion. These can be read by in-browser plugins such as Sage for Firefox (http://sage.mozdev.org/) or by external syndication sites such as My Yahoo. Even popular websites such as Facebook and jaiku.com allow the importing of blogs and other feeds via RSS and Atom.

Feeds are great if you're a consumer of content, but if you're a producer of content they can represent quite a challenge. That's where XML::Atom::SimpleFeed comes in.

There's nothing like an example to explain how things can be done, and since we've just enabled an Atom view of the Perl Tips mailing list, we can't think of a better example than our tips themselves.

Finding our content

Our tips start their life in Plain Old Documentation format (see perlpod for details) and are then rendered into plain text for e-mail, and HTML for display on the web. Our goal was to convert this static HTML format into something that would form an automatic Atom feed.

Our first step is to find all the Perl Tips we've written. We already have these stored by date on our webserver, and getting the list is as easy as using Perl's built-in glob function with a little code to weed out pages like index.html.

We're not going to show you the exact code, but at the end of this we have an array of pathnames, relative to our webserver root. Thanks to glob sorting filenames, these are already in chronological order. The resulting list looks like this:

        my @tips = (
                # ... earlier tips ...
                '/tips/2007-06-18.html',
                '/tips/2007-07-04.html',
                '/tips/2007-07-30.html',
        );
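
The code that builds this list isn't shown here. As a minimal sketch of the idea (not the code we actually run), assuming the tips live in a tips/ directory under the document root and are named by date, glob and grep might be combined like this. The find_tips name and the document root path are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: list dated tip pages under "$docroot/tips" and
# return their paths relative to the webserver root, oldest first.
# Assumes $docroot contains no whitespace or glob metacharacters.
sub find_tips {
    my ($docroot) = @_;

    my @files = glob("$docroot/tips/*.html");

    # Keep only pages named YYYY-MM-DD.html; this weeds out index.html.
    my @dated = grep { m{/\d{4}-\d{2}-\d{2}\.html\z} } @files;

    # Strip the document root, and sort explicitly so the chronological
    # order does not depend on glob's default ordering.
    my @paths = sort map { substr($_, length $docroot) } @dated;
    return @paths;
}

my @found = find_tips('/var/www/html');   # hypothetical document root
```

Because the dates are in year-month-day order, a plain string sort is also a chronological sort.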

In order to create our feed, we need the date of the last tip published. We don't want to use the current time and date, as our feed is only going to be updated whenever a new tip is released:

        my ($updated) = ($tips[-1] =~ m{(\d{4}-\d{2}-\d{2})});
        $updated .= "T00:00:00Z";

Note that our $updated string is in the format 2007-09-26T00:00:00Z. Atom feeds tend to be rather picky about their date formats.
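
One case the two lines above don't handle is a pathname with no date in it: the match fails, $updated is left undefined, and we end up with the invalid timestamp T00:00:00Z. A small sketch of a guard, using a hypothetical tip_timestamp helper:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a tip pathname into a midnight-UTC timestamp,
# or return nothing if the name does not contain a YYYY-MM-DD date.
sub tip_timestamp {
    my ($path) = @_;
    my ($date) = $path =~ m{(\d{4}-\d{2}-\d{2})}
        or return;
    return "${date}T00:00:00Z";
}

my $updated = tip_timestamp('/tips/2007-07-30.html')
    or die "No date found in tip name";
print "$updated\n";   # 2007-07-30T00:00:00Z
```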

Creating our feed

Now, to create our feed object:

        my $feed = XML::Atom::SimpleFeed->new(
                title    => "Perl Tips",
                subtitle => "From Perl Training Australia",
                logo     => "http://perltraining.com.au/images/logo.png",
                link     => "http://perltraining.com.au/tips/",
                link     => {
                        rel  => 'self',
                        href => 'http://perltraining.com.au/tips/index.atom',
                },
                id       => "http://perltraining.com.au/tips/",
                author   => "Perl Training Australia",
                updated  => $updated,
        );

It's worth making a few notes about some of the attributes we're using at this point.

title
Your feed must have a title.

link
We've provided two link attributes. The first (which provides only a URL) is treated as an alternate link; in other words, a URL that provides a different view of the same data.

The second link has a relationship of self. The Atom specification (RFC 4287) says that feeds should provide a link to where the feed itself can be fetched, which we do with our href.

id
The id is a required field providing a unique, unchanging identifier for this feed. This should not change even if we change the location for the Perl Tips feed. This allows systems to follow a feed even though it may be moved between different hosts and locations.

updated
Strictly speaking, the updated time must be in the form specified by RFC 3339. The most common format uses UTC, with a timestamp of 'YYYY-MM-DD' followed by the letter 'T', then 'HH:MM:SS', then the letter 'Z'. For example, 2007-09-27T12:34:56Z.

We're not being very strict about our date and time, approximating it only to the nearest day. For more frequently published news items you'll want to be more accurate.

If this element is omitted, a timestamp with the current date and time is used.
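
If you do need finer-grained timestamps, core Perl can produce them. A minimal sketch, assuming you have the publication time as an epoch value (for instance a file's modification time); the rfc3339 helper name is ours and is not part of XML::Atom::SimpleFeed:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time as an RFC 3339 timestamp in UTC.
sub rfc3339 {
    my ($epoch) = @_;
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $epoch);
}

print rfc3339(1190896496), "\n";   # 2007-09-27T12:34:56Z
```

Using gmtime rather than localtime is what makes the trailing 'Z' truthful, since 'Z' means the time is given in UTC.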

Adding our entries

Now that we've made our feed object, let's start populating it with data:

        use constant MAX_ENTRIES => 5;

        foreach (1..MAX_ENTRIES) {
                last if not @tips;      # Stop if our list is empty
                my $tip = pop(@tips);   # Take our next most recent tip

                # Extract our tip's date from its name, and format
                # it into an RFC-3339 timestamp.

                my ($date) = ($tip =~ m{(\d{4}-\d{2}-\d{2})});
                $date .= "T00:00:00Z";

                # Load our tip as an HTML::Mason component, and
                # (using its name) generate a URL to the tip on our
                # website.

                my $comp = $m->fetch_comp($tip);
                my $link = "http://perltraining.com.au$tip";

                # Render the link's content, and if we can find an
                # "END_SUMMARY" comment, then snip everything past
                # that point and replace it with a 'Read more...' tag.

                my $summary = $m->scomp($tip);
                $summary =~ s{<!--\s*END_SUMMARY\s*-->.*}
                             {<p><b><a href="$link">Read more...</a></b></p>}s;

                # Add our entry to the feed.

                $feed->add_entry(
                        title   => $comp->scall_method("title"),
                        link    => $link,
                        id      => $link,
                        summary => $summary,
                        updated => $date,
                );
        }

We're using HTML::Mason for our website, which is why we can fetch pages as components, and query them for their title and content. Rather than publishing full tips via Atom, we instead publish a number of tips and their summaries, using a simple HTML comment in the content to indicate where the summary section should end.
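
To make the summary step concrete outside of Mason, here is the same substitution wrapped in a standalone function. This is only a sketch: the summarise name and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: cut rendered HTML at the END_SUMMARY comment and
# append a 'Read more...' link. HTML without the marker is returned whole.
sub summarise {
    my ($html, $link) = @_;

    # The /s modifier lets '.' match newlines, so everything after the
    # marker is removed, not just the rest of that line.
    $html =~ s{<!--\s*END_SUMMARY\s*-->.*}
              {<p><b><a href="$link">Read more...</a></b></p>}s;

    return $html;
}

my $html = "<p>Intro.</p>\n<!-- END_SUMMARY -->\n<p>Full details.</p>\n";
print summarise($html, 'http://perltraining.com.au/tips/2007-07-30.html'), "\n";
```

A tip with no END_SUMMARY comment passes through unchanged, so short tips are simply published in full.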

While most of the fields used have the same meaning as they do when creating a feed (except with an entry-level scope), it should be noted that you can use content if you're supplying your full content, and summary if you're supplying just a summary. You should try to have at least one or the other.

Printing our feed

Printing our feed is the easy part. For our tips, we just set the content type appropriately and print the feed:

        $r->content_type("text/xml; charset=us-ascii");
        $feed->print;

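Since we're using HTML::Mason under mod_perl, we alter the content type using the Apache request object ($r). How you set your content type will depend upon the framework employed. We serve our feed as text/xml; the media type registered for Atom in RFC 4287 is application/atom+xml.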

Caching the feed

Most feeds don't change all that often, so rather than rebuild them for every request, it's usually a good idea to cache your content.

In HTML::Mason this is as simple as adding the following to the top of our code:

        return if $m->cache_self(expire_in => '1 hour', busy_lock => '30 sec');

which caches content for an hour, and allows 30 seconds for content regeneration. While we don't show it here, our actual code sets the content-type before we do the cache check, otherwise our data could end up being served with the wrong content-type.

If you're using a different system from HTML::Mason, you may wish to consider using a module such as Cache::Cache to implement caching in an efficient manner. This is particularly important if your feed becomes popular, as you may end up with a large volume of requests.
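
If you'd rather see the idea without any extra modules, the sketch below uses a plain file and its modification time in place of Cache::Cache. It is an illustration only: the cached_feed name and its arguments are hypothetical, and it makes no attempt at locking.

```perl
use strict;
use warnings;

# Hypothetical helper: return the cached text from $file if it is younger
# than $max_age seconds; otherwise call $build, store its result, and
# return it.
sub cached_feed {
    my ($file, $max_age, $build) = @_;

    my $mtime = (stat $file)[9];
    if (defined $mtime && time - $mtime < $max_age) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        local $/;                 # slurp mode: read the whole file at once
        return scalar <$in>;
    }

    my $xml = $build->();

    # Write to a temporary file and rename it into place, so readers
    # never see a half-written cache file.
    my $tmp = "$file.$$";
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} $xml;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";

    return $xml;
}
```

With XML::Atom::SimpleFeed this might be called as cached_feed($cache_file, 3600, sub { $feed->as_string }), where $cache_file is a path of your choosing. Unlike Mason's busy_lock, this sketch does nothing to stop several processes rebuilding the feed at the same moment.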


References

XML::Atom::SimpleFeed
http://search.cpan.org/perldoc
HTML::Mason
http://www.masonhq.com/
Atom syndication format
http://rfc.net/rfc4287.txt
Internet timestamps
http://rfc.net/rfc3339.html


This Perl tip and associated text is copyright Perl Training Australia. You may freely distribute this text so long as it is distributed in full with this Copyright notice attached.

If you have any questions please don't hesitate to contact us:

Email: contact@perltraining.com.au
Phone: 03 9354 6001 (Australia)
International: +61 3 9354 6001
