Build Your Own Aggregator with FeedZcollector

I’ve been playing around a bit over the past week with an application called FeedZcollector, written by brothers Xander and Fred Zelders. FeedZcollector is a Windows application that monitors RSS feeds, adding new items to either a Microsoft Access or MySQL database. FeedZcollector is the retrieval engine behind Feeds4all.com.
I’ve been using the trial version to pull feeds into Access (this free version limits you to 10 feeds; you can purchase versions of the tool that let you work with 25, 100, 1000, or an unlimited number of feeds).
The tool simply pulls down the latest items from the feeds you enter and stores them (title, abstract, URL, time loaded, etc.) in an Access (or MySQL) database. What you do with them from that point forward is up to you. The Zelders purposefully designed the tool to be a component of something larger — but a component that could be used together with other applications.
In my testing, FeedZcollector did its job well, pulling feeds into Access soon after the source site updated the feed. Being able to construct fielded searches, or to augment entries with other data generated by my hypothetical site’s users (tags, times viewed, etc. — anything that could be recorded in a data structure) makes it a powerful back-end tool for repurposing content.
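FeedZcollector itself is closed source, but the loop it performs (fetch a feed, record each item's title, abstract, URL, and load time, and skip items already seen) is easy to sketch. Here is a minimal, hypothetical Python version using only the standard library; the table and column names are mine, not FeedZcollector's:

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_feed_items(rss_xml, db):
    """Parse an RSS 2.0 document and store each item; return how many were new."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS items (
               url      TEXT PRIMARY KEY,
               title    TEXT,
               abstract TEXT,
               loaded   TEXT DEFAULT CURRENT_TIMESTAMP)"""
    )
    before = db.total_changes
    for item in ET.fromstring(rss_xml).iter("item"):
        # INSERT OR IGNORE makes repeated polling idempotent: an item whose
        # URL is already in the table is silently skipped.
        db.execute(
            "INSERT OR IGNORE INTO items (url, title, abstract) VALUES (?, ?, ?)",
            (item.findtext("link", "").strip(),
             item.findtext("title", "").strip(),
             item.findtext("description", "").strip()),
        )
    db.commit()
    return db.total_changes - before
```

In a real collector you would fetch `rss_xml` with `urllib.request.urlopen` on a schedule, once per monitored feed, and point `db` at a file-backed database rather than an in-memory one.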
As a die-hard Mac/Unix guy, I wish there were a version of the software that would run in a Unix/Linux environment. However, according to Fred Zelders,

We (Xander and me) are sorry but we have no plans to make versions for Mac OS X or Unix. The reason is the lack of Mac OS X and/or Unix development skills.
There is a possibility however to run FeedZcollector on a Mac (OS X) by installing and running FeedZcollector under Parallels. (I’m running FeedZcollector myself at this very moment on my iMac 20″)

FeedZcollector is a useful foundation for building your own aggregator without having to rely on external, Internet-based services such as RSSMix or Sphere, which provide aggregation with or without keyword filtering.

Introducing WOMBLINK — Word Of Mouth Blog LINK

UPDATE (9 March 2012): I’ve disabled this tool as it ended up being a honeypot for spammers.
The discussion in the comments section of my most recent post prompted me to do a bit of coding. It struck me that libraries needed a tool to help encourage their patrons to blog about the library. And not just to encourage talk, but to actively invite comment on particular web pages (describing events, book talks, policy changes, etc.). Weblogs may well be the most powerful word-of-mouth tool in a library’s Internet arsenal.
The result is a simple tool I’ve called Word of Mouth Blog LINK — WOMBLINK for short.
The concept is straightforward. A WOMBLINK is a link provided by a library web site directly back to a specific web page. It is designed to be included in weblogs and is meant to be drop-dead easy for the librarian and patron to use, requiring nothing more than copying and pasting for the site publisher or the blogger.
So what is it? A WOMBLINK is two lines of HTML that, when included on a web page, display the words “Blog This”. A prospective blogger can click on this link and receive a second short snippet of HTML that includes a link directly to the original web page as well as a small logo provided by the site owner.
So what does it look like? Well, if I wanted to make it easy for people to blog about this article on RSS4Lib, I would go to WOMBLINK and fill in a form. This would generate the following HTML:
<a href='http://www.rss4lib.com/womblink/display.pl?id=103'>Blog This</a><br>
WOMBLINK provided by <a href='http://www.rss4lib.com/womblink/'>RSS4Lib</a>
That code looks like this in the browser:
Blog This
WOMBLINK provided by RSS4Lib
Then, as a blogger wanting to comment on and link directly to this web page, I would click the “Blog This” WOMBLINK above and get the following bit of HTML code:
<a href='http://www.rss4lib.com/'><img src='http://www.rss4lib.com/images/RSS4Lib-Logo.jpg' height='20' border='0' alt='Link to RSS4Lib'></a> <a href='http://www.rss4lib.com/'>RSS4Lib</a><br>
<font size='1'>This link courtesy <a href='http://www.rss4lib.com/womblink/'>RSS4Lib WOMBLINK</a></font>
Once copied and pasted into a blog, it looks like this — complete with a logo for the web site being blogged:
Link to RSS4Lib RSS4Lib
This link courtesy RSS4Lib WOMBLINK
While some blogging software packages offer a JavaScript bookmarklet to “blog this,” bookmarklets aren’t always that useful. The blogger may not be technically savvy enough to install one, or may not be working at her own computer when she sees your web site. It makes more sense to use a software-independent tool that all bloggers can take advantage of. An added plus of WOMBLINK is that the source web site can provide its own graphic (as long as it works well at 25 pixels tall!) to help reinforce that web site’s identity.
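For the curious, the generation step WOMBLINK performs can be sketched in a few lines. This is a hypothetical Python reconstruction (the function name and signature are mine) that produces snippets in the shape shown above:

```python
def womblink_snippets(site_url, site_name, logo_url, page_id):
    """Return (invite_html, blogger_html): the 'Blog This' link a site owner
    publishes, and the ready-to-paste snippet a blogger gets back."""
    invite = (
        f"<a href='{site_url}womblink/display.pl?id={page_id}'>Blog This</a><br>\n"
        f"WOMBLINK provided by <a href='{site_url}womblink/'>{site_name}</a>"
    )
    blogger = (
        f"<a href='{site_url}'><img src='{logo_url}' height='20' border='0' "
        f"alt='Link to {site_name}'></a> "
        f"<a href='{site_url}'>{site_name}</a><br>"
        f"<font size='1'>This link courtesy "
        f"<a href='{site_url}womblink/'>{site_name} WOMBLINK</a></font>"
    )
    return invite, blogger
```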
Let me know what you think of this tool — send your comments and feedback to me at womblink@rss4lib.com.

Making Viral Advertising Easy

Jill Stover of Library Marketing – Thinking Outside the Book (a great source of library marketing ideas, by the way), wrote about a handy feature added to the Engineering Village 2 database. Once you’re in the database and viewing the abstract of an article, there’s a link to “blog this”. That link, when clicked, gives you a snippet of code to put into your weblog.
The code EV2 provides gives the title of the article and a graphic for EV2. Clicking on either will bring you through your library’s proxy server to the full text. (This example will work if you have access to Tufts University’s proxy server, but not for anyone else…):

Jill notes how useful this functionality is for librarians who want to highlight tools available to their patrons. I take this one step further: why not have a link to “blog this” appear on any relevant portion of the library site? From a change in hours to a new exhibit in the library lobby to other news, events, or information of note — make it easy for your patrons to link to the source of the information when they are blogging.

Measuring RSS Usage

UPDATE 10 January 2010: Use the technique described here on your own RSS feeds at YourStats, an RSS4Lib tool. Upload your web server’s log file and it will report the number of readers it finds there.
A thread on Web4Lib about measuring RSS usage through web logs made me realize how tricky this is. Aggregators (and browsers, such as Firefox, Safari, and IE 7) all request RSS feeds from your server, often several times a day. That makes it hard to tell how much your feed is actually being read: the RSS feed for this blog, http://www.rss4lib.com/index.xml, was accessed 19,455 times in October. Which sounds impressive, right?
However, that number just means that a constellation of individual web browsers, news aggregators, and search engines was checking the feed once a day, once an hour, once a week, or at some other frequency.
I know how many Bloglines subscribers there are (334 as of right now). But I can’t keep track of how many are reading this through Yahoo!, Google, NewsGator online, or this, that or the other aggregator.
Looking at the detailed web server log report (which is generated by my host using Analog), I see that some aggregators add the number of users they are collecting data for — basically, a subscription report. So I can see, in a recent month, the following details:

Bloglines/3.1 (http://www.bloglines.com; 320 subscribers)
NewsGatorOnline/2.0 (http://www.newsgator.com; 7 subscribers)
AttensaOnline/1.0 (http://www.attensa.com; 1 subscribers)
Feedshow/1.0 (http://www.feedshow.com; 1 subscriber)

(This is for several different feeds for several different RSS services — taken from my entire server report, not just for the main feed for RSS4Lib.)
I also see that hits from every copy of one kind of web browser (in this example, Safari) get lumped together as one browser type, “AppleSyndication/54”. Different versions of Safari report different strings, so I also see “AppleSyndication/53”, for example.
In short, it’s very hard to gauge readership — to separate reads from aggregator or browser “are you updated” hits. This is doubly true since so many people, myself included, read the full text of a post within the aggregator and rarely click through to the site where “spider” hits and “user” hits can be separated, mostly, by a good web log analyzer application.
P.S. I find this amazing, but this is the 100th post on my blog…. Happy “centennial” to me!
Update — 21 Feb 2007 Google Reader’s crawler, Feedfetcher-Google, now includes a subscriber count when it grabs your RSS feed. In my log file, a sample line looks like Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 133 subscribers; feed-id=1495776793707971617). Thanks to Taming the Beast for this tip.
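Pulling those subscriber counts out of a raw log turns out to be a one-regex job. A minimal sketch (the function name is mine; the sample lines are the user-agent strings quoted above):

```python
import re

# Matches "320 subscribers" or "1 subscriber" inside a user-agent string.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?\b")

def subscriber_counts(log_lines):
    """Map each reporting aggregator (the token before the first '/' or ';')
    to the subscriber count in its user-agent string. Later lines overwrite
    earlier ones, so repeated polls by the same aggregator don't inflate the
    count; to track several feeds you would also key on the requested URL."""
    counts = {}
    for line in log_lines:
        m = SUBSCRIBERS.search(line)
        if m:
            agent = re.split(r"[/;]", line, maxsplit=1)[0].strip()
            counts[agent] = int(m.group(1))
    return counts
```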

Feed2JS and Spam

Feed2JS is a great tool for reusing RSS feeds on web pages. (See my May 2005 post, It’s Not Stealing, It’s Syndicating, for an overview.)
However — there’s always a ‘however,’ isn’t there — there is a fixable problem. If you run your own copy of Feed2JS on your own server (rather than using Feed2JS’s public version), unscrupulous folks can borrow your script — and your bandwidth — to repurpose other RSS feeds from other sites without your knowledge or permission.
I learned this the hard way when a copy of Feed2JS I manage at my workplace was “borrowed” by someone who was running a fake weblog designed to sell Google ads; the owner of this revenue-driven site was borrowing feeds from other blogs and using my copy of Feed2JS to reproduce them on his site. I was the unwitting intermediary in an unscrupulous, and possibly illegal, reuse of content. (Ironically, I first became aware of this use of my copy of Feed2JS when another individual, whose own commercial site was devoted to hair-loss remedies, complained to me that my Feed2JS installation was misappropriating his weblog content on a competitor’s blog…)
So how do you tell if your own copy of Feed2JS has been borrowed? Look in the feed2js/magpie/cache/ and feed2js/magpie/cache_utf8 directories. Magpie writes one file, with an inscrutable name like “ad1cb3ddb313d3f10f9b7d50ec8da638,” for each RSS feed your script is monitoring. If you use Feed2JS to monitor three RSS feeds, there will be three files in the cache directory. If there are more files than there should be, your script has likely been borrowed.
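That eyeball check is easy to automate. A small sketch, assuming the Magpie cache layout described above (one hashed-name file per feed); the function name is mine, and you would adjust the path and expected count for your install:

```python
import os

def extra_cache_files(cache_dir, expected_feeds):
    """Count cache files beyond the number of feeds you actually configured.
    Magpie writes one hashed-name file per distinct feed URL it has fetched,
    so anything above zero suggests someone else is using your script."""
    cached = [name for name in os.listdir(cache_dir) if not name.startswith(".")]
    return max(0, len(cached) - expected_feeds)
```

Run it against both the `cache` and `cache_utf8` directories; a nonzero result means your script has fetched feeds you never configured.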
Feed2JS.org offers directions for restricting Feed2JS to the feeds you want to be reused. With a bit of extra tinkering with the PHP, you can allow feeds from more than one server to be repurposed through your script.
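Feed2JS’s restriction is implemented in PHP; the underlying idea, though, is just a host allowlist, sketched here in Python with example values (not Feed2JS’s actual code):

```python
from urllib.parse import urlparse

# Hosts whose feeds this installation is willing to repurpose (example values).
ALLOWED_HOSTS = {"www.rss4lib.com", "rss4lib.com"}

def feed_allowed(feed_url):
    """Accept a requested feed URL only if its host is on the allowlist."""
    return urlparse(feed_url).hostname in ALLOWED_HOSTS
```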

Pageflakes & Library Feeds

Pageflakes offers a service something like Yahoo!’s and Google’s personalized home pages: mix and match the content you want to see on a single web page. You can keep a page private, share it with a group of people you select, or make it public.
Once set up, the page lists the blog or feed title with a number (of unread items) to the right; click the number and see the list of headlines. Click a title, and go to the source. Very slick and AJAXy.
A good starting place for exploring Pageflakes is the public list of Librarian Weblogs maintained by Phillip Bradley. Pageflakes looks like a good tool for creating ad hoc feedliographies: pull a bunch of related feeds together, publish them on a public page, and direct your patrons to it.
Note that Safari users are out in the cold; this site works well with Firefox, though.

For Whom the RSS Feeds

“E-Mail is for Old People.” That’s the title of an article by Dan Carnevale in the October 6 issue of The Chronicle of Higher Education. (As of October 2, the article is available without registration.)
Carnevale notes that many undergraduate students have moved on to newer communications media: instant messaging, text messaging via cell phone, and web 2.0 sites like Facebook and MySpace. He writes that,

A 2005 report from the Pew Internet and American Life Project called “Teens and Technology” found that teenagers preferred new technology, like instant messaging or text messaging, for talking to friends and use e-mail to communicate with “old people.”

Newer, trendier — or perhaps just plain better — technologies have the attention of undergraduates and their juniors. Some schools have created a quasi-official presence in MySpace or Facebook and maintain it with much of the information that might have been exclusively posted by email a year or two ago.
RSS is not mentioned in the article; it’s a bit different a beast, admittedly. But the article got me thinking: just who is reading all my carefully constructed RSS feeds anyway? If RSS is a significant chunk of your library’s public relations and announcement effort, is it effective, particularly if the generation that seems its most natural audience sees RSS as too unidirectional and “email-like”?
When I look at the server statistics on this blog, or on my library’s blog, I see lots and lots of hits from aggregators and search engines. And lots for Magpie, which I use with Feed2JS to reprint announcement headlines on my library’s home page. While some aggregators are kind enough to tell me that they’re acting on behalf of so many subscribers (sadly, that “so many” is far too often “1” when it’s RSS4Lib, and I know the aggregator is toiling away for me alone, a remnant of my exploring aggregators using my own feed), the hits-per-subscriber ratio assumes I publish far more frequently than I do.
Perhaps because it is so darned easy to create an RSS feed out of almost any source (blogs and wikis, of course, but also content management systems, databases, you name it), and because it is so flexible, RSS is destined to fade into the background, just another piece of the infrastructure of the information age. Yet the promise of being able to skim and dip one’s intellectual toes into the information stream makes it more valuable than it seems. Ask not for whom RSS feeds, for it feeds for you…

ZapTXT — RSS to You

ZapTXT (a beta product, but aren’t they all?) is a new service that lets you set up a keyword search of specific RSS feeds and sends you an alert, by email, instant messenger, or text message to a mobile device, when those keywords appear in a feed. ZapTXT provides a list of popular news feeds (for example, Technology contains about 20 pre-selected feeds, including Engadget, Pogue’s Posts, Resource Shelf, and more; Political Blogs contains Wonkette, Daily Kos, and a bunch of others). You can pick multiple sites from the preselected lists. Alternatively, you can specify your own favorite feed source. To add multiple personally selected sources, first create the feed, then edit it to add additional RSS sources.
Email alerts go to any email address. IM alerts only go to Jabber, Gtalk, and MSN clients, leaving out AOL’s instant messenger. Text messaging is available for all major cell service providers.
With a carefully constructed set of keywords, this is another great clipping service substitute.
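The heart of any such alerting service is a keyword filter run over incoming feed items. A minimal sketch (the data shape and names are mine, not ZapTXT’s):

```python
def matching_items(items, keywords):
    """Return feed items whose title or summary mentions any keyword,
    case-insensitively. Each item is a dict with 'title' and 'summary' keys."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in items:
        text = (item.get("title", "") + " " + item.get("summary", "")).lower()
        if any(k in text for k in wanted):
            hits.append(item)
    return hits
```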
Addendum: Sameer Patel of ZapTXT sent me the following helpful tip — a simple way to search the ENTIRE blogosphere for a keyword. In his words:

Go to Sphere.com.
Enter any search term
Throw the RSS feed for the Sphere results page into ZapTXT as a ZapTask.
You are now monitoring a search term across the entire blogosphere. And if you select “as they appear” when you’re setting up your ZapTask, that’s exactly what happens. With this method, you’re monitoring the entire post of all blogs that Sphere catches. So if [your search term] showed up deep in the body of the post, the RSS feed from Sphere catches that as part of the result and you get a ZapTXT alert.
[Via LISNews.]

Clipping Service on the Cheap

This may be of benefit primarily to special librarians, but it’s worth a thought for any librarian wishing to make a positive impression on whatever group or person is responsible for funding… David Rothman, in his blog focusing on medical librarianship, notes how easy it is to provide a quality current awareness service to one’s organization. A simple search at a news aggregator (that is, an aggregator that handles just “official” news sources, not the broader blogosphere) can populate a web page with recent headlines and links to the full-text articles.
Rothman recommends FeedGit, which aggregates these “official” news sources. Enter a search term. You’ll see a list of news providers grouped by type (news, web, blogs, images, etc.). For each content type, there are links to an RSS feed specifically on your search term at each of the providers.
Putting this feed on a web page is the next step that Rothman notes: don’t even bother the decision makers with raw RSS (unless, of course, they’ve already joined that bandwagon). Use your favorite RSS-to-HTML script (mine is Feed2JS), tailor the style to match your own site, and tell the world (or the individual) that it’s there. Voilà! A quick-and-dirty clipping service.
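Feed2JS does its conversion via a JavaScript include; the underlying feed-to-markup step looks roughly like this stdlib-only Python sketch (an illustration of the idea, not Feed2JS’s actual code):

```python
import itertools
import xml.etree.ElementTree as ET
from html import escape

def feed_to_html(rss_xml, limit=5):
    """Render the first `limit` items of an RSS 2.0 feed as an HTML list."""
    items = []
    for item in itertools.islice(ET.fromstring(rss_xml).iter("item"), limit):
        title = escape(item.findtext("title", "untitled"))
        link = escape(item.findtext("link", "#"), quote=True)
        items.append(f'<li><a href="{link}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

The output drops straight into any page; styling it to match your site is just CSS on the `ul` and `a` elements.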