Measuring RSS Usage

UPDATE 10 January 2010: Use the technique described here on your own RSS feeds at YourStats, an RSS4Lib tool. Upload your web server’s log file and it will provide the number of readers based on the log file.
A thread on Web4Lib about measuring RSS usage through web logs made me realize how tricky this is. Aggregators (and browsers, such as Firefox, Safari, and IE 7) all request RSS feeds from your server, often several times a day. It is hard to tell how much your feed is being used — the RSS feed for this blog, http://www.rss4lib.com/index.xml, was accessed 19,455 times in October. Which sounds impressive, right?
However, that total means only that a constellation of individual web browsers, news aggregators, and search engines was checking the feed once a day, once an hour, once a week, or at some other frequency.
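To put that total in perspective, here is a back-of-the-envelope sketch in Python. The polling intervals are assumptions for illustration, not measurements from my logs; the only real number is the 19,455 hits quoted above.

```python
# Rough arithmetic only: the polling intervals below are assumed, not measured.
HITS_IN_OCTOBER = 19_455   # requests for index.xml quoted above
DAYS_IN_OCTOBER = 31


def clients_needed(total_hits, days, polls_per_day):
    """Return how many clients would produce total_hits if every one of
    them polled the feed polls_per_day times on each of the given days."""
    return total_hits / (days * polls_per_day)


for label, polls_per_day in [("hourly", 24), ("daily", 1), ("weekly", 1 / 7)]:
    estimate = clients_needed(HITS_IN_OCTOBER, DAYS_IN_OCTOBER, polls_per_day)
    print(f"{label:>6}: about {estimate:,.0f} clients")
```

The same hit count is consistent with a couple of dozen hourly pollers or several thousand weekly ones, which is why the raw total says so little about readership.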
I know how many Bloglines subscribers there are (334 as of right now). But I can’t keep track of how many are reading this through Yahoo!, Google, NewsGator online, or this, that or the other aggregator.
Looking at the detailed web server log report (which is generated by my host using Analog), I see that some aggregators add the number of users they are collecting data for — basically, a subscription report. So I can see, in a recent month, the following details:

Bloglines/3.1 (http://www.bloglines.com; 320 subscribers)
NewsGatorOnline/2.0 (http://www.newsgator.com; 7 subscribers)
AttensaOnline/1.0 (http://www.attensa.com; 1 subscribers)
Feedshow/1.0 (http://www.feedshow.com; 1 subscriber)

(This is for several different feeds for several different RSS services — taken from my entire server report, not just for the main feed for RSS4Lib.)
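If your log report gives you the raw User-Agent strings, the subscriber counts can be pulled out with a short script. This is a minimal sketch, assuming the strings look like the four samples above; the function names are my own, and a real log would repeat each aggregator many times and across several feeds, so you would want to group by the requested feed before adding anything up.

```python
import re

# Matches "320 subscribers" or "1 subscriber" anywhere in a User-Agent string.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?\b")


def subscriber_count(user_agent):
    """Return the reported subscriber count, or None if there is none."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


def aggregator_name(user_agent):
    """Return the text before the first slash, semicolon, or space."""
    return re.split(r"[/; ]", user_agent, maxsplit=1)[0]


# The four User-Agent strings listed above.
sample_agents = [
    "Bloglines/3.1 (http://www.bloglines.com; 320 subscribers)",
    "NewsGatorOnline/2.0 (http://www.newsgator.com; 7 subscribers)",
    "AttensaOnline/1.0 (http://www.attensa.com; 1 subscribers)",
    "Feedshow/1.0 (http://www.feedshow.com; 1 subscriber)",
]

counts = {}
for agent in sample_agents:
    count = subscriber_count(agent)
    if count is not None:
        counts[aggregator_name(agent)] = count

for name, count in counts.items():
    print(f"{name}: {count}")
print("Total reported subscribers:", sum(counts.values()))
```

Browsers and aggregators that do not report a count (a plain “AppleSyndication/54”, for example) simply return None and are left out of the total.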
I also see that the hits from every copy of one kind of web browser (in this example, Safari) get lumped together as one browser type, “AppleSyndication/54”. Different versions of Safari report different browser types, so I also see “AppleSyndication/53”, for example.
In short, it’s very hard to gauge readership — to separate reads from aggregator or browser “are you updated” hits. This is doubly true since so many people, myself included, read the full text of a post within the aggregator and rarely click through to the site where “spider” hits and “user” hits can be separated, mostly, by a good web log analyzer application.
P.S. I find this amazing, but this is the 100th post on my blog…. Happy “centennial” to me!
Update, 21 Feb 2007: Google Reader’s crawler, Feedfetcher-Google, now includes a subscriber count when it grabs your RSS feed. In my log file, a sample line looks like Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 133 subscribers; feed-id=1495776793707971617). Thanks to Taming the Beast for this tip.

Feed2JS and Spam

Feed2JS is a great tool for reusing RSS feeds on web pages. (See my May 2005 post, It’s Not Stealing, It’s Syndicating, for an overview.)
However — there’s always a ‘however,’ isn’t there — there is a fixable problem. If you run your own copy of Feed2JS on your own server (rather than using Feed2JS’s public version), unscrupulous folks can borrow your script — and your bandwidth — to repurpose other RSS feeds from other sites without your knowledge or permission.
I learned this the hard way when a copy of Feed2JS I manage at my workplace was “borrowed” by someone who was running a fake weblog designed to sell Google ads; the owner of this revenue-driven site was borrowing feeds from other blogs and using my copy of Feed2JS to reproduce them on his site. I was the unwitting intermediary in an unscrupulous, and possibly illegal, reuse of content. (Ironically, I was first made aware of this use of my copy of Feed2JS when another individual, who ran a commercial site devoted to hair-loss remedies, complained to me that my Feed2JS was misappropriating his weblog content on a competitor’s blog…)
So how do you tell if your own version of Feed2JS has been borrowed? Look in the feed2js/magpie/cache/ and feed2js/magpie/cache_utf8 directories. The cache directory should hold one file for each RSS feed your script is monitoring, with an inscrutable name like “ad1cb3ddb313d3f10f9b7d50ec8da638.” If you use Feed2JS to monitor three RSS feeds, there will be three files in the cache directory. If there are more files than there should be, your script has likely been borrowed.
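That check is easy to script. The sketch below is only an illustration, assuming it runs from the directory that contains feed2js/ and that you know how many feeds you actually use; the function name and the figure of three feeds are placeholders, not anything Feed2JS provides.

```python
from pathlib import Path


def extra_cache_files(cache_dir, expected_feed_count):
    """Return how many more files cache_dir holds than expected.

    Returns 0 if the directory is missing or holds no extra files.
    """
    directory = Path(cache_dir)
    if not directory.is_dir():
        return 0
    file_count = sum(1 for entry in directory.iterdir() if entry.is_file())
    return max(0, file_count - expected_feed_count)


EXPECTED_FEEDS = 3  # placeholder: the number of feeds you really use

for cache_dir in ("feed2js/magpie/cache", "feed2js/magpie/cache_utf8"):
    extra = extra_cache_files(cache_dir, EXPECTED_FEEDS)
    if extra:
        print(f"{cache_dir}: {extra} more cached feeds than expected")
    else:
        print(f"{cache_dir}: nothing unexpected")
```

A count above zero is a hint rather than proof; it is worth looking at the web server log to see which pages are calling the script.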
Feed2JS.org offers directions for restricting Feed2JS to the feeds you want to be reused. With a bit of extra tinkering with the PHP, you can allow feeds from more than one server to be repurposed through your script.
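The idea behind the restriction is a simple allow-list. Feed2JS itself is written in PHP, and the following is neither its code nor the directions from Feed2JS.org; it is a small Python sketch of the logic, with a hypothetical second host (www.example.edu) standing in for whatever other servers you want to permit.

```python
from urllib.parse import urlsplit

# Hosts whose feeds may be repurposed. The second entry is hypothetical.
ALLOWED_HOSTS = {"www.rss4lib.com", "www.example.edu"}


def is_allowed_feed(feed_url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for http(s) URLs whose host is on the allow-list."""
    parts = urlsplit(feed_url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname; it is None when the URL has no host.
    return (parts.hostname or "") in allowed_hosts


print(is_allowed_feed("http://www.rss4lib.com/index.xml"))
print(is_allowed_feed("http://spam.example.com/feed.xml"))
```

Comparing the whole host name, rather than checking whether the URL merely contains an approved string, keeps a look-alike address such as www.rss4lib.com.evil.example from slipping through.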

Pageflakes & Library Feeds

Pageflakes offers a service something like those from Yahoo! and Google: mix and match the content you want to see on a single web page. You can keep your pages private, share them with a group of people you select, or make them public.
Once set up, the page lists the blog or feed title with a number (of unread items) to the right; click the number and see the list of headlines. Click a title, and go to the source. Very slick and AJAXy.
A good starting place for exploring Pageflakes is the public list of Librarian Weblogs maintained by Philip Bradley. Pageflakes looks like a good tool for creating ad hoc feedliographies. Pull a bunch of related feeds together, publish them as a public feed, and direct your patrons to them.
Note that Safari users are out in the cold; this site works well with Firefox, though.

Forthcoming Book on RSS and Libraries

A book that Information Today will publish on October 31 looks very interesting. It’s Blogging and RSS: A Librarian’s Guide by Michael Sauers. While I haven’t read the book yet (I’m not on anybody’s PR list…), the abstract at Amazon is promising:

Libraries increasingly use blogs and RSS feeds to reach out to users, while librarians blog daily on a range of personal and professional topics. The way has been paved by the tech-savvy and resource-rich, but any library or librarian can successfully create and syndicate a blog today. In this readable book, author, Internet trainer, and blogger Michael P. Sauers, M.L.S., shows how blogging and RSS technology can be easily and effectively used in the context of a library community. Sauers showcases interesting and useful blogs, shares insights from librarian bloggers, and offers step-by-step instructions for creating, publishing, and syndicating a blog using free Web-based services, software, RSS feeds, and aggregators.

I’ve preordered it… And will post a review here once I’ve read it.

Will 2007 Be the Year of RSS?

Is RSS on the cusp of moving from a neat tool for the geeks among us to a central part of Internet life? Richard MacManus of Read/Write Web answers in the affirmative in a post titled “2007 Will Be A Big Year For RSS.” MacManus posits that enough major players have now made RSS a part of their tools (Microsoft’s IE 7 and Outlook 2007, Yahoo!’s webmail, MySpace, Safari, Firefox, and many others) that RSS will have a “break out” year.
This makes sense. Once users integrate a tool into their daily life — or the applications they use do that for them — that tool becomes akin to a utility in the physical world. It doesn’t matter whether or not the users know what they’re using. Most of us reading this blog, most of the time, take running water, electricity, and landline telephone service for granted. They are simply there, no questions asked. RSS seems to me to be moving from being a handy tool to being infrastructure, the glue that holds many disparate information services together.
The more the tools our patrons use to interact with the online world adopt RSS, the more important it is that the services libraries offer are at least capable of distributing information via RSS. There’s not a database or information service out there that couldn’t have a what’s new service (by RSS, by email, by passenger pigeon): what’s new in terms of data in the database, and what’s new in terms of what the patron can do with the database as a tool. Once our user communities have tools that allow them to access RSS without a second thought, they will only notice it when it’s not there.

[Thanks to BlogBridge for pointing me to Read/Write Web, a blog I had missed until now.]

Is the RSS World Flat?

Paul Pival, the Distant Librarian, brings up an interesting question in his recent post “Just what am I looking at?” In his post, he notes:

I think students who have only researched through their computer monitor have a very hard time understanding what they’re looking at. Through the monitor, a page is a page is a page, whether it be from a scholarly journal, a book, Newsweek, a website, a chat window… There are almost none of the visual clues that are present in a more traditional physical piece of information that might make it easier to tell if you’re about to use a scholarly publication or a piece of crap in your paper. If I’ve got a PDF from Academic Search Premier and I don’t recognize the name of the publication and there are no ads on the page, surely it’s scholarly, right?

As does much of what Paul writes, this got me thinking. If this is a real problem with online research (and I agree it is; many of the graduate student patrons at my library seem not to have learned the difference between the authoritative and non-authoritative online sources they find through Google), then I wonder what the consequences of staying on top of things via a search in an aggregator might be. An RSS feed, especially one that is a search result, provides precious little context in which to judge the authority of the source. It’s sort of like deep linking into a web site to find the print-only, stripped-of-graphics, stripped-of-author version of a page. The impatient researcher (i.e., almost anyone with a deadline of, say, tomorrow) will grab the URL and take the work as it is.
The problem of recognizing “authoritative” content is, of course, nothing novel. Back in middle school, when I was assigned a “research” paper that required at least three different sources from the Readers’ Guide to Periodical Literature, I imagine I was none too picky about which three I picked. I got my three, wrote my page or two, and moved on. I like to think my research techniques improved in college and graduate school. But I also was doing that work just at the dawn of the online age; yes, there were databases, but no, there were relatively few full-text online journals accessible to me, so I largely relied on what was in the stacks and available to me, not what was truly “good.”
So I ask myself, what could I, as a blogger, put in an RSS feed that might provide someone reading it with a sense of my “authority” (if, that is, I actually have any)? Yes, each post links to the web site, and the collection of items I’ve written. And from there, it’s just a click to a web site that tells the casual reader more about me than I probably ought to let them know. Is the provision of such links, probably to be used only by the engaged researcher, enough?
Perhaps there should be some way of rating a web author as authoritative (or popular, authority’s online proxy). This seems similar to the problem that content rating systems like the W3C’s PICS were designed to solve. (PICS is a standard for saying how child-safe a particular site or page is, but has broader applications as a way to apply labels to content. These labels are “controlled” by some organization, so a label contains both the label and a link to a page that defines what the label means.) Should RSS items come with a Digg or Technorati rating in their header that could be displayed in an aggregator or used as a filter, set to a default of some positive score for those who choose not to customize their preferences?

Medical Feeds Collection

There’s a comprehensive, subject-based listing of medical and health RSS feeds. This advertising-supported site provides subject listings of RSS feeds, covering medical journals and news sites. If you’re looking to build a “what’s new” service based on medical feeds, this is a good site to mine for sources.

For Whom the RSS Feeds

“E-Mail is for Old People.” That’s the title of an article by Dan Carnavale appearing in the October 6 issue of The Chronicle of Higher Education. (The article is currently available without registration, as of October 2.)
Carnavale notes that many undergraduate students have moved on to newer communications media: instant messaging, text messaging via cell phone, and web 2.0 sites like Facebook and MySpace. He adds,

A 2005 report from the Pew Internet and American Life Project called “Teens and Technology” found that teenagers preferred new technology, like instant messaging or text messaging, for talking to friends and use e-mail to communicate with “old people.”

Newer, trendier — or perhaps just plain better — technologies have the attention of undergraduates and their juniors. Some schools have created a quasi-official presence in MySpace or Facebook and maintain it with much of the information that might have been exclusively posted by email a year or two ago.
RSS is not mentioned in the article; it’s a bit of a different beast, admittedly. But the article got me thinking: just who is reading all my carefully constructed RSS feeds anyway? If RSS is a significant chunk of your library’s public relations and announcement effort, is it effective, particularly if the generation of people who seem its natural users happen to see RSS as too unidirectional and “email-like”?
When I look at the server statistics on this blog, or on my library’s blog, I see lots and lots of hits from aggregators and search engines. And lots for Magpie, which I use with Feed2JS to reprint announcement headlines on my library’s home page. Some aggregators are kind enough to tell me that they’re acting on behalf of so many subscribers. (Sadly, that “so many” is far too often ‘1’ when it’s RSS4Lib, and I know that the aggregator is toiling away for me alone, a remnant of my exploring aggregators using my own feed.) Even so, the hits-per-subscriber ratio assumes I publish more frequently, a LOT more frequently, than I do.
Perhaps because it is so darned easy to create an RSS feed out of almost any source (blogs and wikis, of course, but also content management systems, databases, you name it) and because it is so flexible, RSS is destined to fade into the background, just another piece of the infrastructure of the information age. Yet the promise of being able to skim and dip one’s intellectual toes into the information stream makes it more valuable than it seems. Ask not for whom RSS feeds, for it feeds for you…