MyFeedz

MyFeedz is an interesting tool from Adobe Labs — from Adobe’s Romanian office, to be precise — that aspires to build a personalized news service based on your RSS reading habits.

MyFeedz has two ways to learn about your interests. One is by simply watching you read the titles and first paragraphs of text and noting when you click through to read the full text. The other involves jump-starting the system (at your choice) by uploading your RSS reading list via an OPML file (OPML files can be output from most popular RSS aggregators, including Bloglines and Google Reader). The process behind the scenes is not quick — I suggest getting a cup of coffee, walking the dog, or reading your day’s RSS feeds while MyFeedz works.

When MyFeedz is done doing its magic — described somewhat cryptically on the site as a process involving an analysis of “its source, tags, popularity, rating, language and more” — it has built a profile of the subjects that you generally prefer to read. It then goes out and looks for more blog entries that match this general profile.

Here is where I’ll speculate a bit on what is going on behind the scenes. My guess is that MyFeedz is using some form of vector analysis to model the items in which you have shown an interest. Vector analysis (see this Wikipedia article for more of the math than I understand) compares texts and scores the likelihood that they are about the same thing, even if the exact same phrases do not appear in both documents.
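
To make that idea concrete, here is a toy sketch in Perl (my own illustration, not anything MyFeedz has published) that scores two texts by the cosine of the angle between their word-frequency vectors:

    #!/usr/bin/perl
    # A toy sketch of vector-based text comparison: score two texts by the
    # cosine of the angle between their word-frequency vectors. This is my
    # own guess at the flavor of math involved, not MyFeedz's algorithm.
    use strict;
    use warnings;

    sub word_vector {
        my ($text) = @_;
        my %freq;
        $freq{lc $_}++ for $text =~ /\w+/g;   # count each word, case-folded
        return \%freq;
    }

    sub cosine_similarity {
        my ($u, $v) = @_;
        my ($dot, $norm_u, $norm_v) = (0, 0, 0);
        $dot    += ($u->{$_} || 0) * $v->{$_} for keys %$v;
        $norm_u += $_ ** 2 for values %$u;
        $norm_v += $_ ** 2 for values %$v;
        return 0 unless $norm_u && $norm_v;
        return $dot / (sqrt($norm_u) * sqrt($norm_v));
    }

    # Items about the same subject score near 1; unrelated items near 0.
    my $read  = word_vector('library adds rss feeds for new book lists');
    my $found = word_vector('new rss feed lists recent library books');
    printf "similarity: %.2f\n", cosine_similarity($read, $found);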

If I am correct, that is the time-consuming part. For my aggregated list, MyFeedz took several hours to be ready to show me new articles based on the profile it generated for me. But now, it is showing me articles within the sphere of what I’ve already expressed interest in knowing about. As I write this, three of its top five recommendations fall squarely within my established interests.

The other two are less clearly germane (one about the Dodgers, and I’m a member of Red Sox Nation; the other about NJ Governor Corzine’s political difficulties). I think batting .600 is pretty good for an automated recommendation engine that I just recently started using. However, as I’ve written before (see “Serendipity at Risk”), I like a bit of surprise in my reading, as long as it is tangentially related to what I’m focusing on. The items I catch out of the corner of my eye while I’m looking in one direction are often fascinating.

I will be curious to see whether, as I work with MyFeedz, it continues to narrow in on my core interests while providing me some “ah-ha!” moments. MyFeedz will not replace my aggregator — that is not its purpose — but it makes for an interesting discovery tool.

LibraryThing and the Danbury Library

This really has little to do with RSS, but it is such a useful and clever service that I can’t resist writing about it.

Tim Spalding of LibraryThing today announced LibraryThing for Libraries with its first implementation, the Danbury Library (in Connecticut).
Tim explains the whys and wherefores in great detail in his post, but the upshot of it all is that when you search for a book in the Danbury library’s catalog, in addition to the catalog and holdings data, you also see:

  • Tags from LibraryThing’s 200,000 members and 13 million books;
  • Other editions and translations of the book you are looking at;
  • Tags entered by LibraryThing’s users describing the item you are viewing; and
  • Similar titles.

The last three items show only books held by Danbury’s library. And LibraryThing has restricted the tags that appear in the Danbury catalog so that tags describing the location of the book or the tagger’s intent (for example, “at the beach house” or “to read”) are not included.

RSS New Books Screen Saver

Here’s a very clever use of the Macintosh’s RSS screen saver and a library’s new book list: a new books RSS screen saver. If you have a Mac, follow this link: screen saver (there are instructions on the blog for customizing it). Windows users are out of luck, I’m afraid.

What a great idea — computers in the library could advertise the new materials. With a bit of effort, the computers in the children’s area could show new children’s books, a PC in the mystery section new mysteries, and one near the biographies new biographies. And, of course, patrons could use it at home, too.

Public Schools and RSS

The Colonel Mitchell Paige Middle School in La Quinta, California, is using RSS and podcasts to keep parents in touch with the day’s activities. There’s a podcast of the morning announcements. Some teachers are recording information about tests and how-to tips for students and parents. And other teachers are using RSS to let parents know about their child’s homework assignments.
I wonder how many public school libraries could help — or already are helping — their school by providing this sort of infrastructure?

Counting RSS Subscribers

UPDATE (8 June 2008)
Find out how many subscribers your RSS feed has using YourStats, an RSS4Lib tool. Upload your own blog’s server access log files and get a count of how many readers your feed has.

How many people read RSS4Lib? I’ve asked this question before, but I keep coming back to it. Each time I do, I realize that the answer is even less straightforward than I previously thought.

Looking at my server log files, I think it’s clear that the vast majority of hits come from the RSS and Atom feeds — they account for a whopping 43,255 requests, or 68.9% of files delivered from www.rss4lib.com, for April 2007. This number — as impressive as it sounds to me — does not really mean much. Feed readers and aggregators, by their very nature, check the feed frequently (many times a day) to see if it has been updated so that the application can tell the user there’s something new to read. A well-behaved feed reader or aggregator will only download the feed again if it has changed, but it still pings the server regularly to check. A very large portion — well over 50% — of these requests result in nothing being downloaded from the server.
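
Those empty-handed requests show up in the log as HTTP 304 (Not Modified) responses to conditional GETs. As a rough illustration, assuming an Apache combined-format access log and treating the feed-path test as a placeholder to adjust for your own site, a few lines of Perl can separate the full downloads from the mere pings:

    #!/usr/bin/perl
    # Rough tally of full feed downloads (HTTP 200) versus "nothing new"
    # conditional-GET responses (HTTP 304) in an Apache combined-format
    # access log. The feed-path pattern below is a placeholder; adjust it
    # to match your own feed URLs.
    use strict;
    use warnings;

    my %status_count;
    while (my $line = <>) {
        # combined format: ... "GET /path HTTP/1.x" STATUS BYTES ...
        next unless $line =~ m{"GET\s+(\S+)\s+HTTP/[\d.]+"\s+(\d{3})};
        my ($path, $status) = ($1, $2);
        next unless $path =~ m{\.(?:xml|rdf)$};    # feed files only
        $status_count{$status}++;
    }

    my $total = 0;
    $total += $_ for values %status_count;
    printf "%d feed requests, of which %d (%.1f%%) were 304 Not Modified\n",
        $total, $status_count{304} || 0,
        $total ? 100 * ($status_count{304} || 0) / $total : 0;

Run it as perl feedhits.pl access_log.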

Where Are Feeds Read?

So how can one tell how many people are actually subscribed to an RSS feed? There are at least four significant sources of subscribers to an RSS feed:

  1. Web-based aggregators like Bloglines and Google Reader download the same feed many times a day to see whether anything is new and then redistribute it to multiple subscribers.
  2. PC-based feed aggregators (for example, FeedDemon or Radio UserLand). Like Web-based aggregators, these applications also check the feed periodically, but do so for each user individually. Ten users running these applications would result in ten downloads of an RSS feed per time period (every hour, every day, twice a day — depending on how each user has configured the application).
  3. Browser-based “live bookmarks” (for example, Firefox, Internet Explorer 7, and Safari). Newer web browsers allow a user to bookmark an RSS feed and display, variously, headlines or the full RSS feed as items are updated. Like other aggregators, they check the RSS file periodically for updates.
  4. Web applications (such as Feed2JS or RSS2HTML) that plug RSS feeds into web pages that are, in turn, read by one or more people.

Methodology

So as I looked through my server log files, I became increasingly curious about what, if anything, I could tease out of the data there. There are, of course, commercial products, like FeedBurner or GetClicky, that provide nice reports if you serve your feeds through them. I’ve opted to do things myself, though. So I got adventurous and started writing some Perl code to parse the log files and make a best-guess effort to estimate the number of subscribers to RSS4Lib’s feeds.

The subscriber count will be off for several reasons, the most significant being that a single person might be subscribed to the same feed in several places. I’m not overstating my self-importance: how many of us ever delete our subscriptions at one service when we move to a new one? Not I. When I try a new aggregator, I’m likely to download my entire subscription list from my current favorite as an OPML file and upload it into the new one. Whichever ends up being my favorite I use; the has-been just sits there, still subscribed to everything. And unless I’ve made a live bookmark in my Firefox toolbar, I may not notice that a feed I subscribed to has been merrily updating itself for weeks.

On the other hand, in many cases it’s very difficult to determine if a given user-agent is a web browser or PC-based aggregator (with, presumably, one subscriber) or an aggregator or web application with many subscribers. Most user-agents — the strings by which a web browser or application identifies itself to the web server — simply give their name, their platform (Mac, Windows, etc.), and what kind of browser they’re most like (Mozilla, Gecko, etc.). A very few — and fortunately for my purposes, the most popular aggregators are among this elite few — actually include the number of readers subscribed to the feed in the user-agent string (for example, Bloglines tells the server that it’s “Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)”): a name, a URL for more information, and the number of subscribers. Other user-agents are less informative; an example of this type is “FeedOnFeeds/0.1.9 (+http://minutillo.com/steve/feedonfeeds/)”. And still others are downright terse: “Particls/1.0”.

How to count these various types? It depends. The good guys (Bloglines, Google, Yahoo, and a few others) make life easy; it’s simply a matter of looking at the log file and pulling out the number of subscribers. There’s not much to be done with the “bad” web aggregators, the ones that do not provide any subscriber data, except to count them as one aggregator, one user.
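
For the cooperative aggregators, the counting really is that mechanical. Here is a minimal sketch (assuming, as before, a combined-format log in which the user-agent is the last quoted field, and modeled on the Bloglines string shown above) that pulls out each aggregator’s reported subscriber count:

    #!/usr/bin/perl
    # Sketch of the easy case: pull subscriber counts from cooperative
    # user-agents such as
    #   Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)
    # Keep the largest count reported per aggregator, since every request
    # repeats the figure.
    use strict;
    use warnings;

    my %subscribers;
    while (my $line = <>) {
        next unless $line =~ /"([^"]*)"\s*$/;    # last quoted field: UA
        my $ua = $1;
        if ($ua =~ m{^(\S+?)/[\d.]+ \(.*?(\d+) subscribers?\)}) {
            my ($name, $count) = ($1, $2);
            $subscribers{$name} = $count
                if $count > ($subscribers{$name} || 0);
        }
    }
    printf "%-20s %6d\n", $_, $subscribers{$_} for sort keys %subscribers;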

For PC- and browser-based readers, it’s possible to make some good guesses. The log file includes the IP address of the computer requesting the RSS file. By combining the user-agent and the IP address and counting the unique pairs, it’s possible to come up with a good guess at the number of unique users who are receiving the RSS file through this channel. Since each user-agent/IP address pair likely downloads the file multiple times in a day, the number of user-agent/IP address pairs stands as a proxy for subscribers.
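
A hedged sketch of that pair counting, under the same log-format assumption:

    #!/usr/bin/perl
    # Approximate individual (PC- or browser-based) subscribers by counting
    # unique user-agent/IP-address pairs in an Apache combined-format log:
    # each such reader polls repeatedly, but from the same address with the
    # same user-agent.
    use strict;
    use warnings;

    my %pair_seen;
    while (my $line = <>) {
        my ($ip) = $line =~ /^(\S+)/;            # first field: client IP
        my ($ua) = $line =~ /"([^"]*)"\s*$/;     # last quoted field: UA
        next unless defined $ip && defined $ua;
        $pair_seen{"$ua|$ip"}++;
    }
    printf "about %d unique user-agent/IP pairs\n", scalar keys %pair_seen;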

And then there are web-based applications like Feed2JS that simply convert an RSS feed into HTML and place it on a web page. The user-agent reveals nothing about how many people read that page, and it does not provide the URL of the page where the feed will appear. (I don’t think this latter point is even feasible; thanks to caching of RSS feeds on the web server where the content is being reused, it’s impossible to know, at the time the feed is downloaded from here, where it will end up.) This sort of activity gets boiled down to the number of unique user-agent/IP address pairs, which is almost certainly lower than the number of people who see the web page the feed is on.

Results

When all of these things are processed and added together, I discovered that there might have been 1096 subscribers on April 30, 2007. “Might have been?” Yes — this number is a guess and is almost certainly not the real number of subscribers, let alone the real number of readers. The true figure could be lower (people who subscribe in more than one place but read in only one; readers who simply have not bothered to delete the subscription in a no-longer-favored aggregator) or higher (people who read the feed through web pages or through syndicators that use aggregators as the source of their information and then redistribute it elsewhere — via tools such as ZapTXT and Feed2JS). But this number will have to do for now.

One thing is clear to me from this exploration of my server’s log files: RSS allows my content to go places I never thought possible and to be read by an audience far broader than I would have guessed reasonable. That’s something to keep in mind when you’re writing for your library’s site — or for yourself.

Please feel free to experiment with the application I wrote — it’s at http://www.rss4lib.com/feedstats/ — and let me know what you think.

EBSCOhost Update

I received an email today from Kathleen McEvoy at EBSCO Publishing in response to my post on Friday about EBSCOhost‘s new RSS features. She explained that the way I thought the RSS feed should work — a view echoed by David Rothman — is, in fact, the way the site works, contrary to the instructions that were posted there. EBSCO updated the instructions to say more clearly:

As long as the EBSCOhost user adds the feed to an aggregator within one week of its creation, it will not expire, unless the aggregator fails to automatically update the results supplied by the feed for two months (extremely unlikely).

As long as the aggregator pulls down the feed regularly, the service stays alive, as it should.

EBSCOhost Adds RSS Features

See “EBSCOhost Update,” above, for an update to this item (30 April 2007).
EBSCOhost has added RSS feeds for any search you execute within its databases. Once you’ve activated this new feature on the “New Features” page, linked from the upper right corner of EBSCOhost pages (it’s called “One Step Alerts” — the press release omits the name), any search you run in the database can be turned into an RSS feed for updates. Simply click the “Create alert for this search” link and receive a link for the corresponding RSS feed.
Items in the feed are article titles and brief citations. The “Read More” link takes you to a full citation page with an OpenURL link for your library (assuming your library has one set up). If you’re logged in, you can receive the alerts by email as well as RSS; no login is required for the RSS feed, though. Your feeds last indefinitely as long as you access the feeds within one week of creation and no more than two months goes by without new data in the feed.
My only quibble is with the two-month inactivity limit on the feed itself. Search alerts should be a “fire and forget” service — they run until cancelled. Perhaps a better expiration trigger would be another kind of inactivity — for example, the feed is not accessed (by an aggregator or feed reader), or the user does not click through to the full citation, for some extended period of time. After all, search alerts do not necessarily serve a short-term role — for me, they are very useful tools when I want to stay on top of a topic over the long term. I’m more likely to create a search alert on a topic where “new stuff” is irregular or unpredictable than on one where information arrives so quickly that I remember to look myself.
Overall, though, this is an excellent service and a model for other vendors to emulate.

Library of Congress Blog

The granddaddy of American libraries has become one with its multitudinous descendants: the Library of Congress now has its very own weblog. The blog’s author, Matt Raymond, writes that his blog’s mission will “be in keeping with the spirit of the Library’s mission as a whole: ‘to make its resources available and useful to the Congress and the American people and to sustain and preserve a universal collection of knowledge and creativity for future generations.’”

I noted with interest that Raymond is the library’s Director of Communications and a journalist by education. Although I’d find professional and personal interest in a librarian blogger’s perspective on the library, I’m impressed that the Library of Congress has decided to use blogs and RSS as a communications and marketing tool. Welcome to the biblioblogosphere!

Google AJAX Feed API

Google announced a new API this week, the Google AJAX Feed API. In a nutshell, it allows you to use Google’s cache of RSS feeds (the same cache that makes Google Reader work) for whatever purpose you want: recent posts from your favorite blog (à la Feed2JS), mashups, or anything else.
Results are returned in JSON, XML (the original feed source), or both. It supports all flavors of RSS and Atom.
I’ve just started playing with this — but here’s a completely unstyled list of headlines from CNN pulled down and displayed using this API: RSS4Lib Google API Test.

Legislative Feeds Directory

Thanks to contributions from RSS4Lib readers and a bit of online searching, I’ve put together the start of a directory of legislatures (national bodies and, for the United States, state legislatures) that offer RSS feeds to track current legislation in one way or another. See Legislation Feeds for the list to date.

If you know of other legislative bodies that offer their constituents an RSS tool for tracking legislation, please send them to legislationfeeds@rss4lib.com.