Waiter, There’s a Diacritic in My Feed

How best to encode characters with diacritics in your RSS feed? That was the question posed in a recent thread on Web4Lib, started by a librarian working on putting his new books list into an RSS feed. (A great idea in itself, of course.) Many books have titles containing diacritics, especially in an academic library but also in other libraries whose user communities speak diverse languages, so this is important: you want to be able to show the title properly, especially to speakers of the language for whom you’ve bought the book.
There are, of course, several ways to encode many diacritic marks (HTML character entities and Unicode, for example); finding the best one for RSS engendered some discussion.
The consensus in the discussion was that using character references is the best solution, particularly for the item title, which is arguably the most important part of an RSS item to get right (OK, the URL is also critical). If users cannot understand the title, why would they click to the full text?
Character references take the form

&#1234;

The number identifies the actual character by its decimal Unicode code point: “402” is ƒ, the “florin”, and “247” is ÷, the division symbol. And so on… For a list of common character references, see this entities table.
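If the feed is generated by a script, the conversion can be automated rather than looked up by hand. Here is a minimal sketch in Python, assuming the titles are already available as Unicode strings; the function name `to_char_refs` is made up for illustration and is not something from the Web4Lib thread. It escapes the XML markup characters first, then turns every non-ASCII character into a decimal character reference.

```python
from xml.sax.saxutils import escape


def to_char_refs(text: str) -> str:
    """Return text that is safe to place inside an RSS <title> element.

    Hypothetical helper for illustration only.
    """
    # Escape &, < and > so the title cannot break the XML.
    escaped = escape(text)
    # Replace each non-ASCII character with a decimal reference (&#NNN;).
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_char_refs("Les Misérables"))  # Les Mis&#233;rables
    print(to_char_refs("ƒ and ÷"))         # &#402; and &#247;
```

The result is plain ASCII, so it should come through intact whatever encoding the feed declares or the reader assumes.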