27 Nov 23:58
Re: RSS and diacritics
At 03:56 PM 11/27/2007, Jonathan Gorman wrote:
>Apologies. In rereading I realized I misinterpreted what you were
>saying. I thought you had two distinct problems (using html
>character entities) and issues with diacritics.

Phew! I thought I was going to have to attempt a reply to your first response. ;o)

>The answer as far as the entities? RSS can be a mess ;). RSS feeds
>are XML. Sadly, a widespread practice has occurred of using
>"escaped html" in fields of the RSS feeds. There's no way to ensure
>that these escaping nightmares will be parsed correctly.
>
>HTML defines some character entities, but RSS doesn't have all of
>them. You can attempt to add these characters to the RSS feed via
>including them in a Doctype declaration at the beginning of the
>feed. This wikipedia page looks like it has some examples of that:
>http://en.wikipedia.org/wiki/XML.
>
>The best solution? Not really sure. I'd lean towards not using
>"escaped html" in my RSS feed. Instead use just rss and the
>character references, which should display cleanly assuming that the
>rss feeder isn't junk.
>
>(And by character reference, I mean use &#x..; where .. is the
>appropriate code point).

Thanks. I think that will do it. I was using name-based references (Egrave, etc.) and escaping the ampersand, which worked in most feed readers but not in everything capable of displaying a feed. The numeric character references work fine in all apps tested so far.

One other question: which numeric reference is preferable? For example, both &#xC9; and &#201; (xC9 and 201) produce a Latin capital E acute. Are there good reasons to use one over the other? (And is either more likely than the other to be correctly rendered by browsers in non-RSS situations?)

Thanks,
Bob Duncan

~!~!~!~!~!~!~!~!~!~!~!~!~
Robert E. Duncan
Systems Librarian
Editor of IT Communications
Lafayette College
Easton, PA 18042
duncanr@...
http://www.library.lafayette.edu/
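
P.S. For anyone making the same switch from name-based to numeric references, here is a rough Python sketch of the conversion, not a tested production tool. The function name named_to_numeric and the sample title are made up for illustration. It relies on the standard library's HTML 4 entity table and leaves the five entities XML itself predefines (amp, lt, gt, quot, apos) alone. It also parses the hex and decimal outputs with an XML parser to show that both forms resolve to the same character.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# Entities that XML predefines; these are safe in a feed and are left as-is.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text, hexadecimal=True):
    """Replace HTML named entities (e.g. &Egrave;) with numeric references.

    Hypothetical helper for illustration; unknown names are left untouched.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        cp = name2codepoint[name]
        return f"&#x{cp:X};" if hexadecimal else f"&#{cp};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


# Made-up sample element, as it might appear in a feed item.
sample = "<title>&Eacute;cole &amp; caf&eacute;</title>"

title_hex = named_to_numeric(sample)                     # uses &#xC9; style
title_dec = named_to_numeric(sample, hexadecimal=False)  # uses &#201; style

# An XML parser resolves both styles to the same characters.
parsed_hex = ET.fromstring(title_hex).text
parsed_dec = ET.fromstring(title_dec).text
print(title_hex)
print(title_dec)
print(parsed_hex == parsed_dec)
```

As far as the XML parser is concerned the two spellings are interchangeable, so the choice between them in this sketch is only a matter of which is easier to read against a code chart.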