Jonathan Gorman | 27 Nov 21:56

Re: RSS and diacritics


Apologies.  In rereading, I realized I misinterpreted what you were saying.  I thought you had two distinct
problems: using HTML character entities, and issues with diacritics.

The answer as far as the entities?  RSS can be a mess ;).  RSS feeds are XML.  Sadly, it has become a
widespread practice to put "escaped HTML" in the fields of RSS feeds.  There's no way to ensure that every
feed reader will parse these escaping nightmares correctly.
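
To make the problem concrete, here is a minimal sketch in Python (my choice of language, and the feed
fragment is made up for illustration).  It shows why some readers display the raw code: an XML parser only
undoes the XML-level escaping, and whether a second, HTML-level pass happens is up to the reader.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed item whose title carries "escaped HTML":
# the ampersand of the HTML entity &eacute; is itself escaped as &amp;.
feed = "<item><title>Caf&amp;eacute; society</title></item>"

# An XML parser only undoes the XML-level escaping.
xml_text = ET.fromstring(feed).findtext("title")
print(xml_text)   # Caf&eacute; society

# A reader that also treats the field as HTML makes a second unescaping pass.
html_text = html.unescape(xml_text)
print(html_text)  # Café society
```

A reader that stops after the first step shows the raw entity; one that performs both steps shows the
accented character.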

HTML defines a large set of named character entities, but XML (and therefore RSS) predefines only five:
&lt;, &gt;, &amp;, &apos;, and &quot;.  You can attempt to make the others available by declaring them in a
DOCTYPE declaration at the beginning of the feed.  This Wikipedia page looks like it has some examples of
that: http://en.wikipedia.org/wiki/XML.
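
As a rough illustration of the DOCTYPE approach, the sketch below declares a single entity in the feed's
internal subset and parses it with Python's standard xml.etree.ElementTree.  The feed content is invented,
and the fact that this parser expands the entity says nothing about any particular feed reader, which may
ignore the declaration entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed that declares &eacute; itself in the internal DTD subset.
feed = """<?xml version="1.0"?>
<!DOCTYPE rss [
  <!ENTITY eacute "&#233;">
]>
<rss version="2.0"><channel><item>
  <title>Caf&eacute; society</title>
</item></channel></rss>"""

# The declared entity is expanded while parsing.
title = ET.fromstring(feed).findtext("channel/item/title")
print(title)  # Café society

# Without the declaration, the same entity is a well-formedness error.
try:
    ET.fromstring("<title>Caf&eacute; society</title>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # False
```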

The best solution?  Not really sure.  I'd lean towards not using "escaped HTML" in my RSS feed.  Instead, use
plain RSS with numeric character references, which should display cleanly assuming the feed reader isn't junk.

(And by character reference, I mean use &#x..; where .. is the appropriate code point in hexadecimal, e.g.
&#xE9; for é.)
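
If the feed is generated by a script, the conversion can be automated.  The following is only a sketch in
Python; the helper name to_char_refs is hypothetical, and it deliberately leaves ASCII alone, so &, < and >
in titles still need normal XML escaping.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )

title = to_char_refs("Dvořák, Antonín")
print(title)  # Dvo&#x159;&#xE1;k, Anton&#xED;n
```

Python's built-in text.encode("ascii", "xmlcharrefreplace") does much the same thing, but emits decimal
references (&#233;) instead of hexadecimal ones; both forms are valid XML.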

See http://en.wikipedia.org/wiki/Character_entity_reference for a bit more information.

Jon Gorman

---- Original message ----
>Date: Tue, 27 Nov 2007 14:56:56 -0500
>From: Bob Duncan <duncanr@...>  
>Subject: [Web4lib] RSS and diacritics  
>To: web4lib@...
>
>
>Greetings,
>
>I'm getting ready to offer RSS feeds for our library's recent 
>acquisitions lists and have run into a little snag:  characters with 
>diacritics.  I understand why I can't use HTML character entity 
>references and expect all feed readers to play nicely, so I tried 
>encoding the ampersand in the HTML entity reference (a suggested fix 
>that I can no longer document).  While this works great for some feed 
>readers, other readers and the two major browsers display the raw 
>code instead of the character with diacritical mark.
>
>Other than displaying plain letters without diacritics, is there a 
>way to code feeds so that all (or at least most) feed readers will 
>display the character with the mark?  (I'd like to be able to do this in 
>item titles and descriptions.)
>
>Thanks,
>
>Bob Duncan
>
>
>~!~!~!~!~!~!~!~!~!~!~!~!~
>Robert E. Duncan
>Systems Librarian
>Editor of IT Communications
>Lafayette College
>Easton, PA  18042
>duncanr@...
>http://www.library.lafayette.edu/ 
>
>
>_______________________________________________
>Web4lib mailing list
>Web4lib@...
>http://lists.webjunction.org/web4lib/
