4 May 17:02
Re: More on entities and Â
Hi!

> Grant seems to be saying the default is UTF8 whereas Michele says it is
> iso8859-1.

It really depends on Perl, as it has a "use UTF-8 if I find a UTF-8 character" behaviour. The only way you can be sure the output is *always* UTF-8 or *always* ISO-8859-1 is to use the Encode module, as per the example I posted in my previous message.

> The next thing that confuses me is that I have Perl 5.8.3 installed on
> both systems. Only one is showing the extra character.

This is, of course, a mystery.

> Finally, my reading of utf8 docs says that a 00 should be appended to
> ANSI characters. Where is the A0 character coming from?

The 00 is not actually prepended to characters with code points 0-127 in UTF-8. This is one of the things that make UTF-8 different from UCS-2 (the fixed-width precursor of UTF-16), which always uses two bytes per character. UTF-8 characters occupy a variable number of bytes, and that allows characters 0-127 to remain the same, thus maintaining perfect compatibility with US-ASCII documents.

Michele.

--
Michele Beltrame
http://www.italpro.net/mb/
ICQ# 76660101 - e-mail: mb@...
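To make the byte layout concrete, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and the sample characters are my own choice, not taken from the thread. It prints how a few characters are stored in each encoding, then shows one plausible source of a stray "Â" followed by A0: a no-break space (U+00A0) written as UTF-8 and read back as ISO-8859-1. Whether that is what is actually happening on the affected system is only a guess.

```python
# Sample characters (assumed for illustration): an ASCII letter,
# a no-break space, and an accented letter.
samples = ["A", "\u00a0", "\u00e8"]

for ch in samples:
    utf8 = ch.encode("utf-8")          # variable width: 1 byte for 0-127
    utf16 = ch.encode("utf-16-be")     # big-endian, no byte-order mark
    latin1 = ch.encode("iso-8859-1")   # always 1 byte for 0-255
    print(f"U+{ord(ch):04X}  utf8={utf8.hex()}  "
          f"utf16={utf16.hex()}  latin1={latin1.hex()}")

# A no-break space stored as UTF-8 takes two bytes: C2 A0.
nbsp_utf8 = "\u00a0".encode("utf-8")

# Reading those two bytes as ISO-8859-1 yields two characters:
# U+00C2 (the letter A with circumflex) followed by U+00A0.
misread = nbsp_utf8.decode("iso-8859-1")
print(nbsp_utf8.hex(), [f"U+{ord(c):04X}" for c in misread])
```

The first loop shows that "A" stays a single byte (41) in UTF-8 but becomes 00 41 in the two-byte encoding, which is where the "00" in the docs comes from.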