4 May 17:50
Re: More on entities and Â
On Tue, May 04, 2004 at 05:02:43PM +0200, Michele Beltrame wrote:
> It really depends on Perl, as it has a "use UTF8 if I find a UTF8 character"
> behaviour. The only way you can be sure output is *always* UTF8 or
> *always* ISO8859-1 is to use the Encode module, as per example I
> posted in my previous message.
OK, this explanation makes sense.
> > The next thing that confuses me is that I have Perl 5.8.3 installed on
> > both systems. Only one is showing the extra character.
>
> This is, of course, a mystery.
Figures... :-/
> The 00 is not actually prepended to characters with code point 0-127
> in UTF8. This is one of the things that make UTF8 different from
> UCS2 (also known as UTF16), which always uses two bytes per
> character. UTF8 chars are of variable byte-occupation, and that
> allows characters 0-127 to remain the same, thus maintaining
> perfect compatibility with US ASCII documents.
Thanks for the lesson. Can you explain what is happening that makes the
A0 character have a C2 prepended to it when output as utf-8? My
understanding of utf-8 was that it was compatible with latin1. This
behavior is *not* very compatible from my point of view.
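For reference, the C2 comes from the UTF-8 encoding rules themselves rather than from Perl or Apache. UTF-8 is byte-compatible only with US-ASCII (code points 0-127); every code point from 128 to 2047, which includes the upper half of Latin-1, is written as two bytes. U+00A0 (no-break space) therefore becomes C2 A0, and a client that reads those bytes as ISO-8859-1 shows the C2 as "Â". The sketch below illustrates this in Python purely for demonstration (the thread itself is about Perl); the helper name `hex_bytes` is made up for the example.

```python
def hex_bytes(text, encoding):
    """Return the encoded form of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

nbsp = "\u00a0"  # NO-BREAK SPACE, stored as the single byte A0 in Latin-1

latin1_form = hex_bytes(nbsp, "iso-8859-1")  # one byte
utf8_form = hex_bytes(nbsp, "utf-8")         # two bytes, lead byte C2
ascii_form = hex_bytes("A", "utf-8")         # ASCII is unchanged in UTF-8

# Reading the UTF-8 bytes back as Latin-1 puts a stray capital A with
# circumflex in front of the no-break space.
misread = nbsp.encode("utf-8").decode("iso-8859-1")

print(latin1_form, "|", utf8_form, "|", ascii_form, "|", repr(misread))
```

The three hex forms come out as A0, C2 A0 and 41, so only the ASCII character keeps the same byte in both encodings.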
One more point which may be at the root of my problems. I'm trying to
get Apache to add the Content-Type header using the following
declaration in my httpd.conf per the Apache docs:
AddDefaultCharset utf-8
No matter if I have this in my main server configuration or the virtual
host configuration, if I do a `HEAD http://servername`, I get back a
Content-Type of iso-8859-1. If I view the page in Firefox and manually
tell Firefox to display it as UTF-8, all is well. Any ideas why Apache
isn't playing nice?
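One way to see exactly which charset the server advertises is to read the charset parameter straight off the Content-Type header. The following is only a sketch in Python using the standard library's header parser; `charset_from_content_type` and `fetch_charset` are hypothetical helper names, and any URL passed in stands in for the real host. As far as I understand the Apache documentation, AddDefaultCharset only supplies a charset when the response does not already carry one, so a header that already says iso-8859-1 may be coming from somewhere else, for instance from the script that generates the page.

```python
from email.message import Message
from urllib.request import Request, urlopen

def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()

def fetch_charset(url):
    """Send a HEAD request and return the charset the server declares.

    Needs network access; the URL is supplied by the caller.
    """
    with urlopen(Request(url, method="HEAD")) as response:
        return response.headers.get_content_charset()

# Offline checks on header values like the ones discussed above.
declared = charset_from_content_type("text/html; charset=ISO-8859-1")
missing = charset_from_content_type("text/html")
print(declared, missing)
```

Calling `fetch_charset` with the server's URL before and after a configuration change would show whether the directive is taking effect.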
Thanks,
William
--
Knowmad Services Inc.
http://www.knowmad.com