14 Sep 2007 14:57
Re: Re: Problem with UTF-8
John Wilson <tug <at> wilson.co.uk>
2007-09-14 12:57:42 GMT
On 14 Sep 2007, at 13:01, Ulrich Schaefer wrote:

> John Wilson wrote:
>> Most software I know of which consumes XML over HTTP will ignore the
>> charset. The problem is that it is almost always wrong (e.g. it is
>> omitted but the encoding of the document is not US-ASCII).
>>
>> I'm pretty sure that Apache XML-RPC ignores the Content-type
>> encoding.
>
> I agree.
>
> (the following is from my longish experiments with XML-RPC
> interoperability several months ago)
>
> Due to an "underspecification" in the initial XML-RPC specification,
> there is no guarantee that different implementations really treat
> 8-bit encoded strings such as UTF-8, ISO-8859-X or EUC-JP correctly
> in both directions.
> The only thing that is defined for the string data type is exchange
> of 7-bit US-ASCII characters. Playing with headers may help, but not
> necessarily, depending on the implementations used (I'm talking about
> *existing* XML-RPC implementations; using the same Java XML-RPC
> implementation at both ends e.g. is no problem).
> My solution (without patching the XML-RPC libraries) for properly
> connecting Python clients with a Java Server was to implement the
> transfer via the binary data type (base64-encoded) with encoding and
> decoding from/to Unicode at both ends. This is bad because of the
> transcoding overhead, but formed the only solution for bidirectional
> Unicode text exchange that worked correctly even for Japanese
> characters. I can send Java & Python code examples if requested.

I have implemented XML-RPC twice. Both times I did not emit an XML
header but used numeric character references (i.e. things like
&#x123;) for all code points > 127. This seems to work quite well as
all implementations I have tested understand numeric character
references (no doubt there are some which do not - I just haven't
seen them).

I would recommend this approach to all implementers.

John Wilson
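
A minimal Python sketch of the numeric character reference approach described above, for illustration only. The helper names to_ascii_xml_text and string_value are hypothetical and are not taken from either implementation mentioned in this thread. The sketch assumes the text contains no characters that are illegal in XML 1.0 (such as most control characters), and it emits decimal references, which any conforming XML parser should treat the same as hexadecimal ones.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def to_ascii_xml_text(text: str) -> str:
    """Return pure-ASCII XML character data for `text` (hypothetical helper)."""
    # First escape the markup characters &, < and >.
    escaped = escape(text)
    # Then replace every code point > 127 with a decimal numeric
    # character reference, e.g. U+0123 becomes "&#291;".
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def string_value(text: str) -> str:
    """Wrap `text` in an XML-RPC <value><string> element (hypothetical helper)."""
    return "<value><string>" + to_ascii_xml_text(text) + "</string></value>"


original = "Grüße, 日本語 & <tags>"
payload = string_value(original)

# The payload is 7-bit clean, so no XML declaration or charset is needed.
assert payload.isascii()

# A standard XML parser turns the references back into the original text.
assert ET.fromstring(payload).find("string").text == original
```

Because the output is plain US-ASCII, it should come out the same whether the receiver assumes US-ASCII, ISO-8859-1 or UTF-8, which is why a missing or wrong charset stops mattering.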