14 Apr 2010 18:32
Re: Unicode, CHAR-UPCASE/CHAR-DOWNCASE and char-upcase.1/char-upcase.2
Raymond Toy <toy.raymond <at> gmail.com>
2010-04-14 16:32:35 GMT
On 4/4/10 8:04 AM, Erik Huelsmann wrote:
> Hi Sam,
>
> On Sun, Apr 4, 2010 at 10:58 AM, Sam Steingold <sds <at> gnu.org> wrote:
>
>> On 4/3/10, Erik Huelsmann <ehuels <at> gmail.com> wrote:
>>
>>> However, in section 13.1.10, there seems to be an escape hatch:
>>> "Documentation of implementation-defined scripts". A script is a
>>> subtype of CHARACTER, nothing more nothing less. An
>>> implementation-defined script gets to document the effect on
>>> CHAR-UPCASE and CHAR-DOWNCASE.
>>>
>> I don't think this gives you a license to discard the round-tripping invariant.
>>
> I read the same section again and on second reading I think the
> section indeed does not allow that freedom.
>

FWIW, CMUCL fails these tests because its CHAR-UPCASE returns whatever Unicode says the uppercase character is.

>
>>> there's no need to have the round-tripping requirement apply to most
>>> of unicode - as can't be expected, see latin-small-letter-dotless-i
>>> for an example.
>>>
>> why not make it its own upper case?
>> this is not exactly correct from the unicode pov, but, I think, it is
>> better than the alternative.
>> this round-tripping requirement is, i think, pretty important in symbol i/o.
>>
> I hadn't thought about the reader and printer behaviours regarding
> *readtable-case* and *print-case*. However, it would be logical by
> analogy that if a string doesn't get recoded in a round-trip, then the
> symbol name won't either.
>

This brings up another issue. CMUCL fails some symbol tests because it converts the string to Unicode NFC form before creating the symbol.

Ray
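Sam's suggestion (let a character whose Unicode uppercase does not map back to it be its own upper case) and the NFC issue can be illustrated with a short sketch. This is not CMUCL's or any other Lisp's code: it is Python, with `str.upper()`/`str.lower()` standing in for "whatever Unicode says", and the names `round_trips`, `char_upcase`, `char_downcase` and `symbol_name_preserved` are hypothetical. Applying the fallback to every character that fails to round-trip, not only to dotless i, is an assumption made for the sketch.

```python
import unicodedata

# Python's str.upper()/str.lower() stand in for the Unicode case mappings.
# They apply *full* mappings, so one character may map to several (ß -> "SS").


def round_trips(c: str) -> bool:
    """True if c survives upcase-then-downcase (or downcase-then-upcase)
    under the raw Unicode mappings."""
    up, down = c.upper(), c.lower()
    if up != c:
        return len(up) == 1 and up.lower() == c
    if down != c:
        return len(down) == 1 and down.upper() == c
    return True  # the character has no case mapping at all


def char_upcase(c: str) -> str:
    """Hypothetical CHAR-UPCASE: use the Unicode mapping only when it maps
    back to c; otherwise treat c as having no case (its own upper case)."""
    up = c.upper()
    if len(up) == 1 and up != c and up.lower() == c:
        return up
    return c


def char_downcase(c: str) -> str:
    """Hypothetical CHAR-DOWNCASE, the mirror image of char_upcase."""
    down = c.lower()
    if len(down) == 1 and down != c and down.upper() == c:
        return down
    return c


def symbol_name_preserved(name: str) -> bool:
    """Would a reader that NFC-normalizes a name before interning it
    keep the name unchanged?"""
    return unicodedata.normalize("NFC", name) == name


print(round_trips("\u0131"))              # False: dotless i -> I -> i
print(char_upcase("\u0131") == "\u0131")  # True: dotless i is its own upper case
print(char_downcase(char_upcase("a")))    # a
print(symbol_name_preserved("e\u0301"))   # False: NFC composes it to U+00E9
```

Because `char_upcase` only returns a different character when that character downcases back to the original, downcasing its result always recovers the input whenever the upcase changed anything. The length check also rejects full mappings such as ß to "SS", which a single-character CHAR-UPCASE could not return anyway.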