4 Jan 2005 13:45
Re: UTF8
Niels Möller <nisse <at> lysator.liu.se>
2005-01-04 12:45:01 GMT
der Mouse <mouse <at> Rodents.Montreal.QC.CA> writes:

> I'm faced with an encoding-agnostic
> filesystem interface and implementation, wherein filename components
> are sequences of octets not including 0x00 and 0x2f, independent of any
> characters;

Please leave the file system issues out of it for now. What's of
primary importance are the core drafts, and those deal with usernames
and passwords in utf8 form, *not* file names. The issues for filenames,
e.g. in sftp, are slightly different, and not relevant to the core
drafts.

> I'm faced with password hashing routines that work with
> octet strings, not character strings; etc.

> Am I required to reject attempted non-ASCII
> strings in these places for no reason other than an inability to know
> what the user intended the character set - if any - to be? (For that
> matter, what grounds are there for assuming that octets in the ASCII
> range are intended to correspond to ASCII characters, rather than, say,
> KOI-7?)

I'm assuming you're talking about the server implementation now (the
client side is comparatively trivial; convert input to utf8 based on
the current $LC_CTYPE). On the server side, the problem is that at
login time, you don't know the user's $LC_CTYPE. My recommendation is
as follows:

1. Choose one default encoding (be that plain ascii, or latin1, or
   koi-7, or normalized utf-8, depending on your context and
   preference).

2. Provide an option for the sysadmin to say that on his or her
   particular system, some other character set is used for user names
   and passwords.

Then convert the usernames and passwords you get on the wire to the
selected encoding. That almost solves the problem, and it's no big
deal.

Optionally, to support systems where different users use different
character sets for their usernames and/or passwords, use some per-user
configuration or kludgery to figure out the user's character set.
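As a rough illustration of the server-side recommendation above, here is a minimal sketch in Python. The names `SYSTEM_CHARSET` and `wire_to_local` are hypothetical and not taken from any existing implementation. The NFC normalization step is an assumption made for the example, not something the drafts are quoted as requiring here. A per-user character set could be passed in through the `charset` argument.

```python
import unicodedata

# Hypothetical sysadmin setting: the character set the local account
# database uses for user names and passwords (default chosen arbitrarily).
SYSTEM_CHARSET = "latin-1"


def wire_to_local(wire_bytes: bytes, charset: str = SYSTEM_CHARSET) -> bytes:
    """Convert a UTF-8 string received on the wire to the local charset.

    Raises ValueError if the input is not valid UTF-8, or if it contains
    characters that the local charset cannot represent.
    """
    try:
        text = wire_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("wire string is not valid UTF-8") from exc

    # Assumption: normalize to NFC so that composed and decomposed forms
    # of the same character map to the same local octets.
    text = unicodedata.normalize("NFC", text)

    try:
        return text.encode(charset)
    except UnicodeEncodeError as exc:
        raise ValueError(f"not representable in {charset}") from exc
```

The resulting octet string can then be handed to existing routines that work on octets, such as a password hash. A string that cannot be converted is rejected rather than passed through unchanged.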
I'll be happy to discuss these implementation issues (my implementation
doesn't get non-ascii quite right yet either), but we should probably
do that off-list.

> Given how common such systems are, it seems a bit odd that the IETF
> would take a position so apparently incompatible with them.

Do you have some numbers to back that up? I've seen quite a number of
unix systems, but as far as I can recall, I've *never* seen one where
usernames and passwords used non-ascii characters. (I *have* seen
plenty of non-ascii filenames, but as I said, that's a different issue,
and irrelevant to the core drafts). I live in latin1-land, not Asia,
though.

Best regards,
/Niels