2 Nov 2010 16:59
Re: mailto
Shawn Steele <Shawn.Steele <at> microsoft.com>
2010-11-02 15:59:34 GMT
I used unicode <at> unicode instead of utf8 <at> utf8 because the encoding of a document, or the internal representation of the address, may not necessarily be utf8. (Eg: I don't want someone to stick utf8 bytes in a utf16 document :))

I think my concern is that for mailto: the distinction between an IRI and URI bucket is blurry to the average end user. So I think that some buckets that are nominally URI's will likely end up with mailto:unicode <at> unicode in them. Historically some applications have been generous in the strings they allow, so I expect non-ascii mailtos will probably continue to work. I don't know if that's worth acknowledging in 6068. It may be "worse" when people inadvertently cut & paste from IRI to URI or URI to IRI.

One approach apps could take is to presume UTF-8 (IRI) if > ASCII is encountered, &/or presume % encoding (URI) if that's encountered. Though John points out that testing for % isn't completely reliable in the local part.

-Shawn

http://blogs.msdn.com/shawnste

________________________________________
From: John C Klensin [klensin <at> jck.com]
Sent: Tuesday, November 02, 2010 8:36 AM
To: "Martin J. Dürst"; Shawn Steele
Cc: jwz <at> jwz.org; ima <at> ietf.org; Larry Masinter \(masinter <at> adobe.com\)
Subject: Re: [EAI] mailto

--On Tuesday, November 02, 2010 14:38 +0900 "\"Martin J. Dürst\"" <duerst <at> it.aoyama.ac.jp> wrote:

> On 2010/11/02 3:23, Shawn Steele wrote:
>
>> Particularly I think that users are likely to just use
>> mailto:unicode <at> unicode without bothering with the % escaping,
>
> The "without bothering with the %-escaping" part is covered
> where an IRI (instead of only an URI in the strict sense) is
> accepted.

Martin,

Yes. But this is exactly where I find several of the parallel discussions troubling.

(1) EAI thinks an email address is, in Shawn's notation, unicode <at> unicode (or possibly unicode <at> string-with-A-labels).
It does not permit %-escapes in the domain name and many mail systems will interpret %-signs in the local part as something else entirely, e.g., routing information.

(2) For an IRI, mailto:unicode <at> unicode is perfectly reasonable. However, the mapping to a URI, unless it is scheme/protocol dependent, is likely to produce either
    mailto:%-escapes <at> %-escapes
or
    mailto:%-escapes <at> string-with-A-labels
Both of which are really bad news if one tries to get from mailto:String to an email address by dropping the mailto and leaving String.

I think that means that any sort of i18n MAILTO processor has to get from those forms back to the non-escaped Unicode-in-UTF-8 strings that EAI expects before passing an internationalized address off to a mail-sending or processing operation. If one reads the second paragraph of Section 5 of RFC 6068 (and the warning in the third-to-last paragraph of Section 7), one can claim that is implied there, but it is difficult (at best) to parse that exact meaning out from the convoluted language that was used, presumably to avoid a normative reference to the EAI specs.

Note that this is a key difference from web and web-like applications, where the applications can be assumed to be able to deal with decoding the %-escapes themselves.

My own guess is that, even if we discover that the specification part of 6068 does not need modification for EAI, we are likely to want to update the document with some very clear and specific text, and some lurid examples, about what is expected and what can go wrong. In a sense, there is a meta-level Security Consideration in this that is a bit stronger than the last few paragraphs of Section 7: any level of carelessness in implementations is likely to result in very bad behavior including possible misdirection or loss of mail.

best,
john

_______________________________________________
IMA mailing list
IMA <at> ietf.org
https://www.ietf.org/mailman/listinfo/ima
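
To make the two points in this thread concrete, here is a minimal Python sketch, not taken from RFC 6068 or the EAI specs. It combines Shawn's heuristic (presume the IRI form if anything above ASCII is present, presume %-encoding if a % is present) with John's requirement that a mailto processor get back to the unescaped Unicode address before handing it to the mail system. The function names classify_mailto and mailto_to_address are hypothetical, and the sketch assumes a single address, ignores any ?header fields, leaves A-labels in the domain untouched, and does not decode strings that mix non-ASCII characters with % signs. As John points out, a % in the local part may be literal, so the "uri" guess can be wrong.

```python
from urllib.parse import unquote


def classify_mailto(s: str) -> str:
    """Guess the form of a mailto string (heuristic only, can be wrong)."""
    if any(ord(ch) > 0x7F for ch in s):
        return "iri"    # non-ASCII present: presume the IRI form
    if "%" in s:
        return "uri"    # a guess: '%' may be a literal in some local parts
    return "ascii"      # plain ASCII, nothing to undo


def mailto_to_address(s: str) -> str:
    """Recover the unescaped Unicode address from a single-address mailto string."""
    scheme, sep, rest = s.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError("not a mailto string")
    addr = rest.split("?", 1)[0]    # assumption: drop any ?subject=... part
    if classify_mailto(addr) == "uri":
        # Undo the %-escapes, treating the escaped bytes as UTF-8.
        addr = unquote(addr, encoding="utf-8", errors="strict")
    return addr


escaped = "mailto:jos%C3%A9@example.com"
naive = escaped[len("mailto:"):]      # just dropping "mailto" keeps the escapes
print(naive)                          # jos%C3%A9@example.com
print(mailto_to_address(escaped))     # josé@example.com
```

The naive result is the string that would be misdirected or rejected if it were passed straight to a mail-sending operation, which is the failure mode described above.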