Shawn Steele | 2 Nov 2010 16:59

Re: mailto

I used unicode@unicode instead of utf8@utf8 because the encoding of a document, or the internal
representation of the address, is not necessarily UTF-8.  (E.g., I don't want someone to stick UTF-8 bytes
in a UTF-16 document :))

I think my concern is that for mailto: the distinction between an IRI bucket and a URI bucket is blurry to the average
end user.  So I think that some buckets that are nominally URIs will likely end up with
mailto:unicode@unicode in them.  Historically some applications have been generous in the strings they
allow, so I expect non-ASCII mailtos will probably continue to work.  I don't know if that's worth
acknowledging in RFC 6068.  It may be "worse" when people inadvertently cut & paste from IRI to URI or from URI to IRI.

One approach apps could take is to presume UTF-8 (IRI) if non-ASCII characters are encountered, and/or presume
%-encoding (URI) if %-escapes are encountered.  Though John points out that testing for % isn't completely
reliable in the local part.
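A minimal Python sketch of that heuristic, assuming the input is a single mailto string.  The function name guess_mailto_form and its return labels are hypothetical, not from any spec.  The last call illustrates John's caveat: a legacy "percent hack" local part looks exactly like a %-escape.

```python
import re

# A %-escape is "%" followed by exactly two hex digits (RFC 3986 pct-encoded).
_PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def guess_mailto_form(s: str) -> str:
    """Guess whether a mailto string was written IRI-style or URI-style.

    Hypothetical helper. Returns "iri", "uri", "mixed", or "ascii"
    (no evidence either way).
    """
    has_non_ascii = any(ord(ch) > 0x7F for ch in s)
    has_escape = _PCT_ESCAPE.search(s) is not None
    if has_non_ascii and has_escape:
        return "mixed"   # e.g. text cut & pasted between IRI and URI contexts
    if has_non_ascii:
        return "iri"     # presume raw Unicode, to be sent as UTF-8
    if has_escape:
        return "uri"     # presume %-encoded UTF-8
    return "ascii"


print(guess_mailto_form("mailto:用户@例子.广告"))                      # iri
print(guess_mailto_form("mailto:%E7%94%A8%E6%88%B7@example.com"))   # uri
# Caveat: in a percent-hack local part (ops%cafe = user "ops" at host "cafe"),
# "%ca" is valid hex, so this is classified as "uri" even if % was literal.
print(guess_mailto_form("mailto:ops%cafe@relay.example"))           # uri
```

The check is only a guess; it cannot tell a literal % in a local part from a real escape.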

-Shawn

 
http://blogs.msdn.com/shawnste



________________________________________
From: John C Klensin [klensin@jck.com]
Sent: Tuesday, November 02, 2010 8:36 AM
To: "Martin J. Dürst"; Shawn Steele
Cc: jwz@jwz.org; ima@ietf.org; Larry Masinter (masinter@adobe.com)
Subject: Re: [EAI] mailto

--On Tuesday, November 02, 2010 14:38 +0900 "Martin J.
Dürst" <duerst@it.aoyama.ac.jp> wrote:

> On 2010/11/02 3:23, Shawn Steele wrote:
>
>> Particularly I think that users are likely to just use
>> mailto:unicode@unicode without bothering with the % escaping,
>
> The "without bothering with the %-escaping" part is covered
> where an IRI (instead of only a URI in the strict sense) is
> accepted.

Martin,

Yes.  But this is exactly where I find several of the parallel
discussions troubling.

(1) EAI thinks an email address is, in Shawn's notation,
unicode@unicode (or possibly unicode@string-with-A-labels).  It
does not permit %-escapes in the domain name and many mail
systems will interpret %-signs in the local part as something
else entirely, e.g., routing information.

(2) For an IRI, mailto:unicode@unicode is perfectly reasonable.
However, the mapping to a URI, unless it is scheme/protocol
dependent, is likely to produce either
   mailto:%-escapes@%-escapes
or
   mailto:%-escapes@string-with-A-labels

Both of these are really bad news if one tries to get from
mailto:String to an email address by dropping the "mailto:" and
leaving String.
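To make the two forms concrete, here is a rough Python sketch, assuming a bare mailto:local@domain IRI with no ?headers part.  iri_to_uri_generic follows the scheme-independent rule of RFC 3987 section 3.1 (UTF-8-encode, then %-escape every non-ASCII character); iri_to_uri_alabels is a hypothetical variant that emits A-labels for the domain using Python's built-in "idna" codec, which implements IDNA2003 rather than IDNA2008.

```python
from urllib.parse import quote


def iri_to_uri_generic(iri: str) -> str:
    """Scheme-independent IRI-to-URI mapping (RFC 3987, section 3.1 style).

    Each non-ASCII character is UTF-8-encoded and %-escaped; ASCII is kept.
    """
    return "".join(ch if ord(ch) < 0x80 else quote(ch, safe="") for ch in iri)


def iri_to_uri_alabels(iri: str) -> str:
    """Hypothetical variant: %-escape the local part, A-labels for the domain.

    Assumes a bare "mailto:local@domain" IRI. Python's "idna" codec is
    IDNA2003, not the IDNA2008 rules EAI relies on.
    """
    scheme, _, addr = iri.partition(":")
    local, _, domain = addr.rpartition("@")
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{scheme}:{iri_to_uri_generic(local)}@{ascii_domain}"


iri = "mailto:josé@bücher.example"
print(iri_to_uri_generic(iri))   # mailto:jos%C3%A9@b%C3%BCcher.example
print(iri_to_uri_alabels(iri))   # mailto:jos%C3%A9@xn--bcher-kva.example
```

In both cases, dropping "mailto:" leaves a string beginning jos%C3%A9@, which is not the address the user typed.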

I think that means that any sort of i18n MAILTO processor has to
get from those forms back to the non-escaped Unicode-in-UTF-8
strings that EAI expects before passing an internationalized
address off to a mail-sending or processing operation.   If one
reads the second paragraph of Section 5 of RFC 6068 (and the
warning in the third-to-last paragraph of Section 7), one can
claim that is implied there, but it is difficult (at best) to
parse that exact meaning out from the convoluted language that
was used, presumably to avoid a normative reference to the EAI
specs.
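A sketch of that reverse step in Python, under the same single-address assumption.  The helper name mailto_to_eai_address is hypothetical; RFC 6068 does not define it.  It strips the scheme and any headers, %-decodes as UTF-8, and turns A-labels back into U-labels.  It fails loudly on bytes that are not valid UTF-8 instead of substituting U+FFFD, since a silently altered address could misdirect mail.

```python
from urllib.parse import unquote


def mailto_to_eai_address(uri: str) -> str:
    """Recover an unescaped unicode@unicode address from a one-address mailto URI/IRI."""
    if not uri.lower().startswith("mailto:"):
        raise ValueError("not a mailto URI")
    # Drop any "?subject=..." part before decoding, so an escaped %3F stays data.
    addr = uri[len("mailto:"):].split("?", 1)[0]
    # Decode %-escapes as UTF-8; errors="strict" raises UnicodeDecodeError
    # (a ValueError subclass) instead of quietly inserting U+FFFD.
    addr = unquote(addr, encoding="utf-8", errors="strict")
    local, sep, domain = addr.rpartition("@")
    if not sep:
        raise ValueError("no '@' in address")
    # Convert A-labels back to U-labels (Python's IDNA2003 codec; an assumption).
    if domain.isascii():
        domain = domain.encode("ascii").decode("idna")
    return f"{local}@{domain}"


print(mailto_to_eai_address("mailto:jos%C3%A9@xn--bcher-kva.example"))  # josé@bücher.example
print(mailto_to_eai_address("mailto:ops%25cafe@relay.example"))          # ops%cafe@relay.example
```

Note that a literal % must arrive as %25; an unescaped "mailto:ops%cafe@relay.example" is rejected here rather than decoded into a different address.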

Note that this is a key difference from web and web-like
applications, where the applications can be assumed to be able
to deal with decoding the %-escapes themselves.

My own guess is that, even if we discover that the specification
part of 6068 does not need modification for EAI, we are likely
to want to update the document with some very clear and specific
text, and some lurid examples, about what is expected and what
can go wrong.  In a sense, there is a meta-level Security
Consideration in this that is a bit stronger than the last few
paragraphs of Section 7: any level of carelessness in
implementations is likely to result in very bad behavior
including possible misdirection or loss of mail.

best,
   john
_______________________________________________
IMA mailing list
IMA@ietf.org
https://www.ietf.org/mailman/listinfo/ima
