10 Jul 08:36
Re: [OpenID] Canonical OpenID url form
From: Peter Williams <pwilliams <at> rapattoni.com>
Subject: Re: [OpenID] Canonical OpenID url form
Newsgroups: gmane.comp.web.openid.general
Date: 2008-07-10 06:40:12 GMT
So the short form of the story is: use XRI for Unicode (and then transform
the XRI into an https HXRI).

It's been a month since I studied XRI (and thus I have forgotten 80 percent
of it). I recall there was a syntax to identify the address of the initial
resolver. Is there a way that this became the domain name component of the
HXRI?

-----Original Message-----
From: Drummond Reed <drummond.reed <at> cordance.net>
Sent: Wednesday, July 09, 2008 11:34 PM
To: 'Johnny Bufu' <johnny.bufu <at> gmail.com>; 'Andrew Arnott' <andrewarnott <at> gmail.com>
Cc: 'OpenID List' <general <at> openid.net>
Subject: Re: [OpenID] Canonical OpenID url form

Also for the record, XRIs (which use the IRI character set) have a very
simple defined transformation into IRIs. Thus when an XRI needs to be sent
over-the-wire in an HTTP(S) URI, it must first be transformed into an IRI,
then you follow the IRI spec (RFC 3987) to transform into a URI as Johnny
describes below. Reverse the process to display back to the user.

See
http://docs.oasis-open.org/xri/xri-syntax/2.0/specs/cs01/xri-syntax-V2.0-cs.html
for all the gory details (and they are gory - Unicode is hard).

=Drummond

> -----Original Message-----
> From: general-bounces <at> openid.net [mailto:general-bounces <at> openid.net] On
> Behalf Of Johnny Bufu
> Sent: Wednesday, July 09, 2008 10:52 PM
> To: Andrew Arnott
> Cc: OpenID List
> Subject: Re: [OpenID] Canonical OpenID url form
>
> For the record, since this continued in an offline thread:
>
> The issue is around the User-Supplied Identifiers. OpenID defines them
> as a type of Identifiers, which in turn are defined as HTTP(S) URIs or
> XRIs. HTTP(S) URIs do not allow non-ASCII characters.
>
> So, out of scope of OpenID, parties accepting IRIs (other than XRIs)
> should follow the respective authoritative recommendations (i.e.
> RFC3987) before presenting such strings to the OpenID layer as HTTP
> URIs, and convert them back to IRI form later on when they need to be
> displayed back to the users.
>
> Johnny
>
> On 08/07/08 10:32 PM, Andrew Arnott wrote:
> > Thanks, Johnny. I've had some conversations with a few other people
> > who draw the opposite conclusion and believe that the %AB%CD notation
> > is the canonical form.
> >
> > You make a good point about having to unescape the characters from
> > the URI just above the transport layer, but I believe you're applying
> > section 4.1 to the URL when it should only be applied to the
> > key/value pairs. The OpenID ClaimedIdentifier, which by the spec is
> > the last URL to respond without an HTTP redirect, cannot be in
> > unicode by the URI specification because unicode characters are not
> > allowed, whether that is UTF8 or UTF16.
> >
> > Name/value pairs passed as part of a querystring may (and as the
> > section you quote requires) be encoded as UTF-8, but they are
> > subsequently URI encoded as %AB%CD hex characters (thus doubly
> > encoded) so they are actually no longer UTF-8 at the transport layer.
> > The OpenID URL, around which all the identity of OpenID is focused
> > (omitting XRIs, which don't suffer from this problem), /is/ at the
> > transport layer, because of the way the security requirements force
> > the claimed identifier to be discovered. Since it is all about the
> > transport layer, I believe it would be a mistake to add semantics on
> > top of that and call it canonical.
> >
> > What I also realized from some other conversations is that this
> > doesn't really matter. As long as an OP or RP is consistent within
> > itself in storing and comparing Claimed Identifiers, whether it
> > stores and compares %AB%CD or the unicode equivalent character won't
> > matter to anyone, since on the protocol/wire level it is always
> > %AB%CD. However, I think unescaping the URL and getting the original
> > unicode characters back is very useful and should be done for
> > purposes of displaying to the user.
> >
> > I think for the security and guaranteed identity of the protocol,
> > there is a meaningful side to this though. It has nothing to do with
> > how the claimed identifier is stored, but rather how a unicode
> > string is escaped for URI transport. A given unicode string may be
> > represented by more than just one series of bytes. Unicode
> > characters exist that in UTF-8 or UTF-16 have multiple byte sequences
> > /for the same character/. Therefore someone who is typing in their
> > OpenID url to a site using one method during one visit, and then
> > types it in to the same site using a different method on a subsequent
> > visit, will only be identified by the RP as the same visitor if
> > OpenID requires that the RP transforms whatever unicode string is
> > given by the user to the canonical byte form as defined by the
> > unicode standard before transit. For example, the letter 'Á' can be
> > encoded as a single character or using composition by adding an
> > accent to the A character. Both are legal, but the unicode standard
> > defines one as canonical (I think). But if a string containing this
> > character is not canonicalized first, then although the character is
> > equivalent to the user and to unicode, the encoded %AB%CD string will
> > be different, resulting in security problems for OpenID because
> > people could overload a single Identifier just by using different
> > encodings at an OP, or fail to log into an RP depending on how they
> > craft their string. By the way, I say 'unicode' in the strict sense,
> > applying to UTF-8, UTF-16, etc. Unicode is commonly used to refer to
> > just UTF-16, but this problem applies to all unicode character sizes.
> >
> > So I think OpenID should be more explicit about its unicode support
> > for Identifiers, including mandating a canonical Unicode form.
> >
> > On Tue, Jul 8, 2008 at 9:41 PM, Johnny Bufu <johnny.bufu <at> gmail.com
> > <mailto:johnny.bufu <at> gmail.com>> wrote:
> >
> > > On 08/07/08 03:01 PM, Andrew Arnott wrote:
> > >
> > > > What is the canonical form of an OpenID URL? One with the %AB%CD
> > > > hex encoding for unicode chars in the URL or with the actual
> > > > unicode chars? For the purposes of displaying to the user and
> > > > storing in the RP's database.
> > > >
> > > > The spec doesn't seem to have anything to say on this.
> > >
> > > I believe it does say:
> > >
> > >   4.1. Protocol Messages
> > >   The OpenID Authentication protocol messages are mappings of
> > >   plain-text keys to plain-text values. The keys and values permit
> > >   the full Unicode character set (UCS). When the keys and values
> > >   need to be converted to/from bytes, they MUST be encoded using
> > >   UTF-8 [RFC3629].
> > >
> > > http://openid.net/specs/openid-authentication-2_0.html#anchor4
> > >
> > > > The reason I think it's not a simple automatic answer is the
> > > > unicode chars may be what the user typed in and what exists on
> > > > the server, but in transit, these characters are translated to
> > > > %AB%CD in order to be validly escaped URI strings.
> > >
> > > The receiving party must decode them to the original form when they
> > > are extracted from the transport layer.
> > >
> > > > So one could argue that the unicode characters are never part of
> > > > the protocol
> > >
> > > One would then be ignoring the parts of the protocol that do not
> > > deal with the transport layer directly.
> > >
> > > Johnny
>
> _______________________________________________
> general mailing list
> general <at> openid.net
> http://openid.net/mailman/listinfo/general