6 Apr 04:04
Re: Matching metrics (was: Registry in record-jar format)
John Cowan <jcowan <at> reutershealth.com>
2005-04-06 02:04:33 GMT
Frank Ellermann scripsit:

> I've added 0 for '*' = '*' and no match. Otherwise 8/4/2/1,
> is this what you wanted ? Your metrics has apparently a
> problem with en-Latn-US-scouse:

I'm not clear on whether * = * should count as a match or a no-match.
Originally I thought it should count as a match, but perhaps not.

> If one side wants en-GB-scouse, and the other side offers
> en-Latn-US-scouse (9) or en-Latn-GB (10), and it also has
> en-Brai-GB-scouse (11), then en-Brai-GB-scouse "wins". All
> in the 2nd column for en-GB-scouse.

Fortunately en-US-scouse doesn't exist.

> Not okay, but not completely unintentional, for some languages
> I can guess what the text is about, as long as it's Latn: The
> combined power of forgotten school Latin plus miserable French
> sometimes helps with es or pt. But with ru I'd be lost - with
> luck I can decode some Cyrl. For fy my chances are lousy, for
> dk or nl it's better than zero.

Fair enough.

> One effect you see with both metrics; If one side wants
> en-scouse, and the other side has only en-Latn-US-scouse and
> en-Brai-GB-scouse, you get a draw. Apparently your algorithm
> cannot completely replace the "default script" approach.

Well, that problem applies at all levels: if you ask for en-AU, then
no algorithm can choose between the offered en-GB and en-US (except
RFC 2616, which will simply fail). For that matter, if you ask for de
and nn and nb are all that's available, the matching algorithm won't
help then either.

--
John Cowan  jcowan <at> reutershealth.com
www.ccil.org/~cowan  www.reutershealth.com
Is a chair finely made tragic or comic? Is the portrait of Mona Lisa
good if I desire to see it? Is the bust of Sir Philip Crampton lyrical,
epical or dramatic? If a man hacking in fury at a block of wood make
there an image of a cow, is that image a work of art? If not, why not?
        --Stephen Dedalus
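For readers following the numbers in the thread: the quoted scores (9, 10, 11, and the draw for en-scouse) are consistent with a weighted field comparison, sketched below in Python. This is an illustration only, not code from either poster. It assumes the weights 8/4/2/1 apply to language/script/region/variant in that order, that an absent subtag is treated as '*', and that '*' on either side scores 0, as in Frank's variant. The names `parse`, `score`, and `best_matches` are hypothetical, and the subtag classifier is deliberately crude (4 letters = script, 2 letters = region, anything else = variant).

```python
# Assumed weights, inferred from the "8/4/2/1" in the quoted message.
WEIGHTS = {"language": 8, "script": 4, "region": 2, "variant": 1}


def parse(tag):
    """Split a tag into four fields; missing fields become '*'.

    Crude classification (an assumption for this sketch):
    4 letters = script, 2 letters = region, anything else = variant.
    """
    subtags = tag.split("-")
    fields = {"language": subtags[0].lower(),
              "script": "*", "region": "*", "variant": "*"}
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            fields["script"] = sub.title()
        elif len(sub) == 2 and sub.isalpha():
            fields["region"] = sub.upper()
        else:
            fields["variant"] = sub.lower()
    return fields


def score(wanted, offered):
    """Sum the weights of fields that match; '*' on either side scores 0."""
    w, o = parse(wanted), parse(offered)
    total = 0
    for name, weight in WEIGHTS.items():
        if w[name] != "*" and w[name] == o[name]:
            total += weight
    return total


def best_matches(wanted, offers):
    """Return every offer that ties for the highest score."""
    scores = {offer: score(wanted, offer) for offer in offers}
    top = max(scores.values())
    return [offer for offer in offers if scores[offer] == top]


if __name__ == "__main__":
    offers = ["en-Latn-US-scouse", "en-Latn-GB", "en-Brai-GB-scouse"]
    for offer in offers:
        print(offer, score("en-GB-scouse", offer))
    # A request for en-scouse cannot separate the two scouse offers.
    print(best_matches("en-scouse",
                       ["en-Latn-US-scouse", "en-Brai-GB-scouse"]))
```

Under these assumptions, en-Brai-GB-scouse outscores the Latn offers for en-GB-scouse because script contributes nothing when the request leaves it unspecified, and `best_matches` returns both candidates for en-scouse, which is the draw Frank describes.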