From: Hans Aberg (email@example.com)
Date: Mon Feb 21 2005 - 12:56:51 CST
At 13:17 -0800 2005/02/19, Doug Ewell wrote:
>> If one does it that way, one quickly gets into trouble. But one
>> defines a map, which merges some characters for separating IDN's,
>> while retaining the original Unicode character set on the user level
>> on the input. Take a character set C, which might be a subset of
>> Unicode, and send the Unicode characters (or a suitable subset
>> thereof) into the set of finite sequences over C. Two IDN's will be
>> declared equal if mapped to the same character sequence. This map is
>> only used to define which IDN's are viewed as equal. But one is still
>> free to use whatever Unicode sequences one wants. The map is used to
>> define an equivalence relation on the set of Unicode character
>> sequences, but does not in itself affect which Unicode sequences
>> are admissible.
>That isn't the point. It doesn't matter if the mapping takes place at
>the character-encoding level or at some other level. The problem of
>determining which pairs of characters are confusable and should be
>folded together, and which pairs aren't and shouldn't, still remains to
If one does it that way, no characters or character sequences are actually
folded together; one only defines a semantics which tells which character
sequences are to be treated as the same name.
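
A minimal sketch of such a map, assuming a hypothetical one-entry table
(FOLD) and hypothetical function names (skeleton, equivalent); a real table
would be far larger and would have to be agreed upon. The registered string
is never altered; the map is only consulted to decide whether two names
count as equal.

```python
# Hypothetical table: each character is sent to a finite sequence over C.
# Characters not listed are sent to themselves.
FOLD = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
}

def skeleton(name: str) -> str:
    """Image of a name under the map; used only for comparison."""
    # Lower-casing first reflects the existing case-insensitivity of names.
    return "".join(FOLD.get(ch, ch) for ch in name.lower())

def equivalent(a: str, b: str) -> bool:
    """Two names are declared equal if they map to the same sequence."""
    return skeleton(a) == skeleton(b)

latin_foo = "foo"
mixed_foo = "f\u03bfo"    # middle letter is a Greek omicron
greek_phi = "\u03c6oo"    # first letter is a Greek phi

print(equivalent(latin_foo, mixed_foo))  # same equivalence class
print(equivalent(latin_foo, greek_phi))  # different classes
```

Both latin_foo and mixed_foo remain valid, distinct Unicode sequences on the
user level; only the comparison treats them as one name.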
>The point is that there is a large set of "semi-confusable" characters,
>for which it simply cannot be said conclusively that they "look alike"
>or "don't look alike." It depends on font, size, medium (paper vs.
>screen), and sometimes context. This is one of those problems for which
>a partial solution simply isn't good enough.
In my own opinion one should be pretty conservative (making the equivalence
classes as small as possible). But in the case of the IDN's, one probably
can be quite radical: All that happens when people try to define new names
is that they may already be occupied.
>And for something like IDNs, once you have decided on a mapping, you can
>never, ever change it. Otherwise you will have a domain name available
>for registration by customer A today, but a similar one not available to
>customer B six months later (or vice versa, A can't get it but B can).
>Either way, you have a lawsuit.
Sure you can change it: One can make the equivalence classes smaller
whenever one wants. For example, upper and lower case letters are now
considered equivalent. If one takes that equivalence away, names must be
used as registered with respect to casing. If one introduces new
equivalences, then one must run through all registered names, and require
that names that are merged onto each other are changed. But if one is
sufficiently conservative when defining these equivalences, that should not
happen too often. For example, if Latin "o" and Greek "omicron" are now
viewed as inequivalent but are later declared equivalent, then, if more than
one user has registered the name "foo" in various combinations of these
letters, all but one must change their registered name. But if one uses a
Latin "foo", and another uses the Greek name with "f" replaced by phi, then
they do not have to enter new registrations. And one can still mix Greek and
Latin scripts as one pleases; the equivalence merely blocks other users from
registering a confusable name.
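
The check needed when new equivalences are introduced can be sketched
roughly as follows. The fold table, the key functions (exact, case_only,
case_and_omicron) and the sample registry are all hypothetical and chosen
only to mirror the "foo" example; the point is that a finer relation never
produces clashes among already distinct names, while a coarser one may.

```python
from collections import defaultdict

# Hypothetical fold table, for illustration only.
OMICRON_FOLD = {"\u03bf": "o"}  # Greek omicron -> Latin o

def exact(name):
    """Finest relation: names must be used exactly as registered."""
    return name

def case_only(name):
    """Current relation: upper and lower case are equivalent."""
    return name.lower()

def case_and_omicron(name):
    """Coarser relation: additionally merges Greek omicron with Latin o."""
    return "".join(OMICRON_FOLD.get(ch, ch) for ch in name.lower())

def collisions(registered, key):
    """Groups of registered names that fall into the same class under key."""
    groups = defaultdict(list)
    for name in registered:
        groups[key(name)].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Latin "foo", "foo" with a Greek omicron, and "foo" with phi for "f".
registered = ["foo", "f\u03bfo", "\u03c6oo"]

print(collisions(registered, exact))             # no clashes
print(collisions(registered, case_only))         # no clashes
print(collisions(registered, case_and_omicron))  # the two "foo" variants clash
```

Under case_and_omicron, all but one holder of the clashing group would have
to change registration, while the name beginning with phi is unaffected.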
This archive was generated by hypermail 2.1.5 : Mon Feb 21 2005 - 14:41:20 CST