> But the last step should go a little further than this: all
> characters that "look the same" must be unified, for obvious reasons.
> It would be suicide, for instance, to allow Cyrillic letters like
> a, B, c, e, H, i, j, K, M, n, o, p, s, T, u, x, or y to be
> distinguished from the Latin letters of the same shape. People could
> use this to forge fraudulent web sites (e.g.,, where
> one or both of the two "o"s and the "e" are Cyrillic!)

This is a potential can of worms, because "look the same" is not a
Boolean property for glyphs. What about U+0076 LATIN SMALL LETTER V
and U+03BD GREEK SMALL LETTER NU, for example, or U+0070 LATIN SMALL
LETTER P and U+03C1 GREEK SMALL LETTER RHO? These pairs do not look
100% identical, but would probably still confuse a user who does not
expect a URL to contain characters from mixed scripts.
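
To make the two pairs above concrete, here is a rough Python sketch.
It shows that compatibility normalization (NFKC) leaves nu and rho
alone rather than folding them into v and p, and that what can be
tested mechanically is script mixing, not visual similarity. The
helper names (script_of, scripts_in, is_mixed_script) are made up for
illustration, and guessing the script from the first word of the
character name is only a heuristic, not the real Unicode Script
property.

```python
import unicodedata

def script_of(ch):
    """Heuristic script guess: the first word of the Unicode character name.

    This is an assumption made for illustration; it is not the Unicode
    Script property, which the standard library does not expose.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def scripts_in(label):
    """Guessed scripts of the letters in a label (digits and punctuation skipped)."""
    return {script_of(ch) for ch in label if ch.isalpha()}

def is_mixed_script(label):
    """True if the letters of the label appear to come from more than one script."""
    return len(scripts_in(label)) > 1

# The two pairs from the text: Latin v / Greek nu, Latin p / Greek rho.
pairs = [("\u0076", "\u03bd"), ("\u0070", "\u03c1")]
for latin, greek in pairs:
    # NFKC does not unify look-alikes across scripts.
    folded = unicodedata.normalize("NFKC", greek)
    print(unicodedata.name(latin), "vs", unicodedata.name(greek),
          "| unified by NFKC:", folded == latin)

# A label with two Cyrillic o's (U+043E) between Latin letters.
print(is_mixed_script("g\u043e\u043egle"))
```

A check like this catches mixed-script labels, but it gives no answer
to the harder question of which glyphs look "the same."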

The point is that with 50,000 possible characters, there is no place
you can safely draw this line.

The same could be said for the fuzzy second-step category "all
characters that are not essential."

-Doug Ewell
 Fullerton, California

