RE: Case mapping of dotless lowercase letters

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Tue Dec 16 2003 - 09:13:08 EST

    > Do we have Unicode DNS yet?

    Yup. You can put Chinese letters in domain names now. You do it like this:
    (1) Convert to NFC
    (2) Encode in UTF-8
    (3) Replace each reserved character (space, %, etc.) with the
    three-character string "%hh" (where hh is the hex value of the replaced
    character)
    (4) Similarly, replace each byte > 0x7F with the three-character
    string "%hh" (where hh is the hex value of the replaced byte)
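
    The four steps above can be sketched in Python roughly as follows. This
    is only an illustration of the steps as listed, i.e. URI-style
    percent-encoding; it is not a sketch of the Punycode-based IDNA encoding.
    The function name percent_encode and the UNRESERVED set are assumptions
    of mine (the list above only says "space, %, etc."), so treat the exact
    set of characters left alone as a guess.

```python
import unicodedata

# Assumed set of characters that are left as-is. The post only says
# "space, %, etc." are reserved, so this conservative set is a guess.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text):
    """Hypothetical helper that follows steps (1) to (4)."""
    nfc = unicodedata.normalize("NFC", text)  # step 1: convert to NFC
    data = nfc.encode("utf-8")                # step 2: encode in UTF-8
    out = []
    for byte in data:
        ch = chr(byte)
        if byte < 0x80 and ch in UNRESERVED:
            out.append(ch)                    # plain ASCII, kept unchanged
        else:
            # steps 3 and 4: reserved ASCII and every byte > 0x7F become %hh
            out.append("%{:02X}".format(byte))
    return "".join(out)


if __name__ == "__main__":
    print(percent_encode("\u4e2d"))   # one Chinese character
    print(percent_encode("a b%"))     # space and percent sign
```

    Because step 1 runs first, "e" followed by U+0301 and the precomposed
    U+00E9 come out as the same encoded string.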

    > But yes, {U+0131}{U+0307} can look awfully similar to
    > {U+0069}, I think {U+0069}
    > {U+0307} would as well (and of course there are other
    > opportunities for visual
    > confusion unrelated to the U+0069 and U+0131).

    Yeah, I thought of that. Yuk. The whole issue of spoof detection is an
    absolute nightmare. There are /some/ things you can do to help, though:
    security-conscious applications could use fonts in which 0 looks
    different from O, and in which 1 looks different from l; different
    scripts could be displayed in different colors; a warning dialog could
    be presented to the user if any character is a compatibility character,
    and so on. But NONE of these tricks will catch the distinction between
    U+0069 and {U+0131}{U+0307}. Both are letters, both are in the Latin script,
    neither is a compatibility character, etc. Automation can only go so
    far. Eventually, you're left with only one choice - to advise the user:
    "Never click on a hyperlink. Instead, always type in the URL by hand".
    Trouble is, such advice is more trouble than it's worth, and would kill
    the fluidity of the internet.
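
    To make the point concrete, here is a small sketch of one of the
    automated checks mentioned above, the compatibility-character warning,
    using Python's unicodedata module. The helper name compatibility_chars
    is my own invention, and treating "has a tagged decomposition" as
    "is a compatibility character" is an assumption about how such a warning
    might be implemented. The check fires on something like the "fi"
    ligature, but stays silent for dotless i plus combining dot above, and
    normalization does not fold that sequence into a plain "i" either.

```python
import unicodedata


def compatibility_chars(text):
    """Hypothetical check: return the characters in text whose Unicode
    decomposition carries a <tag>, i.e. a compatibility decomposition."""
    return [ch for ch in text
            if unicodedata.decomposition(ch).startswith("<")]


plain_i = "\u0069"           # LATIN SMALL LETTER I
lookalike = "\u0131\u0307"   # DOTLESS I followed by COMBINING DOT ABOVE

if __name__ == "__main__":
    # The ligature U+FB01 is flagged; the lookalike sequence is not.
    print(compatibility_chars("\ufb01"))
    print(compatibility_chars(lookalike))
    # Neither NFC nor NFKC turns the lookalike into a plain "i".
    print(unicodedata.normalize("NFC", lookalike) == plain_i)
    print(unicodedata.normalize("NFKC", lookalike) == plain_i)
```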

    Jill


