Re: Unicode abuse

From: Erik van der Poel (erik@vanderpoel.org)
Date: Mon Mar 07 2005 - 01:19:54 CST


    Doug Ewell wrote:
    > I agree that ℂ is unnecessary in domain names. I also feel that it's
    > harmless.

    It certainly doesn't seem like a big problem at first glance. But, in
    networking, it is sometimes bad to have too much flexibility. It has a
    way of coming back to haunt you later.

    > There are hundreds, if not thousands, of characters like this
    > in Unicode.

    And that's supposed to make me feel better?

    > Accepting NFKC lock, stock and barrel for this purpose is probably
    > better than second-guessing the work of the UTC.
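    For reference, NFKC folds both kinds of character to the same
    plain letter. Below is a minimal sketch using Python's standard
    unicodedata module. The helper name nfkc_fold is just for
    illustration. Note that Python's tables track a newer Unicode version
    than the 3.2 data Nameprep is pinned to, although these two
    characters behave the same either way.

```python
import unicodedata


def nfkc_fold(label: str) -> str:
    """Apply full NFKC normalization to a label (illustrative helper)."""
    return unicodedata.normalize("NFKC", label)


# U+2102 DOUBLE-STRUCK CAPITAL C and U+FF23 FULLWIDTH LATIN CAPITAL LETTER C
for ch in ("\u2102", "\uFF23"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {nfkc_fold(ch)!r}")
```

    Both lines end in 'C', so under full NFKC a label written with
    double-struck C and one written with a fullwidth C end up identical.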

    Actually, the Unicode database does contain some info that we might be
    able to use in Nameprep. Double-struck C (U+2102) has the tag <font> in
    its Character Decomposition Mapping, while the fullwidth characters
    used in Japanese text (e.g. U+FF23) have the tag <wide>. So Nameprep
    could choose to normalize just the <wide> characters and maybe some of
    the others, but not <font>. For domain names, that may be the way to go.
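    The tag-based idea can be sketched as follows. This is a minimal
    illustration in Python, not anything Nameprep actually specifies. The
    names ALLOWED_TAGS, compat_tag and selective_fold are hypothetical.
    The sketch folds a character only when its decomposition tag is on
    the allow-list and passes everything else through unchanged. It also
    folds one character at a time, which ignores any composition across
    character boundaries that a real profile would have to handle.

```python
import unicodedata
from typing import Optional

# Hypothetical policy: only these compatibility tags are folded.
ALLOWED_TAGS = {"<wide>"}


def compat_tag(ch: str) -> Optional[str]:
    """Return the <tag> of a compatibility decomposition, or None."""
    decomp = unicodedata.decomposition(ch)  # e.g. "<font> 0043" for U+2102
    if decomp.startswith("<"):
        return decomp.split()[0]
    return None  # no decomposition, or a canonical one (no tag)


def selective_fold(label: str, allowed=ALLOWED_TAGS) -> str:
    """Fold only characters whose decomposition tag is in `allowed`."""
    out = []
    for ch in label:
        if compat_tag(ch) in allowed:
            # Per-character NFKC: a simplification, see the note above.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            # <font> and all other tags are left alone; a real profile
            # might prohibit such characters instead (assumption).
            out.append(ch)
    return "".join(out)


print(selective_fold("\uFF23om"))  # fullwidth C is folded
print(selective_fold("\u2102om"))  # double-struck C is kept as-is
```

    With this policy the fullwidth label folds to "Com", while the
    double-struck one stays distinct. That difference could then be
    rejected or flagged rather than silently merged.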

    Erik



    This archive was generated by hypermail 2.1.5 : Mon Mar 07 2005 - 01:21:32 CST