Re: Security Issues

From: Erik van der Poel (erik@vanderpoel.org)
Date: Thu Mar 24 2005 - 13:37:17 CST

    Hi Mark,

    I gather that you are asking for feedback regarding characters "required
    by the orthography of a modern language". One of the contexts being
    discussed is that of internationalized domain names (IDNs). I think it
    may be important to remember that the IDN specs are not only talking
    about matching strings, but also "inputting" (e.g. keyboard typing)
    strings. These days, people see domain names on the side of a bus, and
    then they try to go to that site by typing those characters.

    I already mentioned the potential occurrence of fullwidth Latin
    (U+FF21..) and halfwidth Katakana (U+FF65..) in Japanese input methods
    and that these are currently normalized by the IDN specs. However, I
    found a few others at the bottom of Japan's IDN table:

    http://www.iana.org/assignments/idn/jp-japanese.html

    I tried to look up U+2212 (MINUS SIGN) in your idn-chars.html file, but
    it was somewhat difficult. I ended up doing a View > Page Source followed
    by a Find, and even then it was hard to see which section the character
    belonged to. It would be nice if code points could be looked up more
    easily. Anyway, U+2212 belongs to Script Common, Non-ID. Given that the
    Japanese themselves are
    mentioning U+2212 as one of the characters involved in input methods in
    their IANA IDN registration, you may wish to consider it. U+2212 is not
    currently mapped or normalized in the IDN specs, but the Japanese appear
    to want it to be converted to U+FF0D (FULLWIDTH HYPHEN-MINUS) before
    mapping/normalizing.
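
    To illustrate the difference, here is a rough sketch using Python's
    standard unicodedata module. It applies only NFKC, the normalization
    form that the IDN specs build on, and not the full nameprep profile, so
    it is an approximation under that assumption. The helper name
    nfkc_report and the sample characters are my own choices for
    illustration.

```python
import unicodedata

def nfkc_report(ch):
    """Return (code point, character name, NFKC result as code points)."""
    normalized = unicodedata.normalize("NFKC", ch)
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        [f"U+{ord(c):04X}" for c in normalized],
    )

# Hypothetical sample: characters a Japanese input method might produce.
samples = [
    "\uFF21",  # FULLWIDTH LATIN CAPITAL LETTER A
    "\uFF76",  # HALFWIDTH KATAKANA LETTER KA
    "\uFF0D",  # FULLWIDTH HYPHEN-MINUS
    "\u2212",  # MINUS SIGN
]

for ch in samples:
    print(nfkc_report(ch))
```

    Under NFKC alone, the fullwidth and halfwidth forms fold to U+0041,
    U+30AB and U+002D respectively, while U+2212 comes back unchanged. That
    would explain why a U+2212 to U+FF0D conversion has to happen first if
    the minus sign is meant to end up as an ordinary hyphen.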

    Of course, I cannot speak for the Japanese. It seems to me that you need
    info from the people themselves. Are there any plans to gather info
    directly from the world's communities?

    Erik


