Re: Security Issues

From: Mark Davis
Date: Thu Mar 24 2005 - 18:55:06 CST

    > Are there any plans to gather info
    > directly from the world's communities?

    Yes, but sadly my message bounced, when I addressed it to ;-)

    More seriously, one of the steps I anticipate we should take is adding any
    missing characters that have the Word_Break property values Katakana,
    ALetter, and MidLetter (see That will pick up
    characters that have already been reported to us as being needed in
    orthographies. So it is characters outside of that list that are of concern.


    ----- Original Message -----
    From: "Erik van der Poel"
    To: "Mark Davis"
    Cc: "Unicode Mailing List"; "UnicoRe Mailing List"
    Sent: Thursday, March 24, 2005 11:37
    Subject: Re: Security Issues

    > Hi Mark,
    >
    > I gather that you are asking for feedback regarding characters "required
    > by the orthography of a modern language". One of the contexts being
    > discussed is that of internationalized domain names (IDNs). I think it
    > may be important to remember that the IDN specs are not only talking
    > about matching strings, but also "inputting" (e.g. keyboard typing)
    > strings. These days, people see domain names on the side of a bus, and
    > then they try to go to that site by typing those characters.
    >
    > I already mentioned the potential occurrence of fullwidth Latin
    > (U+FF21..) and halfwidth Katakana (U+FF65..) in Japanese input methods
    > and that these are currently normalized by the IDN specs. However, I
    > found a few others at the bottom of Japan's IDN table:
    >
    > I tried to look up U+2212 in your idn-chars.html file, but it was
    > somewhat difficult. I ended up doing a View > Page Source followed by a
    > Find, but it was difficult to see which section it belonged to. It would
    > be nice if you could look up code points more easily. Anyway, U+2212
    > belongs to Script Common, Non-ID. Given that the Japanese themselves are
    > mentioning U+2212 as one of the characters involved in input methods in
    > their IANA IDN registration, you may wish to consider it. U+2212 is not
    > currently mapped or normalized in the IDN specs, but the Japanese appear
    > to want it to be converted to U+FF0D before mapping/normalizing.
    >
    > Of course, I cannot speak for the Japanese. It seems to me that you need
    > info from the people themselves. Are there any plans to gather info
    > directly from the world's communities?
    >
    > Erik
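
For readers following the code points in the quoted message, here is a rough sketch of the normalization behaviour it describes. It is not part of either message. It uses only the NFKC step (via Python's standard unicodedata module), not the full IDN mapping tables, and the helper name premap_minus is hypothetical, standing in for the U+2212 to U+FF0D conversion the Japanese registration appears to ask for.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (only one step of IDN processing)."""
    return unicodedata.normalize("NFKC", text)


def premap_minus(text: str) -> str:
    """Hypothetical pre-mapping step: convert U+2212 MINUS SIGN to
    U+FF0D FULLWIDTH HYPHEN-MINUS before normalizing."""
    return text.replace("\u2212", "\uFF0D")


def show(label: str, text: str) -> None:
    """Print the code points of a string before and after NFKC."""
    before = " ".join(f"U+{ord(c):04X}" for c in text)
    after = " ".join(f"U+{ord(c):04X}" for c in nfkc(text))
    print(f"{label}: {before} -> {after}")


show("fullwidth Latin A", "\uFF21")      # folds to the ASCII letter
show("halfwidth Katakana KA", "\uFF76")  # folds to the regular Katakana letter
show("minus sign", "\u2212")             # has no compatibility mapping
show("minus sign, pre-mapped", premap_minus("\u2212"))
```

Under these assumptions, the fullwidth and halfwidth forms fold to their ordinary counterparts, U+2212 passes through NFKC unchanged, and it reaches U+002D only if something first converts it to U+FF0D.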
