Re: Security Issues

From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Mar 24 2005 - 18:55:06 CST


    > Are there any plans to gather info
    > directly from the world's communities?

    Yes, but sadly my message bounced, when I addressed it to
    all-communities@world.org ;-)

    More seriously, one of the steps I anticipate we should take is adding any
    missing characters that have the Word_Break property values Katakana,
    ALetter, and MidLetter (see
    http://www.unicode.org/reports/tr29/tr29-8.html). That will pick up
    characters that have already been reported to us as being needed in
    orthographies. So it is characters outside of that list that are of concern.
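
As a rough sketch of the filtering step described above, the following Python reads lines laid out like the UCD data file WordBreakProperty.txt and tests whether a code point has one of the three Word_Break values named. The embedded sample lines, the `WANTED` set, and the helper names `parse_ranges`, `word_break`, and `is_candidate` are assumptions made for illustration; they are not taken from the message or from any particular Unicode release.

```python
# Illustrative sample in the layout of WordBreakProperty.txt
# ("code point or range ; value # comment"). A real run would read the
# actual data file for the Unicode version of interest instead.
SAMPLE = """\
# code point(s) ; Word_Break value
0030..0039    ; Numeric   # DIGIT ZERO..DIGIT NINE
0041..005A    ; ALetter   # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00B7          ; MidLetter # MIDDLE DOT
30A1..30FA    ; Katakana  # KATAKANA LETTER SMALL A..KATAKANA LETTER VO
"""

# The three property values named in the message.
WANTED = {"Katakana", "ALetter", "MidLetter"}


def parse_ranges(text):
    """Return a list of (first, last, value) tuples from UCD-style lines."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        codes, value = (part.strip() for part in data.split(";"))
        first, _, last = codes.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo  # single code point: range of one
        ranges.append((lo, hi, value))
    return ranges


def word_break(cp, ranges):
    """Look up the Word_Break value of a code point; unlisted means Other."""
    for lo, hi, value in ranges:
        if lo <= cp <= hi:
            return value
    return "Other"


def is_candidate(cp, ranges):
    """True if the code point has one of the three values of interest."""
    return word_break(cp, ranges) in WANTED


ranges = parse_ranges(SAMPLE)
for cp in (0x0041, 0x00B7, 0x30A2, 0x0035, 0x2212):
    print(f"U+{cp:04X} {word_break(cp, ranges):<9} candidate={is_candidate(cp, ranges)}")
```

With this sample data, U+2212 falls outside the three values, which would put it among the characters "outside of that list" that remain of concern.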

    Mark

    ----- Original Message -----
    From: "Erik van der Poel" <erik@vanderpoel.org>
    To: "Mark Davis" <mark.davis@jtcsv.com>
    Cc: "Unicode Mailing List" <unicode@unicode.org>; "UnicoRe Mailing List"
    <unicore@unicode.org>
    Sent: Thursday, March 24, 2005 11:37
    Subject: Re: Security Issues

    > Hi Mark,
    >
    > I gather that you are asking for feedback regarding characters "required
    > by the orthography of a modern language". One of the contexts being
    > discussed is that of internationalized domain names (IDNs). I think it
    > may be important to remember that the IDN specs are not only talking
    > about matching strings, but also "inputting" (e.g. keyboard typing)
    > strings. These days, people see domain names on the side of a bus, and
    > then they try to go to that site by typing those characters.
    >
    > I already mentioned the potential occurrence of fullwidth Latin
    > (U+FF21..) and halfwidth Katakana (U+FF65..) in Japanese input methods
    > and that these are currently normalized by the IDN specs. However, I
    > found a few others at the bottom of Japan's IDN table:
    >
    > http://www.iana.org/assignments/idn/jp-japanese.html
    >
    > I tried to look up U+2212 in your idn-chars.html file, but it was
    > somewhat difficult. I ended up doing a View > Page Source followed by a
    > Find, but it was difficult to see which section it belonged to. It would
    > be nice if you could look up code points more easily. Anyway, U+2212
    > belongs to Script Common, Non-ID. Given that the Japanese themselves are
    > mentioning U+2212 as one of the characters involved in input methods in
    > their IANA IDN registration, you may wish to consider it. U+2212 is not
    > currently mapped or normalized in the IDN specs, but the Japanese appear
    > to want it to be converted to U+FF0D before mapping/normalizing.
    >
    > Of course, I cannot speak for the Japanese. It seems to me that you need
    > info from the people themselves. Are there any plans to gather info
    > directly from the world's communities?
    >
    > Erik
    >
    >
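
The point in the quoted message about fullwidth and halfwidth forms being normalized while U+2212 is not can be checked with a short Python sketch. It assumes that NFKC from Python's `unicodedata` module is an acceptable stand-in for the normalization step in the IDN specs; Python ships a newer Unicode version than the one those specs reference, so this is an approximation, and `nfkc_report` is a hypothetical helper name.

```python
import unicodedata


def nfkc_report(ch):
    """Return the NFKC form of a string as a list of 'U+XXXX' labels."""
    folded = unicodedata.normalize("NFKC", ch)
    return [f"U+{ord(c):04X}" for c in folded]


# Fullwidth Latin A, halfwidth Katakana WO, fullwidth hyphen-minus, minus sign.
for cp in (0xFF21, 0xFF66, 0xFF0D, 0x2212):
    print(f"U+{cp:04X} -> {' '.join(nfkc_report(chr(cp)))}")

# Prints:
# U+FF21 -> U+0041
# U+FF66 -> U+30F2
# U+FF0D -> U+002D
# U+2212 -> U+2212
```

U+2212 has no compatibility decomposition, so NFKC leaves it alone, whereas U+FF0D folds to the ASCII hyphen-minus. That is consistent with the Japanese table wanting U+2212 converted to U+FF0D before mapping and normalizing.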


