From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Mar 24 2005 - 18:55:06 CST
> Are there any plans to gather info
> directly from the world's communities?
Yes, but sadly my message bounced when I addressed it to
all-communities@world.org ;-)
More seriously, one of the steps I anticipate we should take is adding any
missing characters that have the Word_Break property value Katakana,
ALetter, or MidLetter (see
http://www.unicode.org/reports/tr29/tr29-8.html). That will pick up the
characters that have already been reported to us as needed in
orthographies, so the characters of concern are those outside that set.
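As a rough illustration of that step, the sketch below reads lines in the format of the UCD file WordBreakProperty.txt ("code point or range ; value # comment"), keeps only the Katakana, ALetter and MidLetter values, and reports which of those code points are absent from an existing list. The names `parse_word_break`, `missing_candidates` and `SAMPLE` are hypothetical, and the sample lines are illustrative rather than copied from any particular Unicode version.

```python
from typing import Dict, Iterable, Set

# Word_Break values named in the message; all other values are ignored.
WANTED = {"Katakana", "ALetter", "MidLetter"}

def parse_word_break(lines: Iterable[str], wanted: Set[str] = WANTED) -> Dict[str, Set[int]]:
    """Collect code points per wanted Word_Break value from WordBreakProperty.txt-style lines."""
    result: Dict[str, Set[int]] = {value: set() for value in wanted}
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        cp_field, value = [field.strip() for field in data.split(";", 1)]
        if value not in wanted:
            continue
        start, _, end = cp_field.partition("..")  # "0041..005A" or a single "00B7"
        first = int(start, 16)
        last = int(end, 16) if end else first
        result[value].update(range(first, last + 1))
    return result

def missing_candidates(props: Dict[str, Set[int]], current: Set[int]) -> Set[int]:
    """Code points with a wanted Word_Break value that are not yet in `current`."""
    union: Set[int] = set().union(*props.values())
    return union - current

# Illustrative lines in the WordBreakProperty.txt format (hypothetical excerpt).
SAMPLE = [
    "# comment-only lines and blank lines are skipped",
    "",
    "0030..0039    ; Numeric # Nd  [10] DIGIT ZERO..DIGIT NINE",
    "0041..005A    ; ALetter # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "00B7          ; MidLetter # Po       MIDDLE DOT",
    "30A1..30FA    ; Katakana # Lo  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VU",
]

if __name__ == "__main__":
    props = parse_word_break(SAMPLE)
    # Pretend the existing list already holds A..Z; everything else is a candidate.
    todo = missing_candidates(props, set(range(0x41, 0x5B)))
    print(len(todo), [f"U+{cp:04X}" for cp in sorted(todo)[:3]])
```

Checking only membership against an existing set keeps the step mechanical; deciding what to do with characters outside the three values is left to the separate review that the message describes.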
Mark
----- Original Message -----
From: "Erik van der Poel" <erik@vanderpoel.org>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: "Unicode Mailing List" <unicode@unicode.org>; "UnicoRe Mailing List"
<unicore@unicode.org>
Sent: Thursday, March 24, 2005 11:37
Subject: Re: Security Issues
> Hi Mark,
>
> I gather that you are asking for feedback regarding characters "required
> by the orthography of a modern language". One of the contexts being
> discussed is that of internationalized domain names (IDNs). I think it
> may be important to remember that the IDN specs are concerned not only
> with matching strings but also with "inputting" (e.g. keyboard typing)
> strings. These days, people see a domain name on the side of a bus and
> then try to go to that site by typing its characters.
>
> I have already mentioned that fullwidth Latin (U+FF21..) and halfwidth
> Katakana (U+FF65..) can occur in Japanese input methods, and that the IDN
> specs currently normalize these. However, I found a few other such
> characters at the bottom of Japan's IDN table:
>
> http://www.iana.org/assignments/idn/jp-japanese.html
>
> I tried to look up U+2212 in your idn-chars.html file, but it was
> somewhat difficult. I ended up doing View > Page Source followed by a
> Find, but even then it was hard to see which section it belonged to. It
> would be nice if code points could be looked up more easily. Anyway,
> U+2212 belongs to Script Common, Non-ID. Given that the Japanese
> themselves list U+2212 as one of the characters involved in input
> methods in their IANA IDN registration, you may wish to consider it.
> U+2212 is not currently mapped or normalized by the IDN specs, but the
> Japanese appear to want it converted to U+FF0D before mapping/normalizing.
>
> Of course, I cannot speak for the Japanese. It seems to me that you need
> info from the people themselves. Are there any plans to gather info
> directly from the world's communities?
>
> Erik
>
>
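To make the U+2212 point concrete, the sketch below approximates the mapping and normalization steps of nameprep with Unicode case folding followed by NFKC, with an optional pre-mapping of U+2212 MINUS SIGN to U+FF0D FULLWIDTH HYPHEN-MINUS applied first, as the Japanese table appears to want. It is only an approximation under stated assumptions: `str.casefold` stands in for nameprep's own case-folding table, and the character deletions, prohibited-character checks and bidi checks are omitted. `PRE_MAP` and `approx_nameprep` are hypothetical names.

```python
import unicodedata
from typing import Dict

# Hypothetical pre-mapping table: U+2212 MINUS SIGN -> U+FF0D FULLWIDTH HYPHEN-MINUS,
# applied before mapping/normalization, as the Japanese registration appears to want.
PRE_MAP: Dict[int, int] = {0x2212: 0xFF0D}

def approx_nameprep(label: str, pre_map: Dict[int, int] = PRE_MAP) -> str:
    """Approximate nameprep: optional pre-mapping, case folding, then NFKC.

    Omits nameprep's deletions, prohibited-character checks and bidi checks.
    """
    premapped = label.translate(pre_map)
    folded = premapped.casefold()  # stand-in for nameprep's case-folding table
    return unicodedata.normalize("NFKC", folded)

if __name__ == "__main__":
    # Fullwidth Latin (U+FF21..) folds and normalizes to plain ASCII letters.
    print(approx_nameprep("\uFF21\uFF22\uFF23"))
    # Halfwidth Katakana KA (U+FF76) normalizes to Katakana KA (U+30AB).
    print(approx_nameprep("\uFF76") == "\u30AB")
    # Without the pre-map, U+2212 has no NFKC decomposition and passes through unchanged.
    print(approx_nameprep("a\u2212b", pre_map={}) == "a\u2212b")
    # With it, U+2212 becomes U+FF0D, which NFKC maps to U+002D HYPHEN-MINUS.
    print(approx_nameprep("a\u2212b"))
```

The contrast between the last two calls shows why the pre-mapping matters: NFKC alone already folds the fullwidth and halfwidth forms, but it leaves U+2212 untouched, so a user who types a minus sign would otherwise reach a different name than one who types a hyphen.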
This archive was generated by hypermail 2.1.5 : Thu Mar 24 2005 - 18:56:03 CST