From: Erik van der Poel (erik@vanderpoel.org)
Date: Thu Mar 24 2005 - 13:37:17 CST
Hi Mark,
I gather that you are asking for feedback regarding characters "required
by the orthography of a modern language". One of the contexts being
discussed is that of internationalized domain names (IDNs). I think it
may be important to remember that the IDN specs are not only talking
about matching strings, but also "inputting" (e.g. keyboard typing)
strings. These days, people see domain names on the side of a bus, and
then they try to go to that site by typing those characters.
I already mentioned the potential occurrence of fullwidth Latin
(U+FF21..) and halfwidth Katakana (U+FF65..) in Japanese input methods
and that these are currently normalized by the IDN specs. However, I
found a few others at the bottom of Japan's IDN table:
http://www.iana.org/assignments/idn/jp-japanese.html
I tried to look up U+2212 in your idn-chars.html file, but it was
somewhat difficult. I ended up doing a View > Page Source followed by a
Find, and even then it was hard to see which section the character
belonged to. It would be nice if code points could be looked up more
easily. Anyway, U+2212 (MINUS SIGN) belongs to Script Common, Non-ID.
Given that the Japanese themselves are
mentioning U+2212 as one of the characters involved in input methods in
their IANA IDN registration, you may wish to consider it. U+2212 is not
currently mapped or normalized in the IDN specs, but the Japanese appear
to want it to be converted to U+FF0D before mapping/normalizing.
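To illustrate the difference, here is a rough sketch in Python. It uses
NFKC from the standard unicodedata module as a stand-in for the
normalization step of the IDN specs; it is not a full nameprep
implementation. The PREMAP table and the premap_jp function are names I
made up for this example, and the single entry in the table is just the
U+2212 to U+FF0D conversion described above.

```python
import unicodedata

# Hypothetical pre-mapping table (an assumption for illustration only).
# Its single entry is the U+2212 -> U+FF0D conversion discussed above.
PREMAP = {"\u2212": "\uff0d"}


def premap_jp(label: str) -> str:
    """Apply the hypothetical pre-mapping before any normalization."""
    return "".join(PREMAP.get(ch, ch) for ch in label)


def nfkc(label: str) -> str:
    """NFKC normalization, standing in for the IDN normalization step."""
    return unicodedata.normalize("NFKC", label)


fullwidth = "\uff21\uff22\uff23"  # fullwidth Latin A, B, C
halfwidth = "\uff76\uff85"        # halfwidth Katakana KA, NA
minus = "a\u2212b"                # contains U+2212 MINUS SIGN

# Fullwidth Latin and halfwidth Katakana are folded by NFKC.
print(nfkc(fullwidth) == "ABC")
print(nfkc(halfwidth) == "\u30ab\u30ca")

# U+2212 has no compatibility decomposition, so NFKC leaves it alone.
print(nfkc(minus) == minus)

# With the pre-mapping, U+2212 becomes U+FF0D, which NFKC folds to U+002D.
print(nfkc(premap_jp(minus)) == "a-b")
```

All four comparisons should print True. The point is that the last step
only works because of the extra table; nothing in the normalization
itself touches U+2212.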
Of course, I cannot speak for the Japanese. It seems to me that you need
info from the people themselves. Are there any plans to gather info
directly from the world's communities?
Erik