> At 18:02 -0800 1998-11-11, Keld Jørn Simonsen wrote:
> >> >Java is also going to have problems: "\u10208" would be misread as
> >> >U+1020 <undefined Mongolian character> U+0038 DIGIT EIGHT instead
> >> >of U-00010208 ETRUSCAN LETTER TH.
> >> \uD800\uDE08 is an obvious answer for Java, since Java's 16-bit data
> >> type implies its use of UTF-16.
> >You should not use \uxxxx notation for surrogates,
> >as surrogates are not characters in either Unicode or 10646,
> >and thus the short identifiers cannot be used.
> WG2 has provisionally accepted and provisionally allocated Etruscan,
> Gothic, Western Musical Symbols, and Byzantine Musical Symbols to Plane 1.
> Yes, it hasn't been published or balloted or anything, but one has to have
> a way of referring to those (provisional) code positions.
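To make the escape problem concrete, here is a minimal Java sketch. It assumes Java 5 or later (String.codePointAt and Character.toChars came well after this discussion), and the class and method names are illustrative only. It shows that a \u escape always consumes exactly four hex digits, so "\u10208" becomes two chars, and it derives the surrogate pair for U-00010208 by hand: subtract 0x10000, then put the top 10 bits of the remaining 20-bit value into 0xD800 and the bottom 10 bits into 0xDC00.

```java
public class SurrogateDemo {

    // Compute the UTF-16 surrogate pair for a supplementary code point
    // (0x10000..0x10FFFF) by hand, following the UTF-16 encoding rule.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                 // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // A Java unicode escape takes exactly four hex digits,
        // so this literal is U+1020 followed by the digit '8'.
        String misread = "\u10208";
        System.out.println(misread.length());                                  // 2
        System.out.printf("%04X %c%n", (int) misread.charAt(0), misread.charAt(1)); // 1020 8

        // Spelling the character as an explicit surrogate pair works.
        String th = "\uD800\uDE08";
        System.out.printf("U+%X%n", th.codePointAt(0));                        // U+10208

        char[] pair = toSurrogates(0x10208);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);        // D800 DE08

        // The standard library (Java 5+) agrees with the hand computation.
        System.out.println(java.util.Arrays.equals(pair, Character.toChars(0x10208))); // true
    }
}
```

The hand computation is only there to show where 0xD800 and 0xDE08 come from; in real code Character.toChars does the same job.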
Yes, there should be identifiers, and they already
exist, at least in 10646: for example, U00010208 for the
Etruscan letter mentioned. I was merely objecting
to the Unicode suggestion to use short character
identifiers for things that are not
characters, that is, the Unicode "surrogates".
I think it is OK to use the code points for these, but not
short character IDs, as that really muddles
the concepts of the UCS (at least in 10646).
Unicode should introduce a way to identify
characters in planes outside the BMP that is consistent
with Amd 9.
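The distinction drawn above, identifiers for characters (code points) but none for surrogate code units, can be sketched as a small Java helper. This is a hypothetical illustration, not an API from Unicode or 10646; the U+XXXX and U-XXXXXXXX spellings are assumed from the forms used earlier in this thread (current Unicode practice writes U+10208 instead), and it assumes Java 5 or later for String.codePointAt.

```java
public class ShortIds {

    // Hypothetical helper: format a UCS code point as a short identifier.
    // Assumed spellings: U+XXXX inside the BMP, U-XXXXXXXX beyond it.
    static String shortId(int codePoint) {
        if (codePoint < 0) {
            throw new IllegalArgumentException("negative value is not a code point");
        }
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
            // Surrogate code points are not characters, so they get no identifier.
            throw new IllegalArgumentException(
                String.format("%04X is a surrogate code point, not a character", codePoint));
        }
        if (codePoint <= 0xFFFF) {
            return String.format("U+%04X", codePoint);
        }
        return String.format("U-%08X", codePoint);
    }

    // Identify the character starting at index in a Java string.
    // codePointAt combines a valid surrogate pair into one code point,
    // so the pair gets one identifier; a lone surrogate is rejected above.
    static String idAt(String s, int index) {
        return shortId(s.codePointAt(index));
    }

    public static void main(String[] args) {
        System.out.println(shortId(0x0038));            // U+0038
        System.out.println(idAt("\uD800\uDE08", 0));    // U-00010208
        try {
            shortId(0xD800);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());          // D800 is a surrogate code point, not a character
        }
    }
}
```

The point of idAt is that the identifier is attached to the decoded code point, never to the individual 16-bit units that carry it.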