In a message dated 2001-02-16 0:19:01 Pacific Standard Time, email@example.com writes:
> > Because of the widespread belief that Unicode stops at U+FFFF,
> > many fonts and applications that claim to support Unicode can
> > only handle basic characters, not supplementary characters.
> Right. (Is it really a widespread belief? That's something I've
> been wondering.)
Well, firstname.lastname@example.org seems to think so:
> > Many descriptions on the Web erroneously claim that Unicode contains
> > first 64K characters of ISO 10646.
> Well, AFAICT it's true.
> At some point in the future I suppose it will cease to be true, but if you
> say "is" you should be talking about the present.
Unicode has been defined as ranging from U+0000 to U+10FFFF for several years
now. The fact that no characters were assigned beyond U+FFFF before
Unicode 3.1 (which is still in beta) does not change this.
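
To make the range concrete, here is a minimal Python sketch of how UTF-16 reaches code points above U+FFFF with a surrogate pair. The function and constant names are illustrative only, not taken from any library or from the standard's text.

```python
# Sketch only: names here are hypothetical, chosen for illustration.
MAX_CODE_POINT = 0x10FFFF  # upper end of the Unicode code space


def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into
    a (high, low) pair of UTF-16 surrogate code units."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Inverse of to_surrogate_pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Code that stores text as 16-bit units but never combines surrogates this way is the kind that effectively stops at U+FFFF.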
> > Because of the widespread belief that Unicode stops at U+FFFF, many
> > applications that claim to support Unicode can only handle basic
> > characters, not supplementary characters.
> The code I wrote is like that, and it'll remain like that for as long as
> that's all that can be tested and used in real life.
You can already test private-use characters in the U+Fxxxx and U+10xxxx
ranges. Saying that your code shouldn't have to work with characters beyond
U+FFFF because no such characters have been assigned yet is like saying it
shouldn't have to support U+20B0 through U+20CF. You know characters will be
assigned to that range some day, possibly sooner than you think.
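
As a rough illustration of such a test, the Python sketch below pushes a few private-use code points from the U+Fxxxx and U+10xxxx ranges through UTF-8 and UTF-16 and checks that they survive. The sample values and helper names are assumptions made for this example; a real test would feed the same characters through the application's own input, storage, and display paths.

```python
# Sketch only: sample code points and helper names are illustrative.
# First and last private-use code points in the U+Fxxxx and U+10xxxx ranges.
SAMPLES = [0xF0000, 0xFFFFD, 0x100000, 0x10FFFD]


def round_trips(cp, encoding):
    """True if the character survives an encode/decode cycle unchanged."""
    s = chr(cp)
    return s.encode(encoding).decode(encoding) == s


def utf16_units(cp):
    """Number of 16-bit code units needed for this code point in UTF-16."""
    return len(chr(cp).encode("utf-16-be")) // 2


def check_samples():
    """Return the sample code points that fail any of the basic checks."""
    failures = []
    for cp in SAMPLES:
        ok = (round_trips(cp, "utf-8")
              and round_trips(cp, "utf-16")
              and utf16_units(cp) == 2)
        if not ok:
            failures.append(cp)
    return failures
```

An empty list from check_samples() says only that the encoding layer copes; the more interesting failures tend to show up in code that indexes or truncates strings by 16-bit unit.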
Back to email@example.com:
> > So using the plain english term "basic" to describe that subset
> > of Unicode is misleading.
> I agree with you that the language in the standard needs updating.
I think that has been tried already, and 'basic' was the best anyone could
do. Terms involving 'planes', such as 'BMP' and 'supplementary planes', are
discouraged because planes per se are not part of Unicode, only ISO/IEC 10646.
I personally don't like 'basic' and 'supplementary' because they seem to
imply that the first 64K code points are better in some way, but the most
important thing is that the terminology remain consistent, even if flawed.