Date: Thu Oct 25 2007 - 20:34:50 CDT
The exact set of plane 2 characters of course depends on the context
one is talking about; however, applications need to be able to support
planes 1-16. The most obvious set is the Cantonese characters found
in plane 2. However, various books and even newspapers often require
characters in plane 2.
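As a rough illustration of what "plane" means here, the following is a minimal Python sketch. The function names are made up for this example; the only fact relied on is that a plane is a block of 65536 code points, so the plane number is the code point shifted right by 16 bits.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 = BMP, up to 16) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % code_point)
    # Each plane holds 0x10000 code points, so the plane is the high bits.
    return code_point >> 16


def uses_supplementary_planes(text: str) -> bool:
    """True if any character in the text lies outside the BMP."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # U+4E00 is in the BMP; U+20000 is the first code point of plane 2.
    for cp in (0x4E00, 0x20000):
        print("U+%04X is in plane %d" % (cp, plane_of(cp)))
```

An application could run a check like uses_supplementary_planes() over its test data to see whether text beyond the BMP is actually being exercised.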
I am aware that the original aim of Unicode was to have all 'useful'
characters in the BMP. However, as far as CJKV characters are concerned
this has not been done; rather, characters have been added on a
first-come, first-served basis. If the allocation of CJKV code points
continues to be done in this way, then modern CJKV coverage will
require not only BMP and plane 1 support but also, in the future,
plane 3 support.
Plane 2 includes various Cantonese characters. Characters as yet
unencoded include a large number of place names; any already submitted
to the IRG should end up in plane 2, but any submitted in the future
could well be in plane 3. Not to mention characters used by 'small'
communities such as the Zhuang, with a population of over 10 million.
There are two slightly different questions here:-
(1) What characters a font should include:-
If a font has room for only a limited number of CJK glyphs, one
chooses the most useful characters (TTF files are limited to 65536
glyphs). Or, even more simply, one has to decide in what order to make
the CJK glyphs. One example that puts useful characters first is
uming.ttf, which includes quite a number of plane 2 characters, but
not full Extension A support.
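To make the glyph limit concrete, here is a small back-of-the-envelope sketch in Python. It assumes the block ranges as of Unicode 5.0 (taken here as the version current at the time of writing; later versions extend them), and it simplifies by counting one glyph per character and ignoring every non-CJK glyph a real font also needs.

```python
# Inclusive code point ranges, assumed to be those of Unicode 5.0.
CJK_BLOCKS = {
    "CJK Unified Ideographs (BMP)": (0x4E00, 0x9FBB),
    "CJK Extension A (BMP)": (0x3400, 0x4DB5),
    "CJK Extension B (plane 2)": (0x20000, 0x2A6D6),
}

# Limit quoted in the message; glyph indices in a TTF file are 16-bit.
TTF_GLYPH_LIMIT = 65536


def block_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


total = sum(block_size(first, last) for first, last in CJK_BLOCKS.values())

if __name__ == "__main__":
    for name, (first, last) in CJK_BLOCKS.items():
        print("%-30s %6d" % (name, block_size(first, last)))
    print("%-30s %6d" % ("total", total))
    print("fits in one TTF file:", total <= TTF_GLYPH_LIMIT)
```

Under these assumptions the three blocks total 70217 characters, so a single TTF file cannot carry all of them and the font designer has to choose.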
In practice, modern dictionaries designed for high school/college
level students tend to include about 20 to 25 thousand characters.
However, different regions use some different characters, so one could
argue that over 30 thousand characters are required as a minimum.
(2) What features an application should support:-
IMHO applications need to support surrogates in this day and age. For
example, for one project I used Perl Tk, but I discovered too late
that Perl Tk does not support surrogates. In this case that is the
difference between an application that is widely used and a dead end.
I would therefore urge all developers to include surrogate support in
the core features of their applications.
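For anyone unsure what surrogate support involves at the lowest level, this is a minimal Python sketch of the UTF-16 mapping between a code point beyond the BMP and its surrogate pair. The function names are illustrative only; a real application would normally rely on its platform's string library rather than hand-rolled code like this.

```python
def to_surrogates(code_point: int) -> tuple:
    """Split a code point in planes 1-16 into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in planes 1-16")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x20000)     # first code point of plane 2
    print("U+20000 -> %04X %04X" % (high, low))
    print("round trip: U+%X" % from_surrogates(high, low))
```

Software that treats each 16-bit unit as a whole character will see two unpaired halves here, which is how plane 2 text ends up broken in toolkits without surrogate support.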
What other modern languages apart from CJKV require support beyond the BMP?
Quoting Peter Constable <firstname.lastname@example.org>:
> From: email@example.com [mailto:firstname.lastname@example.org]
> On Behalf Of email@example.com
>> You certainly need support for plane 2 characters; some really obscure
>> Chinese characters are in the BMP, but some very useful ones are in
>> plane 2.
> I wonder if you could elaborate. We hear that CJK users typically
> use well under 10K characters, and for years there have been
> implementations using character sets that didn't include any of the
> Plane 2 characters and that, evidently, were adequate for lots of
> usage. So, it's not obvious that Plane 2 characters would be needed
> in all application scenarios. (Of course, Tim hasn't really said
> much about his application scenario.) I do note that the II Core set
> includes 22 Plane 2 characters; are these the characters you had in
> mind? In what scenarios is it important to support them?