Re: [Proposal] Extended UTF-16 by using Plane 14

From: Gary Roberts (gar@sandiegoca.ncr.com)
Date: Tue Apr 13 1999 - 22:30:23 EDT


I thought it would be useful to extol some of the virtues of the scheme
John Jenkins suggests, and to expand on the idea a little. If you reserve
some private use characters from the BMP (how many depends on how many
variants of the same character exist in your proposed script), you can
represent a glyph variant as two UTF-16 characters: first the abstract
character that already exists in Unicode (or a private use character
when the standard is missing a character it should include), followed
by the 'variant tag' private use character that designates the exact
glyph you desire.
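
As a rough illustration of that pairing, here is a minimal Python sketch.
The tag block starting at U+E000, its size of 256, and the names tag_for
and encode_variant are assumptions made for illustration only; the actual
private use range would have to be agreed on by whoever uses the scheme.

```python
# Hypothetical allocation: variant tags occupy the start of the BMP
# private use area (U+E000..U+F8FF). Both values below are assumptions.
VARIANT_TAG_BASE = 0xE000
MAX_VARIANTS = 256


def tag_for(variant: int) -> str:
    """Return the private use 'variant tag' character for a variant index."""
    if not 0 <= variant < MAX_VARIANTS:
        raise ValueError("variant index outside the reserved tag block")
    return chr(VARIANT_TAG_BASE + variant)


def encode_variant(base: str, variant: int) -> bytes:
    """Encode one abstract character plus its variant tag as UTF-16 (big-endian)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    return (base + tag_for(variant)).encode("utf-16-be")
```

For example, encode_variant("\u4e00", 3) yields the four bytes
4E 00 E0 03: the abstract character U+4E00 followed by the tag U+E003.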

The scheme should allow most characters to be represented in four bytes
(shorter than your proposed UTF-16 modification), and all other characters
in six bytes. Another advantage is that a naive UTF-16 display might
render a document well enough for it to remain legible. Because such a
reader won't know that variant tags should be displayed zero width, the
result could get ugly, but a user might still figure out what is going
on.
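
To make the byte counts and the fallback behaviour concrete, here is a
small Python sketch. It assumes, purely for illustration, that the variant
tags occupy U+E000..U+E0FF, and it reads the six-byte case as a base
character outside the BMP (a surrogate pair) followed by one tag. The
names parse, strip_tags and utf16_size are hypothetical.

```python
# Assumed tag block inside the BMP private use area; not part of any standard.
TAG_FIRST, TAG_LAST = 0xE000, 0xE0FF


def is_variant_tag(ch: str) -> bool:
    return TAG_FIRST <= ord(ch) <= TAG_LAST


def parse(text: str) -> list:
    """Split text into (base, variant) pairs; variant is None when untagged."""
    pairs = []
    for ch in text:
        if is_variant_tag(ch) and pairs and pairs[-1][1] is None:
            # Attach the tag to the immediately preceding untagged character.
            pairs[-1] = (pairs[-1][0], ord(ch) - TAG_FIRST)
        else:
            # Ordinary character, or a stray tag with nothing to attach to.
            pairs.append((ch, None))
    return pairs


def strip_tags(text: str) -> str:
    """What a tag-aware but variant-less renderer could fall back to."""
    return "".join(ch for ch in text if not is_variant_tag(ch))


def utf16_size(text: str) -> int:
    """Number of bytes the text occupies in UTF-16, without a byte order mark."""
    return len(text.encode("utf-16-be"))
```

Under these assumptions a tagged BMP character takes four bytes and a
tagged character outside the BMP takes six. A reader that knows nothing
about the tags would simply show each tag as whatever its font maps the
private use character to, which is the ugly but possibly legible case.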

In short, I think this scheme would be much better for the problem stated
than using ever more user-defined characters that are essentially
uninterpretable by other systems.

                                *

On Tue, 13 Apr 1999, John Jenkins wrote:

> Christian Wittern <chris@ccbs.ntu.edu.tw> writes:
>
> > The characters in question have deliberately not undergone the unification
> > in question, since the preservation of the exact glyph shape is deemed of
> > interest. This again is a reason to use the private character area, and
> > again, it is a reason the number of characters needed might possibly exceed
> > 131000.
> >
>
> Then they don't want a character set, they want a glyph registry. If one
> really *does* want to create access to a vast number of glyphs through
> Unicode, then it would be best to take a CCCII-type approach and classify
> the glyphs into families representing the abstract character of which they
> are all variant appearances. You can then use a zero-width "variant tag" to
> distinguish them. This is an approach Apple is investigating as a means of
> helping solve the variant problem among ideographs.
>
>
> =====
> John H. Jenkins
> jenkins@apple.com
> tseng@blueneptune.com
> http://www.blueneptune.com/~tseng
>
>


