Re: Taboo Variants

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Fri Aug 09 2002 - 13:38:07 EDT


"John H. Jenkins" wrote:

>
> Yes, because you do not *encode* characters using IDC's. You describe
> them. This is carefully explained in the standard.

I stand corrected.

>
> Of course, using the taboo variant selector is about as vague as an
> IDC, so it doesn't make that much difference.

My point is that if the commonly encountered taboo variants are already encoded in CJK-B, then
either the other taboo variants should also be added to CJK-B or they could be *described* using
IDCs. Adding a taboo variant selector does make a difference, because then there'll be more than one
way to reference the same character.
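
To illustrate what *describing* a character with IDCs amounts to, here is a rough Python sketch that
checks whether a string is a well-formed ideographic description sequence. It assumes the twelve IDCs
at U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The function
names and the sample characters are my own illustration, not anything taken from the standard.

```python
# IDCs that take three operands; all other IDCs in the range take two.
TERNARY_IDCS = {"\u2ff2", "\u2ff3"}


def is_idc(ch: str) -> bool:
    """True if ch is one of the IDCs in U+2FF0..U+2FFB (assumed range)."""
    return 0x2FF0 <= ord(ch) <= 0x2FFB


def is_well_formed_ids(seq: str) -> bool:
    """Check prefix-notation well-formedness of a description sequence."""
    needed = 1  # number of components still expected
    for ch in seq:
        if needed == 0:
            return False  # trailing material after a complete description
        needed -= 1
        if is_idc(ch):
            needed += 3 if ch in TERNARY_IDCS else 2
    return needed == 0


print(is_well_formed_ids("⿰氵青"))  # left-to-right: water radical + 青
print(is_well_formed_ids("⿰氵"))    # incomplete: second operand missing
```

A sequence like this only describes the shape; it does not give the character a code point of its
own, which is the distinction being drawn above.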

On the other hand, given the lack of font support for CJK-B, perhaps a taboo variant selector would
be preferable ... now I don't know where I stand on this !

>
> As to the proposed location, note that the byte-order mark got stuck
> with a bunch of Arabic compatibility forms.

U+FEFF is only stuck with a bunch of Arabic compatibility forms because it's the byte-swapped
counterpart of U+FFFE, and as far as I'm aware it's not actually a BOM character, but a code point
that is "used solely with the semantic of BOM" (TR28 Section 3.9).
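
A minimal Python sketch of the relationship between U+FEFF and U+FFFE, using only the standard
`codecs` constants. The helper name `detect_utf16_order` is hypothetical and only for illustration.

```python
import codecs

BOM = "\ufeff"

# The same code point serialized in the two UTF-16 byte orders.
be_bytes = BOM.encode("utf-16-be")  # bytes FE FF
le_bytes = BOM.encode("utf-16-le")  # bytes FF FE


def detect_utf16_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading U+FEFF, if present."""
    if data[:2] == codecs.BOM_UTF16_BE:
        return "big-endian"
    if data[:2] == codecs.BOM_UTF16_LE:
        return "little-endian"
    return "unknown"


# Reading the little-endian bytes with the wrong byte order gives U+FFFE,
# which is not a character, so the mismatch is detectable.
misread = int.from_bytes(le_bytes, "big")

print(detect_utf16_order(be_bytes), detect_utf16_order(le_bytes), hex(misread))
```

The point is that the position of U+FEFF is dictated by its pairing with U+FFFE, not by any
relationship to the block it happens to sit in.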

> Sometimes the odd
> character gets stuck in an odd place; as you say, there wasn't any room
> left in the more logical location, and this spot in the KangXi radicals
> block was pretty much never going to be used otherwise. Six of one, as
> it were.
>

I simply can't accept this.

For argument's sake, what are you going to do when I publish the manuscript copy of a draft edition
of the Kangxi dictionary that I recently purchased in a second-hand bookstore in London that
includes ten supplementary radicals not found in the printed editions ?

In principle, as has been argued convincingly in another thread recently, you can never assume that
any unused code point will always remain vacant. The Kangxi Radical block may look as if it will
never change, but we shouldn't rely on that being the case.

Given that there are going to be proposals for additional CJK symbols and punctuation marks in the
future (if no one else does, I've got a few I'll propose), surely it would be better to simply create
a "CJK Symbols and Punctuation B" block for the proposed IDEOGRAPHIC TABOO VARIATION INDICATOR. It's
irrelevant that the block will only have one character to start with. It's got to be better than
polluting other blocks with characters that just don't belong there.

Andrew


