Re: Transcriptions of "Unicode"

From: James Kass (jameskass@worldnet.att.net)
Date: Thu Dec 07 2000 - 03:27:11 EST


Kenneth Whistler wrote:

>
> The CJK Radicals Supplement characters, U+2E80..U+2EF3, are
> the ones that show a number of specific forms, but those are
> intended for special text purposes, as when specifying a
> radical index in a dictionary.
>

The same reasoning could be said to apply to the variant
characters formed with those variant radicals. Or maybe I
am misunderstanding this: special text purposes will permit
the explicit encoding of a variant radical in a dictionary's
index, but the dictionary can't explicitly encode the characters
to which that index refers (?)
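
For what it's worth, the radical/ideograph distinction shows up
readily in code. Here is a minimal sketch using Python's
unicodedata module, taking the Kangxi radical for 'tooth'
(U+2FD2) as an example of a separately encoded radical; the code
points and names are from the Unicode character database, the
rest is just illustration:

    import unicodedata

    radical = "\u2FD2"      # KANGXI RADICAL TOOTH
    ideograph = "\u9F52"    # CJK UNIFIED IDEOGRAPH-9F52, traditional 'tooth'

    print(unicodedata.name(radical))      # KANGXI RADICAL TOOTH
    print(unicodedata.name(ideograph))    # CJK UNIFIED IDEOGRAPH-9F52

    # The radical is a distinct character from the ideograph...
    print(radical == ideograph)           # False
    # ...but compatibility normalization folds it into the ideograph.
    print(unicodedata.normalize("NFKC", radical) == ideograph)  # True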

> > Since anyone encoding U+9F52 might see any of the above
> > three versions, my opinion is that encoders (authors) would
> > wish to explicitly encode their expected character and would
> > do so whenever they have the option.
>
> First of all, you missed the simplified version of 'teeth'
> at U+9F7F. If someone explicitly wants the (Chinese) simplified
> version, of course they should use that, and not U+2EEE, for
> heaven's sake.
>

Yes, and I also missed the other variant pointed out by Thomas
Chan. Shame on me.

> > I believe that they
> > should have the option. The abundance of unassigned code
> > points offered by additional Unicode planes makes this
> > possible and would eliminate the need for a browser
> > (or any other application) to "guess" a language in order
> > to display material as its authors and users desire.
>
> I think you are way overstating the scope of the problem.
> Browsers can meet most of their users' expectations merely by
> having their Unicode font set to a Japanese font or a Chinese
> font, as desired. It is only for fine control of mixed
> language data that you may need more, and for that, it is
> not unreasonable to expect that people will require language
> and font markup.
>
> I consider it pernicious to be suggesting that things would
> be better if we just gave up on unification and encoded all
> the glyphs. You might make things a little easier on the
> rendering end (although the fonts would keep growing), but
> the resultant problems of text equivalence for searching and
> other text processes would just get much worse than they
> already are for Han characters.
>

Pernicious is a bit strong, perhaps. I was implying that things
might be better if we encoded all the characters, and
suggesting that this is already being done. You've lost me on
the last part. I do not think the font's size would be much
affected, because a Chinese font isn't going to include Japanese
variants anyway, regardless of the Japanese variants' encoding
status. An OpenType font covering both Japanese and Chinese
variants already has to carry glyphs for both forms, whether or
not code points have been assigned to both. Indeed, such a font
might well be a bit smaller than otherwise, since there would be
fewer look-up tables if the characters had explicit code points.
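
To make the "guessing" point concrete, here is a hypothetical
sketch of what the choice looks like from an application's point
of view; the font names and the kana-based heuristic are
invented for illustration, not taken from any real browser:

    # Hypothetical sketch of how an application might pick a CJK font.
    FONT_BY_LANG = {
        "ja": "SomeJapaneseFont",
        "zh": "SomeChineseFont",
    }

    def pick_font(text, lang=None):
        """With language markup, font choice is a simple lookup;
        without it, the application has to guess."""
        if lang in FONT_BY_LANG:
            return FONT_BY_LANG[lang]
        # No markup: guess from script cues, e.g. kana imply Japanese.
        if any("\u3040" <= ch <= "\u30FF" for ch in text):
            return FONT_BY_LANG["ja"]
        return FONT_BY_LANG["zh"]   # default guess

    print(pick_font("\u9F52"))             # pure Han text: a guess
    print(pick_font("\u9F52", lang="ja"))  # markup removes the guesswork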

With regard to text searching and the like, Erik van der Poel
has already responded with an example of a Chinese author
sending a Chinese e-mail to a Japanese reader who knows Chinese.
To this I wish to add that someone searching Japanese text for
Japanese strings might not be able to use any results which
come up in Chinese. Even though educated Japanese readers can
generally puzzle out the gist of Chinese text, there are
grammatical differences that go beyond the writing system
differences. As for Chinese users searching for Chinese
strings, Japanese text will most probably be incomprehensible
regardless of font or mark-up. Since you already know this,
I'm afraid I'm misunderstanding the stated problem.
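
A small illustration of the equivalence problem, again as a
hypothetical Python sketch of my own: a plain substring search
treats the traditional, simplified, and Japanese 'tooth'
characters as unrelated, and matching across them requires a
hand-built folding table that someone has to curate and maintain.

    # The three encoded forms of 'tooth' are distinct characters,
    # so a naive substring search sees no connection between them.
    TRADITIONAL = "\u9F52"   # U+9F52, traditional
    SIMPLIFIED  = "\u9F7F"   # U+9F7F, Chinese simplified
    JAPANESE    = "\u6B6F"   # U+6B6F, Japanese shinjitai

    text = "some text containing " + TRADITIONAL

    print(JAPANESE in text)   # False: no match across variants

    # A hypothetical variant-folding table; every entry must be
    # curated by hand, which is the maintenance cost in question.
    FOLD = {SIMPLIFIED: TRADITIONAL, JAPANESE: TRADITIONAL}

    def fold(s):
        return "".join(FOLD.get(ch, ch) for ch in s)

    print(fold(JAPANESE) in fold(text))   # True: variants folded together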

Best regards,

James Kass.


