Re: Application that displays CJK text in Normalization Form D

From: Doug Ewell (doug@ewellic.org)
Date: Sun Nov 14 2010 - 19:59:17 CST

Next message: Peter Constable: "RE: Application that displays CJK text in Normalization Form D"

Previous message: Jim Monty: "Application that displays katakana and Hangul text in Normalization Form D [Was Re: Application that displays CJK text in Normalization Form D] :-)"
In reply to: Asmus Freytag: "Re: Application that displays CJK text in Normalization Form D"
Next in thread: Jim Monty: "Re: Application that displays CJK text in Normalization Form D"
Reply: Jim Monty: "Re: Application that displays CJK text in Normalization Form D"

Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

>> The term "CJK" is often used to refer to those characters which are
>> common to Chinese and Japanese and Korean, viz. the ideographic
>> characters.
>
> Doug,
>
> you might want to talk to the author of UTN#14 then, because he seems
> to be using the term "CJK text" in a sense that I find
> indistinguishable from the way Jim did.
>
> Any relation of yours?

Nice catch. In UTN #14, I wrote:

> In the case of Chinese, Japanese, and Korean (“CJK”) text, where a
> typical document might contain thousands of different ideographic Han
> characters, there never was any expectation that 8 bits per character
> would suffice. The legacy double-byte character sets designed for CJK
> text used a single byte for some characters (ASCII and halfwidth
> katakana) and two for others. DBCS encodings are trickier to handle
> than fixed-length encodings—programmers must keep track of lead and
> trail bytes—but at least these character sets represented CJK text in
> no more than 16 bits, as compactly as could be expected.

By "CJK text" I definitely did mean to emphasize the unique situation of
having to find room for thousands of ideographic characters. I note
that legacy character sets (primarily EBCDIC-based) have been devised to
handle only Latin plus katakana, or only Latin plus jamos, such that 8
bits per character did in fact suffice.

In my second sentence above, I did acknowledge that "double-byte
character sets designed for CJK text" include halfwidth katakana. For
that matter, many of them also include Greek and Cyrillic, so I'm not
sure if the comparison to Jim's usage is quite on the mark, but I'll
accept it if Asmus sees it that way.
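
As a rough illustration of the byte counts involved, the sketch below uses Python's built-in shift_jis codec as one example of a legacy DBCS. The choice of Shift_JIS, the sample characters, and the helper name byte_len are assumptions made for illustration; UTN #14 does not single out any one encoding.

```python
# Illustrative sketch only: Shift_JIS stands in for "a legacy DBCS".
# byte_len is a hypothetical helper, not part of any library.

def byte_len(text: str, encoding: str = "shift_jis") -> int:
    """Return how many bytes `text` occupies in the given legacy encoding."""
    return len(text.encode(encoding))

samples = {
    "ASCII letter": "\u0041",        # A
    "halfwidth katakana": "\uFF71",  # halfwidth A
    "Han ideograph": "\u6F22",       # the "han" of "kanji"
    "Greek letter": "\u03B1",        # alpha
    "Cyrillic letter": "\u0414",     # De
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} -> {byte_len(ch)} byte(s)")
```

Under this codec, the ASCII and halfwidth katakana samples should take one byte each, while the ideograph, Greek, and Cyrillic samples take two.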

The answer to Jim's question, then, is that for those examples of "CJK
text" which are encoded differently in NFC and NFD (a group that
excludes ideographs, thus immediately putting that side issue to rest),
there are indeed some combinations of OS + app + rendering engine + font
that can display those examples properly.
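
To make the NFC/NFD distinction concrete, here is a minimal sketch using Python's standard unicodedata module. The sample characters and the helper names code_points and differs_between_nfc_and_nfd are my own choices for illustration.

```python
import unicodedata

def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

def differs_between_nfc_and_nfd(text: str) -> bool:
    """True if the NFC and NFD forms of `text` are different sequences."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return nfc != nfd

samples = {
    "katakana GA": "\u30AC",          # decomposes to KA + combining voiced mark
    "Hangul syllable HAN": "\uD55C",  # decomposes to three conjoining jamo
    "Han ideograph": "\u6F22",        # has no decomposition
}

for label, ch in samples.items():
    nfd = unicodedata.normalize("NFD", ch)
    print(f"{label}: NFC {code_points(unicodedata.normalize('NFC', ch))}"
          f" | NFD {code_points(nfd)}"
          f" | differs: {differs_between_nfc_and_nfd(ch)}")
```

Only the katakana and Hangul samples change between the two forms, so those are the cases where a rendering stack actually has to cope with combining sequences.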

And no, I did not intend to make this big a deal out of it, and I
apologize for doing so.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
