Re: Application that displays CJK text in Normalization Form D

From: Doug Ewell (doug@ewellic.org)
Date: Sun Nov 14 2010 - 19:59:17 CST

    Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

    >> The term "CJK" is often used to refer to those characters which are
    >> common to Chinese and Japanese and Korean, viz. the ideographic
    >> characters.
    >
    > Doug,
    >
    > you might want to talk to the author of UTN#14 then, because he seems
    > to be using the term "CJK text" in a sense that I find
    > indistinguishable from the way Jim did.
    >
    > Any relation of yours?

    Nice catch. In UTN #14, I wrote:

    > In the case of Chinese, Japanese, and Korean (“CJK”) text, where a
    > typical document might contain thousands of different ideographic Han
    > characters, there never was any expectation that 8 bits per character
    > would suffice. The legacy double-byte character sets designed for CJK
    > text used a single byte for some characters (ASCII and halfwidth
    > katakana) and two for others. DBCS encodings are trickier to handle
    > than fixed-length encodings—programmers must keep track of lead and
    > trail bytes—but at least these character sets represented CJK text in
    > no more than 16 bits, as compactly as could be expected.

    By "CJK text" I definitely did mean to emphasize the unique situation of
    having to find room for thousands of ideographic characters. I note
    that legacy character sets (primarily EBCDIC-based) have been devised to
    handle only Latin plus katakana, or only Latin plus jamos, such that 8
    bits per character did in fact suffice.

    In my second sentence above, I did acknowledge that "double-byte
    character sets designed for CJK text" include halfwidth katakana. For
    that matter, many of them also include Greek and Cyrillic, so I'm not
    sure if the comparison to Jim's usage is quite on the mark, but I'll
    accept it if Asmus sees it that way.
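
    To make the one-byte versus two-byte split concrete, here is a minimal
    sketch using Python's built-in "shift_jis" codec as a stand-in for the
    legacy DBCS encodings discussed above. The choice of Shift_JIS and the
    sample characters are assumptions made for illustration, not taken from
    UTN #14 or this thread; the helper name sjis_length is hypothetical.

```python
# Sample characters chosen for illustration only (not from the thread).
SAMPLES = [
    ("ASCII letter", "A"),
    ("halfwidth katakana", "\uFF71"),   # HALFWIDTH KATAKANA LETTER A
    ("Han ideograph", "\u6F22"),        # CJK ideograph U+6F22
    ("Greek letter", "\u0391"),         # GREEK CAPITAL LETTER ALPHA
    ("Cyrillic letter", "\u042F"),      # CYRILLIC CAPITAL LETTER YA
]


def sjis_length(ch: str) -> int:
    """Return the number of bytes Shift_JIS uses to encode one character."""
    return len(ch.encode("shift_jis"))


if __name__ == "__main__":
    for label, ch in SAMPLES:
        encoded = ch.encode("shift_jis")
        print(f"{label:20} U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

    Under these assumptions, ASCII and halfwidth katakana take one byte
    each, while the ideograph, the Greek letter, and the Cyrillic letter
    each take two.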

    The answer to Jim's question, then, is that for those examples of "CJK
    text" which are encoded differently in NFC and NFD (a group that
    excludes ideographs, thus immediately putting that side issue to rest),
    there are indeed some combinations of OS + app + rendering engine + font
    that can display those examples properly.
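
    As a rough illustration of which text falls into that group, here is a
    minimal sketch using Python's standard unicodedata module. The sample
    characters and the helper name differs_between_nfc_and_nfd are
    assumptions chosen for illustration; they do not come from the thread.

```python
import unicodedata

# Sample characters chosen for illustration only (not from the thread).
SAMPLES = {
    "Han ideograph": "\u6F22",          # CJK ideograph U+6F22
    "Hangul syllable": "\uD55C",        # HANGUL SYLLABLE HAN
    "Hiragana with dakuten": "\u304C",  # HIRAGANA LETTER GA
}


def differs_between_nfc_and_nfd(text: str) -> bool:
    """Return True if the NFC and NFD forms of text are different strings."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return nfc != nfd


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        nfd = unicodedata.normalize("NFD", text)
        code_points = " ".join(f"U+{ord(c):04X}" for c in nfd)
        print(f"{label:24} NFD = {code_points:24} "
              f"differs: {differs_between_nfc_and_nfd(text)}")
```

    In this sketch the ideograph is unchanged by either form, while the
    Hangul syllable decomposes into conjoining jamo and the kana into a
    base letter plus a combining voiced sound mark. Whether the decomposed
    forms display properly still depends on the OS, application, rendering
    engine, and font in use.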

    And no, I did not intend to make this big a deal out of it, and I
    apologize for doing so.

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    

