Re: Application that displays CJK text in Normalization Form D

From: Jim Monty (
Date: Mon Nov 15 2010 - 15:01:37 CST

    Doug Ewell wrote:
    > And no, I did not intend to make this big a deal out of it, and I
    > apologize for doing so.

    Nor did I.

    I'm a genuine student of Unicode, here to learn. It seems many of the regular
    contributors to the Unicode and Unicore mailing lists are the Unicode experts
    themselves, many of whom are developers of the Unicode Standard. As such, these
    mailing lists are fantastic! There are very few technology mailing lists
    like them anymore. How cool is it to post an inquiry to the Unicode mailing
    list and have Unicode luminaries like Mark Davis, Asmus Freytag, Markus Scherer,
    Martin Dürst and Doug Ewell ALL reply? (The answer: Pretty darn cool!)

    When I asked for clarification about my use of the term "CJK text" instead of
    "kana and Hangul text", I was earnest. If there was something wrong with my
    understanding of the standard terminology, I genuinely wanted to know what it
    was. You're the experts, I'm the initiate.

    > The answer to Jim's question, then, is that for those examples
    > of "CJK text" which are encoded differently in NFC and NFD (a group
    > that excludes ideographs, thus immediately putting that side issue
    > to rest), there are indeed some combinations of OS + app + rendering
    > engine + font that can display those examples properly.

    And this was the valuable lesson I learned. Until this exchange on the Unicode
    mailing list, I'd had a biased and wrong impression of the state of the art with
    respect to Unicode normalization and modern software based on my own personal
    experience. I'm glad I asked the question, and I'm grateful for all the
    excellent and thorough answers.
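
    To make the quoted point concrete, here is a minimal sketch using Python's
    standard unicodedata module. The sample characters and the helper name
    code_points are illustrative choices, not taken from the thread. It compares
    the NFC and NFD forms of one kana, one Hangul syllable and one unified
    ideograph.

```python
import unicodedata

# Illustrative samples only; any voiced kana or precomposed Hangul syllable would do.
samples = {
    "kana": "\u30AC",       # KATAKANA LETTER GA
    "hangul": "\uD55C",     # HANGUL SYLLABLE HAN
    "ideograph": "\u6F22",  # CJK UNIFIED IDEOGRAPH-6F22
}


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for name, ch in samples.items():
    nfc = unicodedata.normalize("NFC", ch)
    nfd = unicodedata.normalize("NFD", ch)
    status = "differs" if nfc != nfd else "same"
    print(f"{name:10} NFC: {code_points(nfc):8} NFD: {code_points(nfd)} ({status})")
```

    With the Unicode data bundled in a current Python, the kana and the Hangul
    syllable should decompose under NFD while the unified ideograph is unchanged
    in both forms. Whether the decomposed sequences then display properly is a
    separate matter that depends on the OS, application, rendering engine and
    font, as the quoted answer says.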

    When I type the ideograph 漢 (U+FA47) into BabelPad, highlight it, and then click
    the button labeled "Normalize to NFC", the character becomes 漢 (U+6F22). Does
    BabelPad not conform to the Unicode Standard in this case? Is this not truly
    Unicode normalization?
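
    The same check can be made outside BabelPad. The following is a sketch
    using Python's standard unicodedata module, not BabelPad's own code, and
    the variable names are arbitrary. It looks up the decomposition mapping of
    U+FA47 and applies each of the four normalization forms to it.

```python
import unicodedata

compat = "\uFA47"  # CJK COMPATIBILITY IDEOGRAPH-FA47

# A mapping with no <tag> prefix is a canonical (not compatibility) decomposition.
mapping = unicodedata.decomposition(compat)
print("decomposition mapping:", mapping)

results = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, compat)
    results[form] = normalized
    print(form, "->", " ".join(f"U+{ord(ch):04X}" for ch in normalized))
```

    In the Unicode Character Database, U+FA47 has a canonical singleton
    decomposition to U+6F22, so every normalization form, NFC included,
    replaces it with U+6F22. That is consistent with the BabelPad behavior
    described above.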

    Jim Monty
