Re: [unicode] CJK Ideograph Fragments

From: Uriah Eisenstein
Date: Thu Apr 29 2010 - 18:19:37 CDT

    To clarify: I am well aware that many software projects use IDS, or other
    hierarchical structures, to describe ideographs. I assume they all have to
    deal with cases where a component of an ideograph isn't itself an encoded
    character. My question was specifically about the possibility of encoding
    such components, rather than leaving each project to resolve the problem
    separately (e.g. by using PUA characters, or by marking components as
    variants or abbreviations of existing characters). I hope this is clearer
    now. If examples (websites of such projects) are desired, please let me
    know.
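
    To make the problem concrete, here is a minimal sketch of how a project
    might parse an IDS and list the components it had to represent with PUA
    code points. It assumes the twelve Ideographic Description Characters
    U+2FF0..U+2FFB; the helper names and the use of U+E000 as a stand-in for
    the right-hand part of 漢 are purely hypothetical, not taken from any
    existing project.

```python
# Ideographic Description Characters U+2FF0..U+2FFB and their operand counts.
# All are binary except U+2FF2 and U+2FF3, which take three operands.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left, middle, right
IDC_ARITY["\u2FF3"] = 3  # top, middle, bottom


def is_private_use(ch):
    """Return True if ch lies in one of the Unicode Private Use Areas."""
    cp = ord(ch)
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def parse_ids(ids):
    """Parse a prefix-notation IDS into a nested tuple (operator, child, ...).

    A component that is not an IDC is returned as a one-character string.
    """
    def parse(pos):
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were supplied")
        ch = ids[pos]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch, pos + 1          # leaf component
        pos += 1
        children = []
        for _ in range(arity):
            child, pos = parse(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def leaves(tree):
    """Yield the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree[1:]:
            yield from leaves(child)


def unencoded_components(ids):
    """List the leaf components that are PUA placeholders (hypothetical policy)."""
    return [c for c in leaves(parse_ids(ids)) if is_private_use(c)]


# Hypothetical decomposition of 漢: water radical plus a PUA placeholder.
HAN_IDS = "\u2FF0\u6C35\uE000"
print(parse_ids(HAN_IDS))
print(unencoded_components(HAN_IDS))
```

    Every project that picks its own PUA placeholder in this way ends up with
    data that other projects cannot interpret, which is the interoperability
    cost I have in mind.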

    I've recently been supplementing an existing list of decompositions for CJK
    Unified Ideographs, and have so far found several dozen such recurring
    components which aren't encoded characters. This doesn't include the
    extension blocks, for which I don't have decomposition data (though I have
    found quite a few such components in them). I would therefore estimate the
    number of components at several dozen, or perhaps a few hundred, depending
    on the policy for proposing new ones.

    Uriah Eisenstein

    2010/4/29 <>

    > Hi,
    > I think you searched in the wrong place or with the wrong keywords.
    > 1. Recently JTC1/SC2/WG2/IRG has been checking whether a newly proposed
    > CJK Unified Ideograph has a shape similar to an existing one, by comparing
    > IDS (sequences of IDCs, CJK Unified Ideographs and CJK Radicals).
    > 2. Wenlin Institute has a system that generates CJK Unified Ideograph
    > glyphs from a sequence of IDCs and CJK Strokes plus extra positioning
    > information, described in the markup language CDL.
    > If you already know of these and think your idea is different
    > from their activity, please clarify the difference. It will
    > help the reviewers to understand your future proposal to UTC.
    > How many components do you want to propose?
    > Regards,
    > mpsuzuki
    > On Wed, 28 Apr 2010 20:03:52 +0200
    > Uriah Eisenstein <> wrote:
    > >Hello,
    > >My question is about common components of CJK Ideographs which are not
    > >encoded as independent Han characters (and perhaps indeed aren't). A good
    > >example is the right-hand part of the character 漢 itself: it is a distinct
    > >component appearing in multiple other characters, but is not encoded to
    > >the best of my knowledge. The same goes for the top part of 鳥 and 島, the
    > >surrounding part of 與 and 興 and several others. My question is whether
    > >there are any plans or discussions for encoding these fragments in Unicode.
    > >
    > >(I haven't found anything about this in mailing list archives; I did find
    > >statements that Unicode does not intend to provide any decomposition data
    > >of Han characters :) And for good reasons. However, such fragments may
    > >well be useful for third-party software dealing with 漢字 glyph generation,
    > >lookup by components etc.)
    > >
    > >Thanks,
    > >Uriah Eisenstein
    > >
