Re: CJK Ideograph Fragments

From: John H. Jenkins
Date: Wed Apr 28 2010 - 14:36:53 CDT

    No. You could certainly write up a proposal and submit it to the UTC. Should the UTC feel the idea has merit, it would then move it on to WG2 and/or the IRG.

    The main problem here is that there is a very strong desire to limit ideograph encoding to attested and documentable forms. Anything which does not exist in actual texts is not likely to be well-regarded. Similarly, the UTC has a strong preference not to encode anything which isn't in actual use. Proposals to encode characters because they would be useful if encoded, even though they aren't actually being used right now, are generally looked on with disfavor.

    On Apr 28, 2010, at 12:03 PM, Uriah Eisenstein wrote:

    > Hello,
    > My question is about common components of CJK Ideographs which are not encoded as independent Han characters (and perhaps indeed aren't). A good example is the right-hand part of the character ~ itself: it is a distinct component appearing in multiple other characters, but is not encoded to the best of my knowledge. The same goes for the top part of and q, the surrounding part of P and and several others. My question is whether there are any plans or discussions for encoding these fragments in Unicode.
    > (I haven't found anything about this in mailing list archives; I did find statements that Unicode does not intend to provide any decomposition data of Han characters :) And for good reasons. However, such fragments may well be useful for third-party software dealing with ~r glyph generation, lookup by components etc.)
    > Thanks,
    > Uriah Eisenstein
