Re: CJK Ideograph Fragments

From: Asmus Freytag
Date: Sat May 08 2010 - 15:40:13 CDT

    On 5/8/2010 11:44 AM, Uriah Eisenstein wrote:
    > Well,
    > I've gone through the policies of submitting new characters and
    > scripts and they don't look encouraging :) But neither do they seem to
    > reject the idea of character fragments out of hand, as opposed to the
    > reverse case - characters which can be expressed using existing
    > characters and combining marks. In fact, the CJK Radicals Supplement
    > block and the Hangul Jamo both contain character fragments, in a way.
    > But I suppose these already existed in national standards rather than
    > introduced by Unicode.
    > In any case, examples I've seen of proposals cite experts and provide
    > font makers, neither of whom I have contact with. So I guess I'll drop
    > it for now, and hope that if someone takes it up I'll see it on the
    > mailing list.
    While a font is ultimately required for a proposal to be adopted, it
    shouldn't be a bar to formally raising the issue for initial
    consideration. Once something is considered potentially acceptable,
    there's enough time to come up with fonts (for the purpose of printing
    charts) before the committees need to vote on final approval. Proposals
    can take years from initial consideration to publication....

    Your suggestion was that these fragments need to be enumerated for
    various purposes in software and that having a standard enumeration is
    beneficial. If you can document and support that assertion, I would
    encourage you to put it on record.

    Doing so would allow a discussion of whether a standard enumeration is
    indeed useful enough to incur the cost of standardization.

    In some ways, this would not be a run-of-the-mill character encoding
    proposal, because you are not asserting that these fragments need
    encoding for the purpose of directly expressing text. While that is the
    primary purpose of character encoding, there are purposes that are
    ancillary to this, that a universal character encoding such as Unicode
    must encompass.

    There is certainly some precedent for character codes that aren't
    limited to the primary purpose I mentioned, but, because they don't
    represent a standard situation, one needs to carefully argue why such
    uses need to be covered by standardization and if so, why doing that as
    character codes is appropriate.

    That is different from the more usual task to document that an entity
    occurs in written or printed documents.

    The problem is, unless you actually put down all the details in a
    coherent proposal it's hard to judge correctly what the situation is.
    When you raise the question informally, all anyone can tell you is that
    an exceptional request is one that needs exceptional justification,
    which, while certainly correct, doesn't exactly help you or anyone to
    evaluate whether your proposal would meet the required level and type of

    > Thanks,
    > Uriah
    > On Sun, May 2, 2010 at 3:06 PM, Uriah Eisenstein wrote:
    > Not exactly, but I suppose such Hanzi fragments could be used for
    > similar purposes - e.g. looking up characters by components, where
    > the available components may include non-character fragments. Some
    > fragments may be useful for IME purposes, but probably not all.
    > On Sat, May 1, 2010 at 8:57 PM, Edward Cherlin wrote:
    > 2010/4/28 John H. Jenkins:
    > > No. You could certainly write up a proposal and submit it to the UTC.
    > > Should the UTC feel the idea has merit, it would then move it on to WG2
    > > and/or the IRG.
    > > The main problem here is that there is a very strong desire to limit
    > > ideograph encoding to attested and documentable forms. Anything which does
    > > not exist in actual texts is not likely to be well-regarded.
    > I had the idea some years ago of writing up a proposal to encode the
    > hanzi fragments used in Cangjie Shurufa IMEs. These fragments are used
    > extensively in dozens of howto books on keyboarding in Cangjie. This
    > includes the pieces (mostly real characters, with some radicals) used
    > on keyboard labels, and the common forms they stand for. I didn't get
    > any interest from the Cangjie development community or the authors of
    > a book on Cangjie that I have, so I abandoned the idea.
    > Uriah, is this the sort of thing you have in mind?
    > > Similarly, the
    > > UTC has a strong preference not to encode anything which isn't in actual
    > > use. Proposals to encode characters because they would be useful if encoded
    > > even though they aren't actually being used right now are generally looked
    > > on with disfavor.
    > >
    > > On Apr 28, 2010, at 12:03 PM, Uriah Eisenstein wrote:
    > >
    > > Hello,
    > > My question is about common components of CJK Ideographs which are not
    > > encoded as independent Han characters (and perhaps indeed aren't). A good
    > > example is the right-hand part of the character 漢 itself: it is a distinct
    > > component appearing in multiple other characters, but is not encoded to the
    > > best of my knowledge. The same goes for the top part of 鳥 and 島, the
    > > surrounding part of 與 and 興 and several others. My question is whether there
    > > are any plans or discussions for encoding these fragments in Unicode.
    > >
    > > (I haven't found anything about this in mailing list archives; I did find
    > > statements that Unicode does not intend to provide any decomposition data of
    > > Han characters :) And for good reasons. However, such fragments may well be
    > > useful for third-party software dealing with 漢字 glyph generation, lookup by
    > > components etc.)
    > >
    > > Thanks,
    > > Uriah Eisenstein
    > >
    > >
    > --
    > Edward Mokurai (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) Cherlin
    > Silent Thunder is my name, and Children are my nation.
    > The Cosmos is my dwelling place, the Truth my destination.
