Re: CJK Ideograph Fragments

From: Uriah Eisenstein
Date: Mon May 10 2010 - 07:32:25 CDT

    Thank you for the detailed answer, Mr. Freytag. I will then consider
    submitting at least an initial proposal (it will probably take a few weeks).
    I'll try to contact participants in some projects which make use of
    character decompositions, although I need to think about whether such
    character fragments would be useful in themselves for exchange of
    information, rather than merely functioning as convenient building
    components for other characters.
    Is there anywhere I could find the justifications for adding the CJK Radicals
    Supplement characters, or were these incorporated into Unicode as part of
    previously existing standards?
    Also, are the IDSs used internally by the IRG available anywhere public? I
    know these are not an official part of the Unicode standard, but they would
    make a nice use case :)
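
    To make the "lookup by components" use case concrete, here is a minimal
    sketch, assuming a small hand-written table of IDS strings. The table and
    the names IDS, components, build_index and lookup are illustrative only,
    not IRG data or any existing API. It works here only because every
    component in the toy table is itself an encoded character; a fragment
    without a code point would need some private placeholder, which is the
    gap under discussion.

```python
# Ideographic Description Characters occupy U+2FF0..U+2FFB in Unicode.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB

# Toy IDS table (hand-written for illustration; not IRG data).
IDS = {
    "好": "⿰女子",
    "明": "⿰日月",
    "林": "⿰木木",
    "森": "⿱木林",
    "男": "⿱田力",
}


def components(ids):
    """Return the components of an IDS string, dropping the layout operators."""
    return [ch for ch in ids if not (IDC_FIRST <= ord(ch) <= IDC_LAST)]


def build_index(table):
    """Build an inverted index: component -> set of characters containing it."""
    index = {}
    for char, ids in table.items():
        for comp in components(ids):
            index.setdefault(comp, set()).add(char)
    return index


INDEX = build_index(IDS)


def lookup(component):
    """Return the characters whose IDS directly lists the given component."""
    return sorted(INDEX.get(component, set()))


if __name__ == "__main__":
    print(lookup("木"))
```

    The index is deliberately one level deep: it does not expand 林 inside 森
    recursively, which a real lookup tool would probably want to do.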


    On Sat, May 8, 2010 at 11:40 PM, Asmus Freytag wrote:

    > On 5/8/2010 11:44 AM, Uriah Eisenstein wrote:
    >> Well,
    >> I've gone through the policies of submitting new characters and scripts
    >> and they don't look encouraging :) But neither do they seem to reject the
    >> idea of character fragments out of hand, as opposed to the reverse case -
    >> characters which can be expressed using existing characters and combining
    >> marks. In fact, the CJK Radicals Supplement block and the Hangul Jamo both
    >> contain character fragments, in a way. But I suppose these already existed
    >> in national standards rather than introduced by Unicode.
    >> In any case, examples I've seen of proposals cite experts and provide font
    >> makers, neither of whom I have contact with. So I guess I'll drop it for
    >> now, and hope that if someone takes it up I'll see it on the mailing list.
    > While a font is ultimately required for a proposal to become adopted, it
    > shouldn't be a bar to formally raising the issue for initial consideration.
    > Once something is considered potentially acceptable, there's enough time to
    > come up with fonts (for the purpose of printing charts) before the
    > committees need to vote on final approval. Proposals can take years from
    > initial consideration to publication....
    > Your suggestion was that these fragments need to be enumerated for various
    > purposes in software and that having a standard enumeration is beneficial.
    > If you can document and support that assertion, I would encourage you to put
    > it on record.
    > Doing so would allow a discussion of whether a standard enumeration is
    > indeed useful enough to incur the cost of standardization.
    > In some ways, this would not be a run-of-the-mill character encoding
    > proposal, because you are not asserting that these fragments need encoding
    > for the purpose of directly expressing text. While that is the primary
    > purpose of character encoding, there are purposes that are ancillary to
    > this, that a universal character encoding such as Unicode must encompass.
    > There is certainly some precedent for character codes that aren't limited
    > to the primary purpose I mentioned, but, because they don't represent a
    > standard situation, one needs to carefully argue why such uses need to be
    > covered by standardization and if so, why doing that as character codes is
    > appropriate.
    > That is different from the more usual task to document that an entity
    > occurs in written or printed documents.
    > The problem is, unless you actually put down all the details in a coherent
    > proposal it's hard to judge correctly what the situation is. When you raise
    > the question informally, all anyone can tell you is that an exceptional
    > request is one that needs exceptional justification, which, while certainly
    > correct, doesn't exactly help you or anyone to evaluate whether your
    > proposal would meet the required level and type of justification.
    > A./
    >> Thanks,
    >> Uriah
    >> On Sun, May 2, 2010 at 3:06 PM, Uriah Eisenstein wrote:
    >> Not exactly, but I suppose such Hanzi fragments could be used for
    >> similar purposes - e.g. looking up characters by components, where
    >> the available components may include non-character fragments. Some
    >> fragments may be useful for IME purposes, but probably not all.
    >> On Sat, May 1, 2010 at 8:57 PM, Edward Cherlin wrote:
    >> 2010/4/28 John H. Jenkins:
    >> > No. You could certainly write up a proposal and submit it to the UTC.
    >> > Should the UTC feel the idea has merit, it would then move it on to WG2
    >> > and/or the IRG.
    >> > The main problem here is that there is a very strong desire to limit
    >> > ideograph encoding to attested and documentable forms. Anything which does
    >> > not exist in actual texts is not likely to be well-regarded.
    >> I had the idea some years ago of writing up a proposal to encode the
    >> hanzi fragments used in Cangjie Shurufa IMEs. These fragments are used
    >> extensively in dozens of howto books on keyboarding in Cangjie. This
    >> includes the pieces (mostly real characters, with some radicals) used
    >> on keyboard labels, and the common forms they stand for. I didn't get
    >> any interest from the Cangjie development community or the authors of
    >> a book on Cangjie that I have, so I abandoned the idea.
    >> Uriah, is this the sort of thing you have in mind?
    >> > Similarly, the UTC has a strong preference not to encode anything which
    >> > isn't in actual use. Proposals to encode characters because they would be
    >> > useful if encoded even though they aren't actually being used right now
    >> > are generally looked on with disfavor.
    >> >
    >> > On Apr 28, 2010, at 12:03 PM, Uriah Eisenstein wrote:
    >> >
    >> > Hello,
    >> > My question is about common components of CJK Ideographs which are not
    >> > encoded as independent Han characters (and perhaps indeed aren't). A good
    >> > example is the right-hand part of the character 漢 itself: it is a distinct
    >> > component appearing in multiple other characters, but is not encoded to the
    >> > best of my knowledge. The same goes for the top part of 鳥 and 島, the
    >> > surrounding part of 與 and 興 and several others. My question is whether there
    >> > are any plans or discussions for encoding these fragments in Unicode.
    >> >
    >> > (I haven't found anything about this in mailing list archives; I did find
    >> > statements that Unicode does not intend to provide any decomposition data of
    >> > Han characters :) And for good reasons. However, such fragments may well be
    >> > useful for third-party software dealing with 漢字 glyph generation, lookup by
    >> > components etc.)
    >> >
    >> > Thanks,
    >> > Uriah Eisenstein
    >> >
    >> >
    >> --
    >> Edward Mokurai (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) Cherlin
    >> Silent Thunder is my name, and Children are my nation.
    >> The Cosmos is my dwelling place, the Truth my destination.

    This archive was generated by hypermail 2.1.5 : Mon May 10 2010 - 07:35:36 CDT