RE: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

From: Mike Ayers (
Date: Tue Jan 20 2004 - 13:04:59 EST

    > 1) U+9CE6 is a traditional Chinese character (a kind of swallow)
    > without a SC counterpart encoded. However, applying the usual rules
    > for simplifications, it would be easy to derive a simplified form
    > which one could conceivably see in a book printed in the PRC. Rather
    > than encode the simplified form, the UTC would prefer to represent
    > the SC form using U+9CE6 + a variation selector.

            Ummm - if this simplified form were used at all, wouldn't it already
    be encoded? Isn't there a process for getting such forms encoded? Has this
    process broken down, or have some of its assumptions been shown invalid?

    > 2) Your best friend has the last name of "turtle," but he doesn't use
    > any of the encoded forms for the turtle character to represent it. He
    > insists on writing it in yet another way and wants to be able to
    > include his name as he writes it in the source code he edits. The UTC
    > ends up accommodating him using U+2A6C9 (which is the closest turtle
    > to his last name) + a variation selector.

            Huh? You forgot the "the font designer psychically already knew
    how Mr. Turtle draws his name and encoded the glyph for it, even though
    he had no reason to know that it would ever be used" part of the
    sequence, if this is to work. I don't mean to be harsh (and I know that I
    probably am anyway), but this sounds more than a bit like a magic wand to
    wipe away all the free variants that occur in Chinese usage. Are you saying
    that there is a known limit to the number of character variants, and that
    there is an establishable correspondence between these variants such that a
    logical connection between a variant and one of a set of FVS is possible?
    Call me skeptical...

    > 3) You're editing a critical edition of an ancient MS, and you find
    > that your author, who talks a lot about handkerchiefs, uses U+5E28
    > quite a bit, but varies between the "ears-in" form and the "ears-out"
    > form almost at random. Rather than lose the distinction which *may* be
    > meaningful, you (with the UTC's blessing) use U+5E28 for the ears-in
    > form (as Unicode uses) and U+5E28 + a variation selector for the
    > ears-out form.

            Whoa, Nellie!

            Did "represent newly discovered characters" creep into the mission
    statement of plain text when I wasn't looking?

            Should I be hitting the archives? I've been gone a while, and I
    don't want to retread this if it's already been treaded.

