Re: Level of Unicode support required for various languages

From: John H. Jenkins (jenkins@apple.com)
Date: Fri Oct 26 2007 - 10:54:55 CDT

    On Oct 25, 2007, at 11:20 PM, James Kass wrote:

    > So, if a rare character has uncertain provenance and meaning, but
    > it is unifiable, shouldn't it just be unified?

    Ideally, yes. The problem is a reluctance to unify when you don't
    *know* that the unification would be acceptable to the author of the
    original text.

    A good case in point came up when South Korea proposed a large set of
    characters so that the Korean Tripitaka could be encoded. The set
    included a large number of characters which were *probably* just
    variants of other characters but which *may* have been intended to be
    distinct characters. In the end, South Korea was convinced that the
    case for encoding them as separate characters was weak and withdrew
    them from consideration.

    > And, if that character
    > is not unifiable, but it exists in texts (however obscure) that
    > someone may wish to reproduce electronically (for posterity,
    > perhaps), shouldn't it be encoded?
    >

    It should be representable, yes. But that representation need not
    take the form of a distinct encoding.
    >
    > Is it really possible to speed up the process of encoding an
    > open-ended set?
    >

    Yes, the IRG has taken some steps to do so. Extension D submissions
    required IDSs (Ideographic Description Sequences) so that a
    preliminary unification pass can be done by computer, TrueType fonts
    so that reviewers can see more clearly what the characters look like,
    and better provenance information so that we have a better sense of
    whether or not the characters *ought* to be unified.
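
    As a rough illustration of what a preliminary unification pass by
    computer might look like, here is a minimal Python sketch that groups
    submitted characters whose IDSs are identical and flags them for
    human review. The submission labels, the sample decompositions, and
    the name group_by_ids are all invented for this example; the IRG's
    actual tooling and data format are not described here.

```python
from collections import defaultdict

# Hypothetical submission data: source label -> IDS string.
# Labels and decompositions are made up for illustration only.
SUBMISSIONS = {
    "K-0001": "⿰木木",
    "J-0417": "⿰木木",
    "V-0032": "⿱木木",
    "K-0002": "⿰氵木",
}


def group_by_ids(submissions):
    """Return {ids: [labels]} for every IDS shared by two or more submissions."""
    groups = defaultdict(list)
    for label, ids in submissions.items():
        # Strip stray whitespace so trivially different entries still match.
        groups[ids.strip()].append(label)
    # Only an IDS that occurs more than once is a unification candidate.
    return {ids: sorted(labels) for ids, labels in groups.items() if len(labels) > 1}


candidates = group_by_ids(SUBMISSIONS)
for ids, labels in candidates.items():
    print(ids, labels)
```

    An exact match like this can only suggest candidates. Two characters
    with the same IDS may still be distinct, and two with different IDSs
    may still be unifiable, so the result is a starting point for review
    rather than a decision.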

    Moreover, I have on my plate an action item to produce a second set of
    variant glyphs. Variant glyphs have a number of advantages for many
    of the beasties proposed for encoding. Registering variant glyphs is
    a faster process, for one thing, than encoding distinct characters.
    It also makes text analysis simpler and -- most importantly -- it
    takes these entities off the IRG's plate so it can focus on what
    actually *does* need to be encoded.
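
    To illustrate why text analysis gets simpler, here is a small Python
    sketch. It assumes, and this is an assumption rather than something
    stated above, that a variant glyph is represented as a base character
    followed by a Unicode variation selector (U+FE00..U+FE0F or
    U+E0100..U+E01EF). The sample sequences are illustrative only and
    have not been checked against any registry.

```python
def is_variation_selector(ch):
    """True for code points in the two main variation selector blocks."""
    cp = ord(ch)
    # U+FE00..U+FE0F (VS1-VS16) and U+E0100..U+E01EF (VS17-VS256).
    # Other selectors (e.g. the Mongolian ones) are not handled here.
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text):
    """Drop variation selectors so variants compare equal to their base."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Illustrative sequences: one base ideograph with two different selectors.
plain = "\u845B"
variant_a = "\u845B\U000E0100"
variant_b = "\u845B\U000E0101"

same_character = (
    strip_variation_selectors(variant_a)
    == strip_variation_selectors(variant_b)
    == plain
)
print(same_character)
```

    Under that assumption, searching and comparison can ignore the
    glyph distinction with one filtering step, whereas separately
    encoded characters would need an explicit equivalence table.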

    =====
    John H. Jenkins
    jenkins@apple.com


