Variation Sequences as Substitute for Fonts or for Encoding a Script (Was... Phoenician ...)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 20 2004 - 19:05:09 CDT


    Ernest indicated:

    > Whether using variation sequences to separate
    > Phoenician from Square Hebrew would be daft
    > would depend upon a number of factors.
    >
    > How often would both glyph repertoires appear in
    > the same document?
    >
    > How frequently would non-Square Hebrew glyphs
    > be used?
    >
    > How important is it to any particular body of users
    > to emphasize the relationship of the different
    > repertoires by using the same base characters?
    >
    > How large would that body of users be compared
    > to other users who do not need such an emphasis?
    >
    > I don't know the answers to the above questions.

    Actually, I think the answers to those questions are
    irrelevant.

    > I see those answers as determining whether
    > non-unification or unification supplemented with
    > variation sequences would be the better choice.

    The main reason such a proposal is daft is that the UTC
    has never intended variation sequences to be used this
    way, and as a result it would never acquiesce in encoding
    an entire *script* as a set of variation sequences off
    another script.

    The options are, as John indicated:

    a. Assume one script, and render differences via fonts
       (mapped to the same code points).

    b. Assume two scripts, encode distinctly, and render
       differences via fonts (mapped to different code points).
       
    Variation sequences are used to indicate variant glyphs for
    particular characters within a script (or set of symbols) --
    not as a hack for avoiding the encoding of a script entire
    or for avoiding the need for font tagging to make visual
    distinctions in a writing system.
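    To make the mechanism concrete, here is a minimal Python
    sketch, offered as an illustration under stated assumptions.
    It uses the sequence <U+2229 INTERSECTION, U+FE00 VARIATION
    SELECTOR-1>, which (to my knowledge) is a standardized
    variation sequence for an intersection glyph drawn with
    serifs. The helper `strip_variation_selectors` is
    hypothetical. The sequence is still the same base character
    plus a selector, so any process that ignores or drops the
    selector recovers the plain base character. A "script"
    encoded this way would silently collapse into the other
    script.

```python
import unicodedata

# U+2229 INTERSECTION followed by U+FE00 VARIATION SELECTOR-1.
# Assumed example of a standardized variation sequence (serifed intersection).
VS1 = "\uFE00"
plain = "\u2229"
variant = plain + VS1


def strip_variation_selectors(text: str) -> str:
    """Hypothetical helper: drop VS1..VS16 (U+FE00..U+FE0F)
    and VS17..VS256 (U+E0100..U+E01EF), keeping everything else."""
    return "".join(
        ch for ch in text
        if not ("\uFE00" <= ch <= "\uFE0F" or "\U000E0100" <= ch <= "\U000E01EF")
    )


# The variant is two code points: the ordinary base character plus a selector.
for ch in variant:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Dropping the selector leaves exactly the plain base character.
print(strip_variation_selectors(variant) == plain)
```

    The selector only requests a particular glyph for a
    character that is already encoded. It adds no new character
    identity, which is why it suits an occasional formal variant
    but not a whole writing system.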

    Variation sequences are also a last resort, used only in
    instances where a distinct character encoding approach
    smells too much of duplication of otherwise identical
    "characters" which just happen to have some particular
    formal distinction that is needed for rendering, roundtrip
    mapping, etc.

    Of course, you can argue (and have argued) that the same
    logic could simply be applied to *every* character of a
    script. But I can assure you that there is *no*
    constituency in either the UTC or WG2 for approaching
    decisions about encoding an entire script that way.

    --Ken


