Re: Cuneiform Free Variation Selectors

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jan 20 2004 - 15:35:56 EST


    Dean Snyder continued:

    > >> But NO ONE mentioned free variation selectors in the discussion until
    > >> yesterday.
    > >
    > >This is not the case. *I* mentioned free variation selectors
    > >during both of the ICE meetings. They weren't discussed at any
    > >great length, precisely because I and the other encoding experts
    > >did not feel that they were applicable to the basic encoding issues
    > >of Cuneiform.
    >
    > Sorry I missed your mention of them at the ICE conferences.
    >
    > But I was referring to their not being mentioned in these fairly
    > extensive email discussions on dynamic cuneiform over the last month.

    Actually, those discussions were primarily on the cuneiform list,
    where they belong, since the people with sufficient knowledge of
    the cuneiform sign issues relevant to the discussion are participating
    there.

    But you are correct that variation selectors have not been brought
    up recently in the context of "dynamic cuneiform", for a very good
    reason -- they are basically irrelevant to the discussion.

    > >They may have a place in some future refinement of Cuneiform, but
    > >only for representation of notable variants of the *statically*
    > >encoded list of base signs, *not* for the kind of dynamic sign
    > >building that you have been advocating.
    >
    > I don't want to burden your time, but I do not understand the technical
    > resistance to this.

    The technical resistance comes from the fact that they are irrelevant
    to what Dean is attempting to do in cuneiform.

    A variation selector is appropriate for a certain limited set of
    contexts where there is a plain text requirement to choose a
    particular variant glyph to represent a character, but where the
    semantic intent is the same. (Look at StandardizedVariants.html
    on the Unicode website for specific instances of standardized
    variants for a few mathematical symbols.)

    > I know there are implementation complexities, time to
    > market issues, costs, etc. And these are indeed real considerations. But
    > I do not see the TECHNICAL reasons against it,

    The TECHNICAL reasons have been pointed out.

    Perhaps an analogy would be appropriate. Use of Unicode variation
    selector characters (U+FE00..U+FE0F, U+E0100..U+E01EF) to construct
    Cuneiform signs dynamically from base sign parts would be a little
    bit like using paperclips instead of screws to assemble furniture.
    It is the wrong "fastener" type, applied to the wrong materials.

    > especially when it is
    > already being used for somewhat similar purposes.
                             ^^^^^^^^^^^^^^^^
                             completely dissimilar
                             
    There are many, many, different types of juxtaposition and
    composition occurring in Unicode. It is a mistake to equate them
    all and claim them to be "somewhat similar" because they may
    involve variations in form and compositions of more than one
    character.

    Type I: Combining Character Sequences

    These consist of a base character followed by one or more combining
    marks. The marks are *inherently* combining, as defined by the
    standard. They apply graphically to the base, and the end result
    can either be dynamically generated by an appropriate font, or
    can be mapped (in a font) to a fully-formed glyph.

    Note: no "operators" are involved.
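    As a minimal illustration (assuming Python 3 and its standard
    unicodedata module; the choice of "e" + U+0301 is arbitrary), the
    combining behavior is a property the standard assigns to the mark
    itself, and normalization relates the sequence to a precomposed
    character where one happens to be encoded:

```python
import unicodedata

# A base letter followed by an inherently combining mark:
# U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT.
seq = "e\u0301"

# The mark's canonical combining class is nonzero. That is a property
# defined by the standard for the mark, not an "operator" in the text.
ccc = unicodedata.combining("\u0301")

# NFC maps the sequence to precomposed U+00E9; NFD maps it back.
# Both are valid, canonically equivalent representations.
composed = unicodedata.normalize("NFC", seq)
decomposed = unicodedata.normalize("NFD", composed)

print(ccc)                  # 230
print(hex(ord(composed)))   # 0xe9
print(decomposed == seq)    # True
```

    Nothing in the string tells a renderer *how* to attach the accent;
    that is left to the font.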

    Type II: Compatibility Equivalences

    There are numerous instances where the standard indicates that
    some character is approximately equivalent to another
    sequence of characters. These are largely the result of grandfathered
    decisions from other character encodings. An example can be seen
    in parenthesized numbers and letters (e.g. U+2474) used in East
    Asian typography, where U+2474 "(1)" is approximately the same
    as the sequence of "(" + "1" + ")".

    Note: no "operators" are involved. This is just a claim of
    approximate equivalence for the purpose of interpretation and/or
    comparisons.
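    The same module exposes the U+2474 equivalence as data. A short
    sketch, again assuming Python 3:

```python
import unicodedata

paren_one = "\u2474"  # U+2474 PARENTHESIZED DIGIT ONE

# The compatibility decomposition is recorded in the character data,
# tagged <compat>; it states the approximate equivalence.
decomp = unicodedata.decomposition(paren_one)

# NFKC applies that equivalence for comparison purposes; NFC leaves
# the character alone, since it is not canonically equivalent.
nfkc = unicodedata.normalize("NFKC", paren_one)
nfc = unicodedata.normalize("NFC", paren_one)

print(decomp)            # <compat> 0028 0031 0029
print(nfkc == "(1)")     # True
print(nfc == paren_one)  # True
```

    Only the compatibility (K) normalization forms apply the
    equivalence, which is why it serves interpretation and comparison
    rather than construction.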

    Type IIIa: Ligation

    These consist of two (or more) characters in juxtaposition, which
    may take special ligated forms in rendering. Ordinarily, control
    of ligation is a matter of fonts and higher-level protocols, but
    controls also exist in Unicode (ZWJ/ZWNJ) which can sit in the
    plain text and serve as a hint to the rendering system regarding
    ligature formation.

    Type IIIb: Cursive connection

    This is a kind of ligation formalized for control of cursive
    connection in scripts with standardized typography in cursive form --
    most notably Arabic and Syriac. The ZWJ/ZWNJ format controls are
    used to enable the exhibition of particular cursive forms outside
    their normal rendering context.
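    To illustrate that ZWJ and ZWNJ are hints rather than operators,
    here is a small Python sketch. The helper joiner_controls is
    hypothetical, written for this example; the "f"/"i" and Arabic BEH
    strings are just sample uses:

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Type IIIa: hint that "f" and "i" should not form a ligature.
no_ligature = "f" + ZWNJ + "i"

# Type IIIb: request the medial form of ARABIC LETTER BEH (U+0628)
# displayed in isolation, by placing joiners on both sides.
medial_beh = ZWJ + "\u0628" + ZWJ

def joiner_controls(text):
    """Hypothetical helper: list (index, name) for each ZWJ/ZWNJ."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text)
            if ch in (ZWJ, ZWNJ)]

# The controls are ordinary format characters (category Cf). They pass
# through normalization unchanged and simply sit in the text as hints.
print(unicodedata.category(ZWJ))                               # Cf
print(unicodedata.normalize("NFC", medial_beh) == medial_beh)  # True
print(joiner_controls(no_ligature))  # [(1, 'ZERO WIDTH NON-JOINER')]
```

    Whether a given font honors the hint is up to the rendering
    system; the text itself is unchanged.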

    Type IV: Variation selection

    This is a mechanism for picking out a particular glyph (from among
    a predefined set of "standardized variants") in a plain text context
    where a particular distinction is required. This mechanism is
    used only in limited contexts, and is primarily to avoid having to
    encode large numbers of glyph variants as characters per se.
    (Which is why Michael Everson calls this mechanism "pseudo-encoding".)

    The variation selectors U+FE00..U+FE0F and U+E0100..U+E01EF (256
    in all) have been encoded to provide enough potential variation
    selector distinctions for any single character to deal with the
    worst-case scenarios cited for Han. More typical is the use of
    just one to three variation selectors to differentiate among the
    most common glyphs. (And note that such usage is *only* conformant
    when a selection is made from StandardizedVariants.txt. They cannot
    be generatively applied to deal with any old glyphic variation.)
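    A sketch of what conformance checking could look like, assuming
    Python 3. The STANDARDIZED set and the check_sequence helper are
    hypothetical; a real implementation would load the approved pairs
    from StandardizedVariants.txt rather than hard-code one entry:

```python
import unicodedata

def is_variation_selector(cp: int) -> bool:
    """True for U+FE00..U+FE0F (VS1-VS16) and U+E0100..U+E01EF (VS17-VS256)."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

# HYPOTHETICAL excerpt of approved (base, selector) pairs. A real checker
# would load these from StandardizedVariants.txt; the single entry here
# (U+2229 INTERSECTION + VS1) is only for illustration.
STANDARDIZED = {(0x2229, 0xFE00)}

def check_sequence(text: str) -> list:
    """Return the index of every variation selector that does not
    immediately follow a base character it is approved for."""
    problems = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if not is_variation_selector(cp):
            continue
        base = ord(text[i - 1]) if i > 0 else None
        if (base, cp) not in STANDARDIZED:
            problems.append(i)
    return problems

total = len(range(0xFE00, 0xFE10)) + len(range(0xE0100, 0xE01F0))
print(total)                           # 256
print(unicodedata.name("\U000E0100"))  # VARIATION SELECTOR-17
print(check_sequence("\u2229\ufe00"))  # []  (listed pair)
print(check_sequence("\u2229\ufe01"))  # [1] (selector not listed for this base)
print(check_sequence("a\ufe00"))       # [1] (base not listed)
```

    The check encodes the restriction stated above: a selector is only
    meaningful paired with a listed base, never as a general-purpose
    modifier.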

    Type V: Ideographic Character Description

    A set of ideographic character description characters are defined to
    enable *approximate* descriptions of unencoded Han characters.
    See U+2FF0..U+2FFB. These are just symbols, although their usage
    is pseudo-operator-like, and there is a defined syntax for their
    juxtaposition. They can be used *only* with unified Han ideographic
    characters and with radical symbols. They do not actually *construct*
    a character, but rather describe it approximately. There is no
    requirement for a conformant Unicode renderer to actually attempt
    a rendering of the Han character so described.
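    The defined syntax is prefix notation: each description character
    is followed by its operands, two for most of them and three for
    U+2FF2 and U+2FF3. A minimal structural check in Python, assuming
    only the U+2FF0..U+2FFB set (later versions of the standard have
    added more description characters); is_well_formed_ids is a
    hypothetical helper:

```python
# Of U+2FF0..U+2FFB, only these two take three operands; the rest take two.
TERNARY = {"\u2ff2", "\u2ff3"}

def is_idc(ch: str) -> bool:
    return "\u2ff0" <= ch <= "\u2ffb"

def is_well_formed_ids(ids: str) -> bool:
    """Check prefix-notation structure only: every description character
    gets the right number of operands and nothing is left over. The
    operand repertoire (unified Han ideographs, radicals) and other
    constraints from the standard are NOT checked here."""
    def parse(pos):
        # Return the position just past one complete description, or None.
        if pos >= len(ids):
            return None
        ch = ids[pos]
        if not is_idc(ch):
            return pos + 1              # a single operand
        arity = 3 if ch in TERNARY else 2
        pos += 1
        for _ in range(arity):
            pos = parse(pos)
            if pos is None:
                return None
        return pos
    return parse(0) == len(ids)

print(is_well_formed_ids("\u2ff0\u6728\u6728"))              # True:  ⿰木木
print(is_well_formed_ids("\u2ff1\u6728\u2ff0\u6728\u6728"))  # True:  ⿱木⿰木木
print(is_well_formed_ids("\u2ff0\u6728"))                    # False: missing operand
```

    Note that a well-formed sequence such as ⿱木⿰木木 remains a
    string of five characters; nothing here produces or requires the
    encoded character 森.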

    Type VI: Glyph Description Language

    There are many schemes, particularly for Han characters, to define
    dynamic glyph description languages. These are circumscribed syntaxes,
    involving basic stroke types, components, and juxtaposition
    operators of various types, along with some kind of coordinate
    system. One of the most successful current instances of such a
    language is the Wenlin Character Description Language (CDL), which
    has been used to create a database of glyph descriptions for the
    vast majority of all of the Han characters in Unicode.

    A GDL always contains some collection of *operators*, which are
    used to describe the *graphic* relation between the operand
    parts.
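    To make the distinction concrete, here is a toy GDL in Python. It
    is entirely hypothetical: the operator names (rotate90, invert,
    beside, above) are invented for illustration and are not Wenlin
    CDL syntax or anyone's proposed ligator set. Operators take parts
    as operands and describe how they relate graphically:

```python
from dataclasses import dataclass
from typing import Tuple, Union

# HYPOTHETICAL toy glyph description language, for illustration only.

@dataclass(frozen=True)
class Part:
    name: str                        # an operand: a stroke or component

@dataclass(frozen=True)
class Op:
    kind: str                        # an operator, e.g. "rotate90"
    args: Tuple["Node", ...]         # its operands

Node = Union[Part, Op]

# Number of operands each invented operator expects.
ARITY = {"rotate90": 1, "invert": 1, "beside": 2, "above": 2}

def describe(node: Node) -> str:
    """Render a description as nested text, checking operator arity."""
    if isinstance(node, Part):
        return node.name
    if ARITY.get(node.kind) != len(node.args):
        raise ValueError(f"bad arity for {node.kind}")
    inner = ", ".join(describe(a) for a in node.args)
    return f"{node.kind}({inner})"

glyph = Op("beside", (Part("A"), Op("rotate90", (Part("B"),))))
print(describe(glyph))   # beside(A, rotate90(B))
```

    The output is a description of a glyph, not a sequence of encoded
    characters, which is the sense in which such a language sits
    outside the character encoding.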

    A GDL may be a useful adjunct to a character encoding standard, as it
    helps in establishing glyph identity and may be relevant to font
    design. It is, however, out of scope for the character encoding
    itself, which is encoding characters, rather than prescribing
    glyph shapes for those characters.

    Note that what Dean Snyder has proposed for "dynamic cuneiform" falls
    into Type VI. It is a rough framework for a glyph description
    language for cuneiform glyphs. The set of 14 "ligators" in it is
    actually almost entirely to be conceived of as operators of a
    glyph description language (e.g. "invert glyph", "rotate glyph
    90 degrees", and so on). As such, it is really, really out of
    scope for the *character* encoding of cuneiform.

    Alternative approaches to cuneiform that treat some of the
    composition of cuneiform signs as the result of application
    of combining marks to base characters *have* been discussed,
    but those would fall under Type I above. They are essentially
    alternatives to simply encoding the full list of cuneiform
    signs as "precomposed" signs, even when some generativity in
    combinations is manifest for them. But this kind of approach,
    which involves no *operators* of any sort, is completely
    at odds with the "dynamic cuneiform" that Dean has been
    advocating, with its list of 14 operators to construct glyphs.

    Neither of those approaches has anything to do with variation
    selectors -- which is why the reaction of this list to this
    latest suggestion has been a collective head-scratching.
                   
    >
    > An aside:
    >
    > How does Hangul jamo relate to all of this? From a quick reading of
    > chapter 11.4 of The Unicode Standard it sounds similar to what I am
    > thinking about dynamic cuneiform.

    It is a variation of Type I above. There are no *operators* and
    no glyph description language involved.
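    For comparison, the jamo case is purely arithmetic. A sketch in
    Python using the Hangul syllable composition constants from the
    Unicode Standard (compose_lvt is a hypothetical helper, with no
    range checking on its inputs):

```python
import unicodedata

# Constants from the Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_lvt(l, v, t=None):
    """Map leading consonant, vowel, and optional trailing consonant jamo
    to the precomposed syllable. Inputs are assumed to be valid jamo."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE if t is not None else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1100\u1161\u11a8"   # KIYEOK + A + final KIYEOK
syllable = compose_lvt(*jamo)
print(hex(ord(syllable)))                              # 0xac01
print(syllable == unicodedata.normalize("NFC", jamo))  # True
```

    The jamo are ordinary characters whose sequence is canonically
    equivalent to a precomposed syllable; no operator appears in the
    text.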

    --Ken


