Re: Cuneiform Free Variation Selectors

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jan 20 2004 - 15:35:56 EST

Next message: jcowan@reutershealth.com: "Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)"

Previous message: Peter Constable: "recent meeting of ISO 639/RA JAC"
Maybe in reply to: Michael Everson: "Cuneiform Free Variation Selectors"
Next in thread: Kenneth Whistler: "RE: Cuneiform Free Variation Selectors"

Dean Snyder continued:

> >> But NO ONE mentioned free variation selectors in the discussion until
> >> yesterday.
> >
> >This is not the case. *I* mentioned free variation selectors
> >during both of the ICE meetings. They weren't discussed at any
> >great length, precisely because I and the other encoding experts
> >did not feel that they were applicable to the basic encoding issues
> >of Cuneiform.
>
> Sorry I missed your mention of them at the ICE conferences.
>
> But I was referring to their not being mentioned in these fairly
> extensive email discussions on dynamic cuneiform over the last month.

Actually, those discussions were primarily on the cuneiform list,
where they belong, since the people with sufficient knowledge of
the cuneiform sign issues relevant to the discussion are participating
there.

But you are correct that variation selectors have not been brought
up recently in the context of "dynamic cuneiform", for a very good
reason -- they are basically irrelevant to the discussion.

> >They may have a place in some future refinement of Cuneiform, but
> >only for representation of notable variants of the *statically*
> >encoded list of base signs, *not* for the kind of dynamic sign
> >building that you have been advocating.
>
> I don't want to burden your time, but I do not understand the technical
> resistance to this.

The technical resistance comes from the fact that they are irrelevant
to what Dean is attempting to do in cuneiform.

A variation selector is appropriate for a certain limited set of
contexts where there is a plain text requirement to choose a
particular variant glyph to represent a character, but where the
semantic intent is the same. (Look at StandardizedVariants.html
on the Unicode website for specific instances of standardized
variants for a few mathematical symbols.)

> I know there are implementation complexities, time to
> market issues, costs, etc. And these are indeed real considerations. But
> I do not see the TECHNICAL reasons against it,

The TECHNICAL reasons have been pointed out.

Perhaps an analogy would be appropriate. Use of Unicode variation
selector characters (FE00..FE0F, E0100..E01EF) to construct
Cuneiform signs dynamically from base sign parts would be a little
bit like using paperclips instead of screws to assemble furniture.
It is the wrong "fastener" type, applied to the wrong materials.

> especially when it is
> already being used for somewhat similar purposes.
                         ^^^^^^^^^^^^^^^^
                         completely dissimilar

There are many, many different types of juxtaposition and
composition occurring in Unicode. It is a mistake to equate them
all and claim them to be "somewhat similar" because they may
involve variations in form and compositions of more than one
character.

Type I: Combining Character Sequences

These consist of a base character followed by one or more combining
marks. The marks are *inherently* combining, as defined by the
standard. They apply graphically to the base, and the end result
can either be dynamically generated by an appropriate font, or
can be mapped (in a font) to a fully-formed glyph.

Note: no "operators" are involved.
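
The character-level side of Type I can be inspected with a short
Python sketch. It assumes only the standard unicodedata module, and
the choice of "e" + U+0301 is purely illustrative. Normalization is
shown as the encoding-level analogue of a font mapping the sequence
to a fully-formed glyph; it is not the rendering step itself.

```python
import unicodedata

# Base character "e" followed by the inherently combining mark U+0301.
seq = "e\u0301"

# combining() returns the canonical combining class: 0 for the base,
# nonzero for a combining mark.
classes = [unicodedata.combining(ch) for ch in seq]

# NFC maps the sequence to a precomposed character where one exists.
composed = unicodedata.normalize("NFC", seq)

# NFD recovers the base + mark sequence.
decomposed = unicodedata.normalize("NFD", composed)

print(classes, [hex(ord(c)) for c in composed], decomposed == seq)
```

Nothing in the sequence acts as an operator: the mark combines
because of its own character properties.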

Type II: Compatibility Equivalences

There are numerous instances where the standard indicates that
some character is approximately equivalent to another
sequence of characters. These are largely the result of grandfathered
decisions from other character encodings. An example can be seen
in parenthesized numbers and letters (e.g. U+2474) used in East
Asian typography, where U+2474 "(1)" is approximately the same
as the sequence of "(" + "1" + ")".

Note: no "operators" are involved. This is just a claim of
approximate equivalence for the purpose of interpretation and/or
comparisons.
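
A minimal Python sketch of the U+2474 case, assuming only the
standard unicodedata module. It shows that the equivalence is tagged
as a compatibility mapping and only takes effect under compatibility
normalization.

```python
import unicodedata

paren_one = "\u2474"  # PARENTHESIZED DIGIT ONE

# The decomposition field carries a <compat> tag: an approximate
# equivalence, not a canonical one.
decomp = unicodedata.decomposition(paren_one)

# Canonical normalization (NFC) leaves the character alone ...
nfc = unicodedata.normalize("NFC", paren_one)

# ... while compatibility normalization (NFKC) folds it to "(" "1" ")".
nfkc = unicodedata.normalize("NFKC", paren_one)

print(decomp, nfc == paren_one, nfkc)
```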

Type IIIa: Ligation

These consist of two (or more) characters in juxtaposition, which
may take special ligated forms in rendering. Ordinarily control
of ligation is a matter of fonts and higher-level protocols, but
controls also exist in Unicode (ZWJ/ZWNJ) which can sit in the
plain text and serve as a hint to the rendering system regarding
ligature formation.

Type IIIb: Cursive connection

This is a kind of ligation formalized for control of cursive
connection in scripts with standardized typography in cursive form --
most notably Arabic and Syriac. The ZWJ/ZWNJ format controls are
used to enable the exhibition of particular cursive forms outside
their normal rendering context.
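
As a rough illustration of Types IIIa and IIIb in Python (standard
unicodedata module only): the join controls are ordinary format
characters sitting in the plain text. The helper name
strip_join_controls is hypothetical, and whether a hint is honored is
up to the rendering system.

```python
import unicodedata

ZWNJ, ZWJ = "\u200C", "\u200D"

# Both controls have General_Category Cf (format).
categories = [unicodedata.category(c) for c in (ZWNJ, ZWJ)]

heh = "\u0647"  # ARABIC LETTER HEH

# HEH followed by ZWJ requests the initial (joining) form of the
# letter even though no letter follows it.
initial_form_request = heh + ZWJ

def strip_join_controls(text):
    """Hypothetical helper: drop the rendering hints, keep the letters."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")

print(categories, strip_join_controls(initial_form_request) == heh)
```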

Type IV: Variation selection

This is a mechanism for picking out a particular glyph (from among
a predefined set of "standardized variants") in a plain text context
where a particular distinction is required. This mechanism is
used only in limited contexts, and is primarily to avoid having to
encode large numbers of glyph variants as characters per se.
(Which is why Michael Everson calls this mechanism "pseudo-encoding".)

The variation selectors U+FE00..U+FE0F and U+E0100..U+E01EF have been
encoded in order to have a large enough number of potential
variation selector distinctions for any single character to deal
with the worst-case scenarios cited for Han. More typical is the
use of just one to three variation selectors to differentiate among
the most common glyphs. (And note that such usage is *only* conformant
when a selection is made from StandardizedVariants.txt. They cannot
be generatively applied to deal with any old glyphic variation.)
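
A small Python sketch of the conformance point, under stated
assumptions: the two ranges are the ones quoted above, while the
STANDARDIZED set is a hand-written stand-in with a single
illustrative entry. A real check would load the full list from
StandardizedVariants.txt. The function names are hypothetical.

```python
# Ranges quoted above: U+FE00..U+FE0F and U+E0100..U+E01EF.
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))

def is_variation_selector(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)

def split_variation_sequences(text):
    """Pair each character with the selector that follows it, if any."""
    out = []
    for ch in text:
        if (is_variation_selector(ch) and out and out[-1][1] is None
                and not is_variation_selector(out[-1][0])):
            out[-1] = (out[-1][0], ch)
        else:
            out.append((ch, None))
    return out

# Stand-in for data loaded from StandardizedVariants.txt (assumption:
# one illustrative entry, a math symbol followed by U+FE00).
STANDARDIZED = {("\u2268", "\uFE00")}

def is_listed(base, selector):
    """Only listed pairs count; selectors are not generative."""
    return (base, selector) in STANDARDIZED

total = sum(hi - lo + 1 for lo, hi in VS_RANGES)
print(total, split_variation_sequences("a\u2268\uFE00b"))
```

The lookup is a closed table, not a rule: a pair missing from the
list is simply not a variation sequence, however plausible the glyph
difference might be.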

Type V: Ideographic Character Description

A set of ideographic character description characters is defined to
enable *approximate* descriptions of unencoded Han characters.
See U+2FF0..U+2FFB. These are just symbols, although their usage
is pseudo-operator-like, and there is a defined syntax for their
juxtaposition. They can be used *only* with unified Han ideographic
characters and with radical symbols. They do not actually *construct*
a character, but rather describe it approximately. There is no
requirement for a conformant Unicode renderer to actually attempt
a rendering of the Han character so described.
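
The defined syntax can be sketched as a small prefix-notation parser
in Python. This is a simplification under stated assumptions: it
covers only U+2FF0..U+2FFB, treats U+2FF2 and U+2FF3 as taking three
operands and the rest as taking two, and does not check that the
operands are Han ideographs or radicals. The names parse_ids and
is_well_formed are hypothetical.

```python
TERNARY = {"\u2FF2", "\u2FF3"}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY

def parse_ids(s, i=0):
    """Parse one description starting at s[i]; return (tree, next_index)."""
    if i >= len(s):
        raise ValueError("unexpected end of sequence")
    ch = s[i]
    if ch in BINARY or ch in TERNARY:
        arity = 3 if ch in TERNARY else 2
        args, j = [], i + 1
        for _ in range(arity):
            node, j = parse_ids(s, j)
            args.append(node)
        return (ch, *args), j
    # Anything else is taken as an operand (no Han check in this sketch).
    return ch, i + 1

def is_well_formed(s):
    try:
        _, end = parse_ids(s)
    except ValueError:
        return False
    return end == len(s)

# U+2FF0 (left to right) applied to two copies of U+6728.
print(is_well_formed("\u2FF0\u6728\u6728"))
```

The parser yields only a description tree; nothing here constructs or
renders a character.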

Type VI: Glyph Description Language

There are many schemes, particularly for Han characters, to define
dynamic glyph description languages. These are circumscribed syntaxes,
involving basic stroke types, components, and juxtaposition
operators of various types, along with some kind of coordinate
system. One of the most successful current instances of such is
the Wenlin Character Description Language (CDL), which has been
successfully used to create a database of glyph descriptions for
the vast majority of all of the Han characters in Unicode.

A GDL always contains some collection of *operators*, which are
used to describe the *graphic* relation between the operand
parts.

A GDL may be a useful adjunct to a character encoding standard, as it
helps in establishing glyph identity and may be relevant to font
design. It is, however, out of scope for the character encoding
itself, which is encoding characters, rather than prescribing
glyph shapes for those characters.

Note that what Dean Snyder has proposed for "dynamic cuneiform" falls
into Type VI. It is a rough framework for a glyph description
language for cuneiform glyphs. The 14 "ligators" in it are
actually almost all to be conceived of as operators of a
glyph description language (e.g. "invert glyph", "rotate glyph
90 degrees", and so on). As such, it is really, really out of
scope for the *character* encoding of cuneiform.

Alternative approaches to cuneiform that treat some of the
composition of cuneiform signs as the result of application
of combining marks to base characters *have* been discussed,
but those would fall under Type I above. They are essentially
alternatives to simply encoding the full list of cuneiform
signs as "precomposed" signs, even when some generativity in
combinations is manifest for them. But this kind of approach,
which involves no *operators* of any sort, is completely
at odds with the "dynamic cuneiform" that Dean has been
advocating, with its list of 14 operators to construct glyphs.

Neither of those approaches has anything to do with variation
selectors -- which is why the reaction of this list to this
latest suggestion has been a collective head-scratching.

>
> An aside:
>
> How does Hangul jamo relate to all of this? From a quick reading of
> chapter 11.4 of The Unicode Standard it sounds similar to what I am
> thinking about dynamic cuneiform.

It is a variation of Type I above. There are no *operators* and
no glyph description language involved.
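
A worked sketch of the jamo case in Python, assuming the standard
unicodedata module and the arithmetic constants of the standard's
Hangul syllable composition. The function name compose_syllable is
hypothetical. A conjoining jamo sequence composes to a precomposed
syllable by index arithmetic alone, with no operator character in the
text.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_syllable(lead, vowel, trail=None):
    """Compose leading consonant + vowel (+ trailing consonant)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1112\u1161\u11AB"  # HIEUH + A + NIEUN

by_arithmetic = compose_syllable(*jamo)
by_normalization = unicodedata.normalize("NFC", jamo)

print(hex(ord(by_arithmetic)), by_arithmetic == by_normalization)
```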

--Ken

This archive was generated by hypermail 2.1.5 : Tue Jan 20 2004 - 17:21:17 EST