L2/02-270


Date: 2002-07-26 17:33:10 -0700
From: Kenneth Whistler <kenw@sybase.com>

Title: Character Properties (Re: L2/02-267R)

Mark, et al.

First some nits: 

L2/02-267R is identified on the document itself as
L2/01-267R.
   ^^
That should be fixed to avoid confusion in future references.

Also, the numbering of sections within this document is very confusing.
The table of contents area has sections 1-7, but those numbers aren't
reflected in the sections themselves, and the sections then have
either ABC lettering, with subnumbering 123 (for section 1) or 123 numbering
(e.g. for section 6). For a document like this, for which the UTC is
going to have to make a bunch of separate determinations for each
section (and record them in minutes), you should go to the effort to
have each subpart given a clear identification *in the text* and not
just depend on inconsistent autonumberings.

Now a substantive comment:

The ongoing discussion about "Linear Tamil" just made something
clear to me about Section 6.1 of your proposals (New Properties in
PropList.txt for split, reordrant, and subjoined combining marks).

One of the reasons why we have not made these *character* properties
before is that in sooth they are all *glyph* properties, rather
than character properties. The glyph properties are certainly relevant
to rendering and to font design, but these particular ones are only
a small part of the kinds of glyph properties we could in principle
start defining. The significant point here is that all of these
characters are combining marks of combining class 0 -- that is what
impacts the normalization algorithm and anything else involving
decomposition.
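
As a rough illustration of that last point, here is a minimal sketch
that looks up what the character database records for U+0BCA. It
assumes Python's standard unicodedata module, whose data reflects
whichever Unicode version ships with the interpreter rather than the
version under discussion here; the helper name describe() is
hypothetical. The values shown are the ones normalization consumes
(combining class and canonical decomposition); nothing in them says
how the glyph is drawn.

```python
import unicodedata

# U+0BCA TAMIL VOWEL SIGN O. Its chart glyph is "split" around the
# consonant, but that fact is not part of the properties queried below.
VOWEL_SIGN_O = "\u0bca"


def describe(ch):
    """Collect the UCD values relevant to normalization for one character.

    Hypothetical helper for illustration only.
    """
    return {
        "name": unicodedata.name(ch),
        "general_category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        # Code points of the canonically decomposed (NFD) form.
        "nfd": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
    }


info = describe(VOWEL_SIGN_O)
for key, value in info.items():
    print(f"{key}: {value}")
```

The combining class comes back as 0 and the canonical decomposition is
a pair of code points; a rendering system is free to realize that
sequence with a split glyph or otherwise without affecting either value.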

Why the Linear Tamil discussion brought this home to me is because
Sinnathurai Srivas' suggestion is essentially to introduce a new
Tamil rendering system by substituting out a number of vowel and
ligature glyphs with new ones which have distinct *glyph* properties --
and some of them are precisely these split and reordrant glyphs
that L2/02-267R is proposing be added as *character* properties.

Now whatever the merits of Linear Tamil otherwise, the fact remains
that it is an innovative suggestion which takes advantage of the
fact that the Unicode Standard does not normatively define glyph
properties -- only character properties. We would be venturing into
new territory here if we started claiming that U+0BCA *must* have
a split glyph for display; it would put us in an encoding pickle
if a script reform were introduced which would otherwise be compatible
with Unicode text encoding for Tamil but which didn't use a split
glyph. It is just a more dramatic example of why we don't want
our chart glyphs to be taken as prescriptive in nature -- once we
do so, we end up inviting the world to come to our doorsteps asking
for every *other* glyph to be encoded as a distinct character.

Thus I find the housekeeping urge behind the Section 6.1 proposal
to be insufficiently convincing. It would put us in the position of
being able to obsolete the printed table in Section 4.2 of TUS, but
actually at the cost of reifying some glyphic properties as
character properties in ways that could establish dangerous precedents
that could come back to bite us.

In other words, I now find myself disagreeing strongly with the
claim: "If those properties are indeed important, they should be
reflected in UCD properties." I think this begs the question of
what *kind* of properties they are and whether, if they turn out
to be glyph properties, as I surmise, they should be reflected in
the UCD or in something else.

On the contrary, I now think that we instead need to beef up the
explanation related to Section 4.2 to point out the difference
between glyph properties and the combining class assignments.

--Ken