L2/02-270
Date: 2002-07-26 17:33:10 -0700
From: Kenneth Whistler
Title: Character Properties (Re: L2/02-267R)

Mark, et al.

First, some nits:

L2/02-267R is identified on the document itself as L2/01-267R. That should be fixed to avoid confusion in future references.

Also, the numbering of sections within this document is very confusing. The table of contents area has sections 1-7, but those numbers aren't reflected in the sections themselves. The sections then use either ABC lettering with 123 subnumbering (for section 1) or plain 123 numbering (e.g. for section 6). The UTC is going to have to make a bunch of separate determinations on a document like this, section by section, and record them in minutes. So you should go to the effort of giving each subpart a clear identification *in the text* rather than depending on inconsistent autonumbering.

Now a substantive comment:

The ongoing discussion about "Linear Tamil" just made something clear to me about Section 6.1 of your proposals (New Properties in PropList.txt for split, reordrant, and subjoined combining marks). One of the reasons we have not made these *character* properties before is that, in sooth, they are all *glyph* properties rather than character properties. Glyph properties are certainly relevant to rendering and to font design, but these particular ones are only a small part of the kinds of glyph properties we could in principle start defining. The significant point here is that all of these characters are combining marks of combining class 0; that is what impacts the normalization algorithm and anything else involving decomposition.
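To make the distinction concrete, here is a minimal sketch using Python's standard `unicodedata` module. The tool is our choice, and its data reflects whatever Unicode version the Python build ships with. The three example characters are also our own illustrative picks: one split, one reordrant, and one subjoined mark. The sketch shows what the UCD does record for them. All three have combining class 0, and U+0BCA has a canonical decomposition. None of these fields says how the glyph is split, reordered, or stacked.

```python
import unicodedata

# Illustrative picks (not from the memo): one mark for each glyph behavior
# discussed under Section 6.1 of L2/02-267R.
EXAMPLES = {
    "\u0BCA": "split",      # TAMIL VOWEL SIGN O
    "\u0BC6": "reordrant",  # TAMIL VOWEL SIGN E
    "\u0F90": "subjoined",  # TIBETAN SUBJOINED LETTER KA
}


def ucd_fields(ch):
    """Return some character properties the UCD records for ch."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),            # Mc or Mn here
        "ccc": unicodedata.combining(ch),                # canonical combining class
        "decomposition": unicodedata.decomposition(ch),  # "" if none
    }


for ch, glyph_behavior in EXAMPLES.items():
    # glyph_behavior comes from our own table; no UCD field below encodes it.
    print(f"U+{ord(ch):04X} ({glyph_behavior}): {ucd_fields(ch)}")

# What normalization sees: KA + VOWEL SIGN O decomposes into the two
# vowel-sign halves in logical (memory) order, and NFC recomposes them.
ka_o = "\u0B95\u0BCA"
nfd = unicodedata.normalize("NFD", ka_o)
assert nfd == "\u0B95\u0BC6\u0BBE"
assert unicodedata.normalize("NFC", nfd) == ka_o

# Canonical reordering only swaps adjacent marks with nonzero combining
# classes. Combining class 0 marks are never moved, so the reversed halves
# (AA then E) stay as they are. They also do not compose to U+0BCA.
# Drawing the E half to the left of KA is left to the renderer.
swapped = "\u0B95\u0BBE\u0BC6"
assert unicodedata.normalize("NFD", swapped) == swapped
assert unicodedata.normalize("NFC", swapped) == swapped
```

The `glyph_behavior` labels exist only in the sketch's own dictionary. A font or shaping engine would have to supply that knowledge; the UCD fields printed here do not carry it.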
The Linear Tamil discussion brought this home to me because Sinnathurai Srivas' suggestion is essentially to introduce a new Tamil rendering system. It would replace a number of vowel and ligature glyphs with new ones that have distinct *glyph* properties. Some of the glyphs being replaced are precisely the split and reordrant glyphs that L2/02-267R proposes to capture as *character* properties.

Whatever the other merits of Linear Tamil, it is an innovative suggestion that takes advantage of a specific fact. The Unicode Standard normatively defines only character properties, not glyph properties. We would be venturing into new territory if we started claiming that U+0BCA *must* have a split glyph for display. That claim would put us in an encoding pickle if someone introduced a script reform that was otherwise compatible with the Unicode encoding of Tamil text but did not use a split glyph. This is just a more dramatic example of why we don't want our chart glyphs to be taken as prescriptive. Once we do that, we invite the world to come to our doorstep asking for every *other* glyph to be encoded as a distinct character.

Thus I find the housekeeping urge behind the Section 6.1 proposal insufficiently convincing. It would let us obsolete the printed table in Section 4.2 of TUS. However, the cost would be reifying some glyph properties as character properties, which could establish dangerous precedents that come back to bite us.

In other words, I now find myself disagreeing strongly with the claim: "If those properties are indeed important, they should be reflected in UCD properties." I think this begs the question of what *kind* of properties they are. If they turn out to be glyph properties, as I surmise, there is also the question of whether they belong in the UCD or somewhere else.
To the contrary, I now think we instead need to beef up the explanation related to Section 4.2 of TUS. It should point out the difference between glyph properties and the combining class assignments.

--Ken