Scalability of ScriptExtensions (was: RE: Borrowed Thai Punctuation in Tai Tham Text)

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Mon, 8 Jul 2013 20:35:05 +0000

Richard Wordingham asked:

> How many examples do I need to collect to add Tai Tham to the script
> extensions property for ... ?

IMO, a couple of well-documented examples ought to suffice.

But this query raises a couple of further questions for me regarding
the scalability and maintenance of ScriptExtensions.txt. Basically,
reports coming in of "Script X character Y is also used with Script Z"
are proving to be a rather haphazard and ad hoc way of maintaining
that data file and the related property. It seems as if additions
to the data file are motivated more by who is paying attention to
what this month, rather than by any overall measures of objective
validity or implementation usefulness of the property. I'm not sure
what alternative there is now, but find it very distasteful that the UTC
has been forced into the mode of property maintenance for such
a subjective and haphazard collection of observations about
common usage.

The second question is this: what likelihood is there that a full
implementation of Tai Tham will not also be expected to be
capable of handling all of Thai? In such a case, isn't a series
of ad hoc observations about common use of punctuation between
the scripts somewhat superfluous?

I ask that because the situation echoes the rather more extensive
situation of East Asian punctuation usage for ideographic or
syllabic scripts typeset together with Chinese. Trying to track
all of those instances down and getting them all enshrined in
ScriptExtensions.txt strikes me as a losing proposition already --
and the situation is likely to just get worse as more historic
scripts from East Asia end up in Unicode eventually.

A much more productive approach, it seems to me, would be instead to
try to establish information about various, identifiable typographical
traditions for use of punctuation around the world, and then associate
"exemplar sets" of punctuation used with those traditions. Such an
approach, I assert, would tend to be much more robust (as well
as more comprehensible) than maintaining very fragile sets that
associate lists of scripts one-by-one with various characters.
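
To make the contrast concrete, here is a minimal Python sketch of the
tradition-based model. Everything in it is an illustrative assumption:
the tradition names, the grouping of scripts, and the choice of
characters are hypothetical and are not taken from ScriptExtensions.txt
or from any proposal. The script codes are ISO 15924 codes (Lana is
Tai Tham, Tang is Tangut).

```python
from typing import Dict, Set

# Hypothetical typographical traditions, each naming the scripts that
# are typeset within it. Illustrative groupings only.
TRADITION_SCRIPTS: Dict[str, Set[str]] = {
    "east_asian": {"Bopo", "Hang", "Hani", "Hira", "Kana", "Yiii"},
    "thai": {"Thai", "Lana"},
}

# Hypothetical "exemplar sets": punctuation code points associated
# with each tradition. Illustrative choices only.
TRADITION_EXEMPLARS: Dict[str, Set[int]] = {
    "east_asian": {0x3001, 0x3002},  # ideographic comma, full stop
    "thai": {0x0E2F, 0x0E46},        # paiyannoi, maiyamok
}


def scripts_using(code_point: int) -> Set[str]:
    """Derive the scripts a character is used with from the traditions
    whose exemplar set contains it."""
    result: Set[str] = set()
    for name, exemplars in TRADITION_EXEMPLARS.items():
        if code_point in exemplars:
            result |= TRADITION_SCRIPTS[name]
    return result


def add_script(tradition: str, script: str) -> None:
    """Record that a script is typeset within a tradition. This is a
    single edit, however many exemplar characters the tradition has."""
    TRADITION_SCRIPTS[tradition].add(script)


if __name__ == "__main__":
    # Adding one historic script updates every exemplar character at
    # once, instead of touching each character's script list.
    add_script("east_asian", "Tang")
    for cp in sorted(TRADITION_EXEMPLARS["east_asian"]):
        print(f"U+{cp:04X}", sorted(scripts_using(cp)))
```

In a per-character list, the same addition would mean editing the
entry for every affected character separately, which is where the
fragility comes from.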

--Ken
Received on Mon Jul 08 2013 - 15:39:48 CDT