Re: Scalability of ScriptExtensions (was: RE: Borrowed Thai Punctuation in Tai Tham Text)

From: Asmus Freytag <>
Date: Mon, 08 Jul 2013 14:42:15 -0700

On 7/8/2013 1:35 PM, Whistler, Ken wrote:
> A much more productive approach, it seems to me, would be instead to
> try to establish information about various, identifiable typographical
> traditions for use of punctuation around the world, and then associate
> "exemplar sets" of punctuation used with those traditions.

I would recommend that an approach like that be used "behind the scenes"
to manage the update of the data file.

We are stuck with a format that seemingly assumes that each character
is treated individually. I agree with you, however, that this is not
how punctuation actually works: there are sets of punctuation marks
that belong to certain "typographical traditions".

In addition, there are issues like the Dandas, where specific marks have
been unified across a range of related scripts.

A flexible way to pull this information together would be a UTN that
tries to collect it in human-readable rather than machine-readable
form, with commentary and background.

If the information in the UTN is considered solid, then it could be
reflected, in a separate pass, in the existing property file. Because
you would work on the basis of either typographical sets or explicit
encoding decisions, there would be less temptation to jiggle individual
characters' property values.
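
To make the "separate pass" concrete, here is a minimal sketch of how
set-level information might be expanded into per-character lines. All of
it is assumption: the TRADITIONS table, its set names, and the functions
expand() and format_lines() are hypothetical, the character and script
assignments are illustrative only and not actual property values, and
the output only loosely imitates the "code point ; script codes" layout
of the property file.

```python
from collections import defaultdict

# Hypothetical, illustrative data only: these are NOT actual
# Script_Extensions values, just two made-up "exemplar sets".
TRADITIONS = {
    "danda": {"chars": [0x0964, 0x0965], "scripts": ["Deva", "Beng"]},
    "thai-punctuation": {"chars": [0x0E2F, 0x0E46], "scripts": ["Thai", "Lana"]},
}


def expand(traditions):
    """Map each code point to the union of scripts of every set containing it."""
    per_char = defaultdict(set)
    for entry in traditions.values():
        for cp in entry["chars"]:
            per_char[cp].update(entry["scripts"])
    return dict(per_char)


def format_lines(per_char):
    """Render one 'XXXX ; Scr1 Scr2' line per code point, in code point order."""
    return [
        f"{cp:04X} ; {' '.join(sorted(scripts))}"
        for cp, scripts in sorted(per_char.items())
    ]


lines = format_lines(expand(TRADITIONS))
for line in lines:
    print(line)
```

The point of the design is that nobody edits the per-character lines by
hand: a change is made to a set (or recorded as an explicit encoding
decision), and the per-character values are regenerated from it.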

