Why wasn't it possible to encode a coeng-like joiner for Tibetan? from Shriramana Sharma on 2013-04-10 (Unicode Mail List Archive)

From: Shriramana Sharma <samjnaa_at_gmail.com>
Date: Thu, 11 Apr 2013 04:39:45 +0530

Hello people. This is just of academic interest, since the fact is
that a full series of subjoined characters *have* been encoded and
*are* being used for Tibetan, and nothing is going to change that, but
it could have an effect on future proposals for Tibetan-like scripts,
so I think it is important for this matter to be discussed.

The standard says that "there were two main reasons for this choice",
that is, for encoding separate subjoined characters for Tibetan rather
than using an Indic-like virama model. I take them in reverse order:

The *second* reason provided is that, due to the prevalence of stacking
in Tibetan, encoding subjoined characters would reduce storage
requirements. Well, that's true of any South Indic script --
Telugu, Kannada, Grantha -- which also regularly uses stacks for
representing clusters, so this is not something that is unique to
Tibetan.
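
To put rough numbers on the storage point, here is a small sketch
comparing one two-consonant stack (s + k) in the Tibetan subjoined
model and in the Telugu virama model. The choice of cluster and the
helper name sizes() are mine, purely for illustration:

```python
# Same cluster "s" + "k" in two encoding models.
tibetan_ska = "\u0F66\u0F90"        # TIBETAN LETTER SA + TIBETAN SUBJOINED LETTER KA
telugu_ska = "\u0C38\u0C4D\u0C15"   # TELUGU LETTER SA + TELUGU SIGN VIRAMA + TELUGU LETTER KA


def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for a string."""
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(sizes(tibetan_ska))  # (2, 6, 4)
print(sizes(telugu_ska))   # (3, 9, 6)
```

So the saving is one code point per stacked consonant, and a Telugu or
Kannada text would get exactly the same saving from a subjoined series.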

The *first* reason stated is that "the virama is not normally used in
the Tibetan writing system to create letter combinations". But this
sentence conflates two things, the visible device of a vowel-killer
virama as part of the attested orthography, and the abstract encoded
character as part of digital text. Clearly the "is not normally used"
can refer only to the former, not the latter.

OK fine, so in practice the virama *with a visible form* is never used
in writing Tibetan. But even in Devanagari, a visible virama is almost
never needed for Hindi, the prevalent language. It is only because
Devanagari is also heavily used for Sanskrit, and for the sake of
uniformity with other Indic scripts, that the visible function and the
joining function were united in a single character.

So it's not a big deal to separate the two functions, as is done in
Khmer etc. Hypothetically even in mainland Indic we could have
separate joiner-virama vs visible-virama characters.

So my point is that even though the visible virama is not used in
Tibetan (probably because the TSHEG separates syllables, making the
final consonant vowelless), one could very well have gone the Khmer way
and made a separate character for the visible sign (as indeed has been
done) but still have had a single joiner for causing the stacks.
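
For reference, the characters I have in mind can be listed from the
Unicode character database. This is only a lookup sketch; that U+0F84
is the separate visible sign for Tibetan is my reading of the code
chart:

```python
import unicodedata

# Devanagari unites both functions in one character; Khmer separates
# the visible sign (VIRIAM) from the stacking joiner (COENG); Tibetan
# has a visible sign (HALANTA) but no joiner at all.
code_points = (0x094D, 0x17D1, 0x17D2, 0x0F84)
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```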

Or was the Khmer model of an invisible joiner a *later* bright idea?
But really that doesn't hold water (I mean the "later" part), because
the Indic virama model already existed, and whether or not Tibetan
used the visible virama heavily, that need not have prevented a virama
character, which would have a visible form in appropriate contexts,
from causing stacking in other contexts.

And even that thing about the contrast between the full-form subjoined
consonants YA RA VA and half-form ones (I mean the -tags forms) need
not prevent this, because you could encode a virama and have the
*regular* (-tags) forms produced by it, and use separately encoded
subjoined characters for the aberrant forms alone.
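
To make that concrete, here is a minimal sketch of what such a
hypothetical encoding would look like, written as a converter from
the existing one. The joiner is a made-up private-use placeholder
(U+E000); no such Tibetan character exists. The function name is
mine, and the 0x50 offset between U+0F90..U+0FB9 and U+0F40..U+0F69
simply follows the current code chart layout:

```python
# HYPOTHETICAL joiner: a private-use placeholder, not a real Tibetan character.
HYPOTHETICAL_JOINER = "\uE000"


def to_joiner_model(text):
    """Rewrite regular subjoined consonants as joiner + base consonant.

    U+0F90..U+0FB9 (regular subjoined letters; U+0F98 is unassigned) map
    to U+0F40..U+0F69 by subtracting 0x50. The fixed-form subjoined
    letters U+0FBA..U+0FBC are left untouched, i.e. they stay as
    separately encoded characters.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0F90 <= cp <= 0x0FB9 and cp != 0x0F98:
            out.append(HYPOTHETICAL_JOINER + chr(cp - 0x50))
        else:
            out.append(ch)
    return "".join(out)


# SA + SUBJOINED KA becomes SA + joiner + KA.
print([f"U+{ord(c):04X}" for c in to_joiner_model("\u0F66\u0F90")])
# KA + SUBJOINED FIXED-FORM YA is unchanged.
print([f"U+{ord(c):04X}" for c in to_joiner_model("\u0F40\u0FBB")])
```

Only the three fixed-form characters survive as atomic subjoined
letters here; everything else goes through the single joiner.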

As for the RA-MGO thing, I am still not sure how it is advisable to
have a 0F6A glyphically identical to 0F62. Even if a
default-ignorable ZWNJ would not have been satisfactory, some
specialized non-default-ignorable conjoining-form-prevention character
could have been defined, which would then also be used for subjoined
full-form YA RA VA, avoiding those extra characters too.

Or have it as you wish and encode 0F6A and 0FBA-0FBC to avoid such
specialized character stuff, but still for the rest of the consonants
including the prevalent -tags forms of YA RA VA, the justification
provided for a full series of atomic subjoining characters seems quite
insufficient...

--
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

Received on Wed Apr 10 2013 - 18:16:06 CDT