Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan? from Christopher Fynn on 2013-04-10 (Unicode Mail List Archive)

From: Christopher Fynn <chris.fynn_at_gmail.com>
Date: Thu, 11 Apr 2013 10:24:11 +0600

On 11/04/2013, Shriramana Sharma <samjnaa_at_gmail.com> wrote:

> Hello people. This is just of academic interest, since the fact is
> that a full series of subjoined characters *have* been encoded and
> *are* being used for Tibetan, and nothing is going to change that, but
> it could have an effect on future proposals for Tibetan-like scripts,
> so I think it is important for this matter to be discussed.

The way Indic is encoded is an inheritance of ISCII - which encoded
Indic scripts in a limited 8-bit code space alongside ASCII.

There was no similar pre-existing standard for Tibetan; the encoding
model used is based on the way Tibetans are taught to spell.

In Unicode v1 Tibetan was encoded on the Indic model - but in practice
problems were found with this, so Tibetan was removed and later
re-encoded.
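
To make the difference between the models concrete, here is a small
Python sketch (an illustration only, not part of the original
discussion) that lists the code points behind one stack in each of
three scripts, using the standard unicodedata module. The sample
clusters are chosen purely for illustration: Tibetan s+g+r written
with subjoined letters, Devanagari k+ssa joined by the virama, and
Khmer k+r joined by the coeng.

```python
import unicodedata


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Tibetan: base letter followed directly by subjoined letters, no joiner.
tibetan_stack = "\u0F66\u0F92\u0FB2"
# Devanagari: consonants joined by the (normally invisible) virama.
devanagari_cluster = "\u0915\u094D\u0937"
# Khmer: consonants joined by the invisible coeng.
khmer_cluster = "\u1780\u17D2\u179A"

for label, sample in [
    ("Tibetan", tibetan_stack),
    ("Devanagari", devanagari_cluster),
    ("Khmer", khmer_cluster),
]:
    print(label)
    for code_point, name in describe(sample):
        print(f"  {code_point}  {name}")
```

In the Tibetan sample every code point is a letter, whereas the
Devanagari and Khmer samples each spend one code point on the joining
character.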

> The standard says that "there were two main reasons for this choice"
> of choosing to encode separate subjoined characters for Tibetan rather
> than using an Indic-like virama model:
>
> The *second* reason provided is that due to the prevalence of stacking
> in Tibetan, encoding subjoined characters would cause decreased
> storage requirements. Well that's true for any South Indic script --
> Telugu, Kannada, Grantha -- which also regularly uses stacks for
> representing clusters, so this is not something that is unique to
> Tibetan.
>
> The *first* reason stated is that "the virama is not normally used in
> the Tibetan writing system to create letter combinations". But this
> sentence conflates two things, the visible device of a vowel-killer
> virama as part of the attested orthography, and the abstract encoded
> character as part of digital text. Clearly the "is not normally used"
> can refer only to the former, not the latter.

Unlike Indian languages, Tibetan has a lot of unvoiced (silent)
consonants (prefixes and some suffixes). Other letters are
pronounced with no vowel sound. Both these things depend on the
position of the consonant within the syllable (marked by)

> OK fine, so in practice the virama *with a visible form* is never used
> in writing Tibetan.

It is never used when writing the Tibetan language, but it is
sometimes used when writing Sanskrit in Tibetan (though nowhere near as
much as when writing Sanskrit in Devanagari).

> But even for Devanagari, if it were not for
> Sanskrit, a visible virama is almost never used for Hindi, the
> prevalent language, and it is only that Devanagari is also heavily
> used for Sanskrit and the thing about maintaining uniformity with
> other Indic scripts that the visible function and the joining function
> were united in a single character.

But afaik in Hindi etc. it is legal to use a visible virama instead of
joining letters. In Tibetan this is not so (except when writing
Sanskrit).

> So it's not a big deal to separate the two functions, as is done in
> Khmer etc. Hypothetically even in mainland Indic we could have
> separate joiner-virama vs visible-virama characters.
>
> So my point is that even though the visible virama is not used in
> Tibetan (probably because the TSHEG separates syllables making the
> final consonant vowelless) one could very well have gone the Khmer way
> and made a separate character for that (as indeed has been done) but
> still have had a single joiner for causing the stacks.

In Tibetan it is not only the final consonant that may be vowel-less.

There was a proposal to encode Tibetan with an explicit STACK
(invisible-joiner) character - but eventually the model adopted was
preferred.
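
As a rough sketch of what the joiner alternative would mean for stored
text, the Python below rewrites text in the adopted model into a
hypothetical joiner model and compares lengths. Everything about the
joiner here is an assumption for illustration: the STACK placeholder
is a Private Use Area code point picked arbitrarily (no such Tibetan
character is encoded), and the details of the actual proposal are not
reproduced. The conversion relies only on the code chart layout, where
the subjoined letters U+0F90..U+0FB9 sit 0x50 above the corresponding
base letters U+0F40..U+0F69. The fixed-form letters U+0FBA..U+0FBC are
left untouched.

```python
# Hypothetical placeholder for an invisible stacking joiner (assumption:
# a Private Use Area code point; no such Tibetan character is encoded).
STACK = "\uE000"

SUBJOINED_FIRST = 0x0F90  # TIBETAN SUBJOINED LETTER KA
SUBJOINED_LAST = 0x0FB9   # TIBETAN SUBJOINED LETTER KSSA
OFFSET = 0x50             # distance from a subjoined letter to its base letter


def to_joiner_model(text):
    """Replace each subjoined letter with STACK followed by its base letter."""
    out = []
    for ch in text:
        cp = ord(ch)
        if SUBJOINED_FIRST <= cp <= SUBJOINED_LAST:
            out.append(STACK)
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)
    return "".join(out)


# "bsgrubs": BA, SA, subjoined GA, subjoined RA, vowel sign U, BA, SA
word = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66"
converted = to_joiner_model(word)
print(len(word), len(converted))  # 7 9
```

Each subjoined letter costs one extra code point under the joiner
model, which is the storage argument in miniature.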

China wanted to encode every combination of Tibetan characters - which
would have meant 6,000+ characters. (They do have an official national
standard which encodes Tibetan that way using PUA characters for the
combinations. This is in everyday use in China.)

You can look on the Tibetan encoding as a compromise between the two
ideas - but it works well and there is no ambiguity.

> Or was the Khmer model of an invisible joiner a *later* bright idea?
> But really that doesn't hold water (I mean the "later" part) because
> the Indic virama model already existed, and whether or not Tibetan
> used the visible virama heavily need not have prevented from a virama
> character, which would have a visible form in appropriate contexts,
> causing stacking in other contexts.
>
> And even that thing about the contrast between the full-form subjoined
> consonants YA RA VA and half-form ones (I mean the -tags forms) need
> not prevent this, because you could encode a virama and have the
> *regular* (-tags) forms produced by it, and use separately encoded
> subjoined characters for the aberrant forms alone.
>
> As for the RA-MGO thing, I still am not sure how it is advisable to
> have a 0F6A glyphically identical to 0F62 and even if a
> default-ignorable ZWNJ would not have been satisfactory, some
> specialized non-default-ignorable conjoining-form-prevention character
> could be defined, which would then also be used for subjoined
> full-form YA RA VA avoiding those extra characters too.
>
> Or have it as you wish and encode 0F6A and 0FBA-0FBC to avoid such
> specialized character stuff, but still for the rest of the consonants
> including the prevalent -tags forms of YA RA VA, the justification
> provided for a full series of atomic subjoining characters seems quite
> insufficient...

> Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

- Chris
Received on Wed Apr 10 2013 - 23:29:50 CDT
