Maybe I differ from Glenn in thinking this, but...
I think that we encoded Tibetan one way, and it should be USED that way, as
intended. The sub-scripted entities were put there to be used, and I think
there is only one "correct" way to view the Unicode encoding for Tibetan. It
would be "unforgivably wrong" to allow just base characters to be input, and
then to use viramas to join them. In the best of all possible worlds, that is
what I personally would have done, and what I would have preferred, and what
would have been "pure". However, that's NOT what got encoded. Now, I'd be
really disappointed if after LOSING the Glorious Battle for Tibetan Purity
(due to all the political considerations and compromises and supposed user
requirements) the market were to turn around and IGNORE what finally got
encoded! Or maybe I should view that eventuality as a vindication and
rejoice? But if it came to that, I'd have to ask "why didn't we just leave it
how it was in the Original Primordial Unicode spec???"
I think that in as close-knit (and computationally small) a community as the
Tibetan Users, people could get consensus on this. Stupidly, the Tibetan
script has possibly been the single most controversial one we've ever encoded,
and it's not native controversy; it's been a battle of foreign experts! I
would hope that ALL implementers can get their act together on it. Please?
Anyway, my reply to Mike Forgey, for what it's worth, is appended.
Oh, and P.S., I forgot to mention below that the Unicode encoding also has the
advantage of delineating stack boundaries nicely without "n-character
Subject: Re: Tibetan/Burmese/Khmer
Mike Forgey wrote...
> I am wondering why it was decided to include in Unicode 2.0, two
> encodings for each Tibetan consonant
The reason... It's a long, long story and it mostly involves politics and
personalities. And I hope someone more sympathetic to this method can provide
a better (or more convincing) explanation than I will.
Well, there is/was an existing implementation for the Bhutanese government
which uses this system, and personally after seeing Tibetan languish in
controversy and apathy for years, I'm happy to see it encoded at all. This
method DOES help cut down data space requirements by as much as 1/3. Still,
in my opinion, it would have been better to go with virama+nominal for the
subjoined forms, but there you are. I'm not a big fan of this encoding
method, but at least it's pretty unambiguous; it's compact; it's less
cumbersome than proposals for many thousands of precomposed stacks; users can
understand it and type it easily; and it's easy to implement without fancy
support. And I thought the explanation in the book would be sufficient...
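To make the "compact" and "unambiguous" points concrete, here is a minimal Python sketch, assuming only the Tibetan block layout: subjoined letters U+0F90..U+0FB9 mirror the nominal letters U+0F40..U+0F69 at an offset of 0x50, and U+0F84 is TIBETAN MARK HALANTA (the virama). The helpers `split_stacks` and `virama_spelling` are hypothetical names for illustration, not part of any library. Because subjoined letters and vowel signs are combining marks, a stack boundary falls before every non-mark character, with no lookbehind needed.

```python
import unicodedata

HALANTA = "\u0F84"  # TIBETAN MARK HALANTA (the Tibetan virama)
OFFSET = 0x50       # assumption: U+0F90..U+0FB9 mirror U+0F40..U+0F69


def split_stacks(text: str) -> list[str]:
    """Hypothetical helper: split Tibetan text into stacks.

    Combining marks (category M*: subjoined letters, vowel signs, halanta)
    attach to the stack in progress; anything else starts a new stack.
    """
    stacks: list[str] = []
    for ch in text:
        if stacks and unicodedata.category(ch).startswith("M"):
            stacks[-1] += ch
        else:
            stacks.append(ch)
    return stacks


def virama_spelling(text: str) -> str:
    """Hypothetical helper: respell each subjoined letter as HALANTA + nominal."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0F90 <= cp <= 0x0FB9:
            out.append(HALANTA + chr(cp - OFFSET))
        else:
            out.append(ch)
    return "".join(out)


# "bsgrubs": BA, SA + subjoined GA + subjoined RA + vowel sign U, BA, SA
word = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66"
print(split_stacks(word))
print(len(word), len(virama_spelling(word)))  # 7 code points vs. 9
```

Each subjoined letter saves one code point, so a two-consonant stack such as KA + subjoined YA takes two code points instead of three. Run over `virama_spelling(word)`, the same naive splitter breaks after each halanta, so a virama-based encoding would need extra lookbehind to recover the same stack boundaries.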
> The syllabic principles of the Tibetan script seem to be about the same
> as for the Indian scripts; why not require encoding Tibetan dead consonants
> with the virama, as is required for Devanagari, etc?
Yes, true, but...it wasn't done that way. The virama, in Devanagari, has an
existence that regular people understand and utilize. In Tibetan, only
learned scholars have ever HEARD of the virama, and it just "isn't used"
except in very special circumstances, and never by normal users.
You cannot pick the encoding method. Use the subjoined forms where you would
have meant virama+consonant, and the shape-changes should happen automatically
in rendering (as indicated). If you need to AVOID some shape-change, then you
can use virama+consonant for things like "wa" when it should NOT change to
> Are the Tibetan subjoined characters considered to be equal to the
> nominal form preceded by VIRAMA; i.e., 0F90 = 0F84 + 0F40?
Uh, probably the answer is "NO". Don't encode with virama unless you mean to
provide something that has an "abnormal" spelling for some specific effect.
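One way to check that "NO" in practice is with Python's standard `unicodedata` module: U+0F90 has no decomposition mapping, so Unicode normalization never turns U+0F84 + U+0F40 into U+0F90 or back, and the two spellings stay distinct strings. The sketch below assumes the 0x50 offset between nominal (U+0F40..U+0F69) and subjoined (U+0F90..U+0FB9) letters; `subjoined_form` is a hypothetical helper of the kind an input method might use to emit subjoined letters directly.

```python
import unicodedata

HALANTA = "\u0F84"  # TIBETAN MARK HALANTA


def subjoined_form(letter: str) -> str:
    """Hypothetical helper: map a nominal Tibetan letter to its subjoined letter.

    Assumes the U+0F40..U+0F69 -> U+0F90..U+0FB9 offset of 0x50. U+0F48 is
    unassigned, so it is rejected along with anything outside the range.
    """
    cp = ord(letter)
    if not (0x0F40 <= cp <= 0x0F69) or cp == 0x0F48:
        raise ValueError(f"no subjoined counterpart for U+{cp:04X}")
    return chr(cp + 0x50)


sub_ka = subjoined_form("\u0F40")
print(unicodedata.name(sub_ka))  # TIBETAN SUBJOINED LETTER KA

# Normalization does not equate the two spellings: U+0F90 has no
# decomposition, so NFC and NFD leave both strings unchanged.
virama_ka = HALANTA + "\u0F40"
for form in ("NFC", "NFD"):
    same = unicodedata.normalize(form, sub_ka) == unicodedata.normalize(form, virama_ka)
    print(form, same)  # False for both forms
```

Since normalization keeps the two apart, a text that deliberately uses halanta + consonant for an "abnormal" spelling keeps that spelling, which is the only case where the virama should appear.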
> Since distinct codes have been allocated for Tibetan subjoined
> consonants, is it expected that distinct codes will be allocated for
> Burmese and/or Khmer sub-consonants?
Absolutely not. Tibetan in this regard is an aberration.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT