Re: minimizing size (was Re: allocation of Georgian letters)

From: Doug Ewell (dewell@roadrunner.com)
Date: Sat Feb 09 2008 - 13:03:24 CST

Next message: Doug Ewell: "Re: minimizing size (was Re: allocation of Georgian letters)"

Previous message: Michael S. Kaplan: "Re: minimizing size (was Re: allocation of Georgian letters)"
In reply to: Sinnathurai Srivas: "Re: minimizing size (was Re: allocation of Georgian letters)"
Next in thread: John H. Jenkins: "Re: minimizing size (was Re: allocation of Georgian letters)"

Sinnathurai Srivas <sisrivas at blueyonder dot co dot uk> wrote:

> Tamil need not be a CTL script. It can work 100% and work better than
> CTL enabled Tamil. Why then is Tamil classed as CTL script? What is
> the criteria?

The entire group of nine Indic scripts was considered to be structurally
related when they were encoded in Unicode 1.0. This included
classifying all of them as CTL, although that term was not used in
Unicode 1.0. The following passages from TUS 1.0, page 53, describe the
encoding model:

"The Unicode standard follows the ISCII (Indian Standard Code for
Information Interchange) code standard in treating all nine of the
official Indian scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya,
Tamil, Telugu, Kannada, and Malayalam) in a parallel way."

and

"The graphemic syllable is built up of alphabetic pieces, the actual
letters of the Devanagari script. These consist of three major types:
consonants, dependent vowels, and independent vowels."

This is the reason why Tamil is encoded in Unicode the way it is.
Whether or not anyone agrees that it should have been encoded that way
is a different matter.
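
To make the quoted model concrete, here is a minimal Python sketch (not part of TUS; the helper name `describe` is only illustrative) that lists the code points behind one written Tamil syllable. It assumes nothing beyond the standard `unicodedata` module and shows that the syllable is stored as a consonant followed by a dependent vowel sign, while an independent vowel is a separate letter:

```python
import unicodedata

def describe(text):
    """Return (code point, character name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

# One written syllable "kaa": consonant KA followed by dependent vowel sign AA.
syllable = "\u0B95\u0BBE"
# An independent vowel is encoded as a letter of its own.
independent_vowel = "\u0B85"

for row in describe(syllable + independent_vowel):
    print(*row)
```

The rendering engine, not the encoding, is responsible for combining the two stored characters of the syllable into its displayed form.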

> As for publishing, attempt to use Unicode Tamil fails. If it is
> achievable, when will it be ready?

This question misstates the concept "Some publishing applications that
use Unicode Tamil are broken" as "Unicode Tamil is broken for
publishing." All that is necessary to disprove the latter is to show at
least one publishing application which uses Unicode Tamil and generates
correct results, and John Jenkins has already done that.

> Again what is the criteria for stopping Tamil using workable solution
> and what is the criteria for enforcing non-working solution?

The criterion is that duplicate encodings will not be created. That was
done once, with Hangul in the mid-1990s (the old encoding was actually
removed), when Unicode was very new and supported by very few systems.
Read Section 3 of RFC 2279: "The incident has been dubbed the "Korean
mess", and the relevant committees have pledged to never, ever again make
such an incompatible change."

> I think we can at least move fast, if we introduce all necessary
> canonical forms now, most of the publishing s/w may work with
> canonical forms.

Read the ISO "Principles and Procedures" document at
http://www.dkuug.dk/JTC1/SC2/WG2/docs/n3102.pdf to see why duplicate
encodings are no longer allowed. Reinventing TUNE as a question of
"canonical forms" and "non-canonical forms" doesn't change this. If you
want software to work with a different Tamil model, use the PUA.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

