Re: 28th IUC paper - Tamil Unicode New

From: Richard Wordingham
Date: Mon Aug 22 2005 - 16:58:16 CDT


    Philippe Verdy wrote:

    > From: "Abhijit Dutta अभिजीत दत्ता"
    >> The Ministry of information technology, Govt., of India is
    >> distributing free CDs with a lot of Tamil software. The CD has
    >> about 50 fonts which use the alternative scheme. It is proposed
    >> to talk to the developers of the Tamil Open Office to use the font
    >> with the alternate scheme. A representation will be sent to major
    >> software vendors including Microsoft to use the new scheme. All
    >> members of KTS have agreed to use the new scheme in their software."

    > How can all software vendors agree to use the PUA scheme? If it was so,
    > then this PUA block would become permanently bound to the "New Tamil"
    > encoding scheme, meaning that the purpose of PUAs would be defeated. Using
    > the PUAs not only requires an agreement with the software vendor, but also
    > with the effective users of this scheme.

    It's not so very different from the Tibetan conjuncts added to GB18030.
    Until such time (if ever) that they are given proper Unicode codepoints,
    they too must reside in the 'PUA'. The difference is that China has the
    economic power to force them to have de facto encodings (PUA) if not de
    jure.

    I think we are in the following silly situation:

    1) The 'New Tamil' can be defined in the PUA or (possibly over several dead
    bodies) as standard characters and used in XML.

    2) The 'stability pact' prohibits the definition of a formal equivalence
    between 'New Tamil' and 'Old Tamil', i.e. the modern Tamil encoding starting
    at U+0B80. Thus data in the official Unicode Tamil encoding becomes legacy
    data if New Tamil succeeds.

    > Nevertheless, I approve of such an initiative when it helps create a stable
    > model for representing Modern Tamil. But this won't have any success if
    > the PUA scheme is not also strictly bound to standard Unicode/ISO 10646-1
    > code points, using an unambiguous mapping that will work bijectively at
    > least for the subset of Modern Tamil texts representable with this PUA
    > scheme; with that mapping, it will in fact be easier to interchange the
    > represented texts, by remapping the PUA-encoded texts to standard Unicode,
    > so that PUA agreements will no longer be needed.

    Doesn't the rival encoding round trip with Unicode (modulo canonical
    equivalence) for well-formed, subscript-free Tamil? (I still haven't been
    granted access to the 'public' definition of the encoding.) Windows XP does
    not support subscripted or superscripted Tamil (example at ) - I
    suspect TUNE doesn't either.
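
    To make "round trip modulo canonical equivalence" concrete, here is a
    minimal Python sketch. The PUA code points and the table below are purely
    hypothetical (I have not seen the real definition of the rival encoding);
    only the standard Tamil code points and the NFC behaviour of U+0BCA are
    real. The function names are my own.

```python
import unicodedata

# HYPOTHETICAL mapping: these PUA code points are invented for illustration
# and are NOT taken from any published definition of the rival encoding.
PUA_TO_UNICODE = {
    "\uE000": "\u0B95",        # TAMIL LETTER KA
    "\uE001": "\u0B95\u0BBE",  # KA + VOWEL SIGN AA
    "\uE002": "\u0B95\u0BCA",  # KA + VOWEL SIGN O (precomposed form)
    "\uE003": "\u0B95\u0BCD",  # KA + VIRAMA (pulli)
}
UNICODE_TO_PUA = {seq: pua for pua, seq in PUA_TO_UNICODE.items()}
# The mapping is only bijective if no two PUA characters share a sequence.
assert len(UNICODE_TO_PUA) == len(PUA_TO_UNICODE)


def pua_to_unicode(text):
    """Replace each mapped PUA character by its standard Unicode sequence."""
    return "".join(PUA_TO_UNICODE.get(ch, ch) for ch in text)


def unicode_to_pua(text):
    """Normalize to NFC, then greedily replace the longest known sequences."""
    text = unicodedata.normalize("NFC", text)
    keys = sorted(UNICODE_TO_PUA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(UNICODE_TO_PUA[key])
                i += len(key)
                break
        else:
            # Character outside the table: pass it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def round_trips(text):
    """True if Unicode -> PUA -> Unicode is canonically equivalent to text."""
    back = pua_to_unicode(unicode_to_pua(text))
    return unicodedata.normalize("NFC", back) == unicodedata.normalize("NFC", text)
```

    With this table, KA followed by the two-part spelling U+0BC6 U+0BBE comes
    back as KA followed by U+0BCA: not the same code points, but canonically
    equivalent, which is all the round trip can promise.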

    > Such a scheme will also help fix the various fonts so that they will
    > support correctly at least the subset of texts representable with the "New
    > Tamil" PUA scheme. But this does not require that fonts be prepared to
    > support these PUAs. I think it will be much more productive to create
    > OpenType fonts using the standard Unicode codepoints, and a well-defined
    > set of GSUB/GPOS tables. This way, these fonts will also be usable
    > interoperably.

    Of course, Tamil Unicode fonts don't work with Windows XP any more. They
    used to work with Windows XP before Unicode 4.1.0, but can't handle SHA.

    > More generally, a charset registered by a national standard authority in
    > the IANA charsets registry would work more successfully and more reliably
    > than a system based on private agreements on PUAs (simply because charsets
    > can be easily transported in MIME, unlike PUA agreements), and also
    > because the solutions to support charsets other than the UTFs already exist
    > and are well implemented and deployed, and also because charsets work reliably
    > with Unicode/ISO 10646 as well.

    Surely the whole point of TUNE is that it work with basic Unicode support,
    without any awareness of Tamil as a distinct script. Having a special
    charset defeats that purpose.


    This archive was generated by hypermail 2.1.5 : Mon Aug 22 2005 - 16:59:26 CDT