Re: Regulating PUA.

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jan 22 2007 - 02:32:58 CST


    Actually, you'll see that the Unicode Consortium itself makes some use of PUAs, though not within the Unicode Standard proper.
    Look at the CLDR, for example at the transliteration rules for InterIndic: there, PUA code points are used as intermediate codes for transliteration from and to Indic scripts, even though they could conflict with other PUA characters present in documents, which the transliterator should leave untouched.
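    To make the conflict concrete, here is a minimal sketch of a two-stage Devanagari-to-Telugu transliteration through a PUA pivot. The pivot layout (U+E000 plus the character's offset within its block) and the function names `to_pivot` and `from_pivot_to_telugu` are hypothetical; they are not CLDR's actual InterIndic assignments. The offset-only mapping is also a simplification that happens to be correct for KA (U+0915 to U+0C15).

```python
DEVANAGARI_BASE = 0x0900
TELUGU_BASE = 0x0C00
PIVOT_BASE = 0xE000  # assumption: illustrative PUA pivot range, not CLDR's real table

def to_pivot(text):
    """Stage 1 (hypothetical): map Devanagari code points into the PUA pivot range."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_BASE <= cp < DEVANAGARI_BASE + 0x80:
            out.append(chr(PIVOT_BASE + cp - DEVANAGARI_BASE))
        else:
            out.append(ch)  # everything else, including PUA already in the text, passes through
    return "".join(out)

def from_pivot_to_telugu(text):
    """Stage 2 (hypothetical): map every pivot-range code point to Telugu."""
    out = []
    for ch in text:
        cp = ord(ch)
        if PIVOT_BASE <= cp < PIVOT_BASE + 0x80:
            out.append(chr(TELUGU_BASE + cp - PIVOT_BASE))
        else:
            out.append(ch)
    return "".join(out)

# Devanagari KA followed by a user's own private-use character U+E015.
source = "\u0915" + "\uE015"
result = from_pivot_to_telugu(to_pivot(source))
print([hex(ord(c)) for c in result])  # ['0xc15', '0xc15']
```

    The user's own U+E015 comes out as a Telugu KA: the second stage has no way to tell a pivot code from a PUA character that was already present in the document.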

    Isn't there a way to make the CLDR InterIndic transliterator use exclusively the subsets of characters it is supposed to convert?

    A working implementation of an Indic transliterator that uses no PUA at all is, for example, the Latin-to-Telugu input mode of the Telugu Wikipedia: its JavaScript references **exclusively** regular Latin and Telugu characters to do all its work, using a state machine in which intermediate character clusters are represented as chains of Telugu and Latin characters only (these chains are called "Hashes" there).

    This raises a question about the validity of the "InterIndic" PUA encoding in the CLDR... Those intermediate codes could probably be represented using the characters of one of the standard Indic scripts only. This is always possible in transliterators built as finite-state automata, where the intermediate states of a Latin-to-Indic conversion are represented by strings containing zero or more Indic characters followed by zero or more Latin characters. You don't need any PUA to represent these internal states!
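    A minimal sketch of such a PUA-free automaton, under stated assumptions: the tiny rule table (a few consonants plus a/aa/i/ii/u) and the names `build_rules`, `feed` and `transliterate` are hypothetical, and this is neither the Telugu Wikipedia script nor a CLDR rule set. The whole state is the output buffer itself: Telugu characters already produced, possibly followed by Latin letters not yet resolved. Each keystroke rewrites the longest matching tail of the buffer, using rule keys that mix Telugu and Latin characters.

```python
VIRAMA = "\u0C4D"

# Hypothetical, deliberately tiny Latin-to-Telugu scheme.
CONSONANTS = {"k": "\u0C15", "kh": "\u0C16", "g": "\u0C17",
              "t": "\u0C24", "th": "\u0C25", "n": "\u0C28"}
INDEPENDENT = {"a": "\u0C05", "aa": "\u0C06", "i": "\u0C07", "ii": "\u0C08", "u": "\u0C09"}
SIGNS = {"aa": "\u0C3E", "i": "\u0C3F", "ii": "\u0C40", "u": "\u0C41"}

def build_rules():
    """Rewrite rules whose keys are chains of Telugu and Latin characters only."""
    rules = {}
    for latin, cons in CONSONANTS.items():
        if len(latin) == 1:
            rules[latin] = cons + VIRAMA  # dead consonant, waiting for a vowel
        else:
            # aspirate: the previous dead consonant followed by a Latin 'h'
            base = CONSONANTS[latin[0]]
            rules[base + VIRAMA + latin[1:]] = cons + VIRAMA
        rules[cons + VIRAMA + "a"] = cons                # inherent vowel
        rules[cons + "a"] = cons + SIGNS["aa"]           # second 'a' gives the long sign
        rules[cons + VIRAMA + "i"] = cons + SIGNS["i"]
        rules[cons + SIGNS["i"] + "i"] = cons + SIGNS["ii"]
        rules[cons + VIRAMA + "u"] = cons + SIGNS["u"]
    rules["a"] = INDEPENDENT["a"]
    rules[INDEPENDENT["a"] + "a"] = INDEPENDENT["aa"]
    rules["i"] = INDEPENDENT["i"]
    rules[INDEPENDENT["i"] + "i"] = INDEPENDENT["ii"]
    rules["u"] = INDEPENDENT["u"]
    return rules

RULES = build_rules()
MAX_KEY = max(len(k) for k in RULES)

def feed(buffer, ch):
    """Append one Latin keystroke; the buffer string is the entire automaton state."""
    buffer += ch
    # Try the longest matching tail first.
    for n in range(min(MAX_KEY, len(buffer)), 0, -1):
        tail = buffer[-n:]
        if tail in RULES:
            return buffer[:-n] + RULES[tail]
    return buffer  # unresolved Latin stays in the buffer as pending state

def transliterate(latin):
    buffer = ""
    for ch in latin:
        buffer = feed(buffer, ch)
    return buffer
```

    For instance, `transliterate("khi")` passes through the states "క్" and "ఖ్" before yielding "ఖి", while `transliterate("gh")` leaves the unmapped "h" pending as a Latin letter after "గ్". With this table, no code point outside the Telugu and Latin repertoires ever appears in any state.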

    ----- Original Message -----
    From: <vunzndi@vfemail.net>
    To: "Adam Twardoch" <list.adam@twardoch.com>
    Cc: "Ruszlan Gaszanov" <ruszlan@ather.net>; <unicode@unicode.org>
    Sent: Monday, January 22, 2007 2:09 AM
    Subject: Re: Regulating PUA.

    >
    > I agree with Adam -- while it would have been acceptable to designate
    > different types of PUA at the time they were first established, to do
    > so now would be going against the designation already given.



    This archive was generated by hypermail 2.1.5 : Mon Jan 22 2007 - 02:34:57 CST