RE: Unicode Transliteration Guidelines released

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Jan 19 2008 - 17:48:54 CST

Next message: abysta@yandex.ru: "Abkhasian CHE with descender"

Previous message: Rick McGowan: "Unicode Transliteration Guidelines released"
In reply to: Rick McGowan: "Unicode Transliteration Guidelines released"
Next in thread: Mark Davis: "Re: Unicode Transliteration Guidelines released"
Reply: Mark Davis: "Re: Unicode Transliteration Guidelines released"

> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> behalf of Rick McGowan
> Sent: Saturday, January 19, 2008 17:58
> To: unicode@unicode.org
> Subject: Unicode Transliteration Guidelines released
>
> The Unicode CLDR committee has released
> "Unicode Transliteration Guidelines":
> http://www.unicode.org/cldr/transliteration_guidelines.html

Note the following text:
[quote]
        Even within particular languages, there can be variants according to
        different authorities, or even varying across time (if the authority
        changes its recommendation). The canonical identifier that CLDR uses
        for these has the form:

                source-target/variant

        The source (and target) can be a language or script, either using the
        English name or a locale code. The variant should specify the
        authority, and if necessary, the year. For example, the identifier for
        the Russian to Latin transliteration according to the UNGEGN would be

                ru-und_Latn/UNGEGN, or
                Russian-Latin/UNGEGN
(...)
[/quote]

A CLDR bug about the format of this identifier has been open for quite a
long time. It includes proposed changes, plus comments suggesting that the
use of '-' and '_' is not consistent with existing practice for locale
identifiers, where the two are treated equivalently.
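
To illustrate the separator problem, here is a minimal Python sketch. It
assumes a parser that folds '_' into '-' before splitting, as locale id
parsers are said to do above. The function name candidate_splits is
hypothetical and is not part of CLDR or ICU.

```python
from typing import List, Tuple


def candidate_splits(identifier: str) -> List[Tuple[str, str, str]]:
    """List every (source, target, variant) reading of 'source-target/variant'
    once '_' has been folded into '-' (an assumed normalization step)."""
    pair, _, variant = identifier.partition("/")
    parts = pair.replace("_", "-").split("-")
    # Every position between two subtags is a possible source/target boundary.
    return [
        ("-".join(parts[:i]), "-".join(parts[i:]), variant)
        for i in range(1, len(parts))
    ]


for reading in candidate_splits("ru-und_Latn/UNGEGN"):
    print(reading)
```

With the two separators treated alike, the example identifier has two
readings, ('ru', 'und-Latn') and ('ru-und', 'Latn'), so the boundary between
source and target can no longer be recovered from the string alone.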

Also, the placement of the variant becomes ambiguous when the
transliteration is reversed.

This bug was accepted by a CLDR committee member but deferred for later
resolution. Apparently it still has that status, and has been forgotten.

I have recently proposed a solution using another format, based purely on
locale ids (because transliteration variants effectively create locale
variants by defining an alternate orthography for the associated language):
        ru.und-Latn-UNGEGN
        und-Latn-UNGEGN.ru
This drops support for identifiers that use full language names, such as:
        Russian.Latin-UNGEGN
(because most of these names are not part of the CLDR Root locale, and
English names for languages are often ambiguous or could wreak havoc when a
language name contains one of the separators needed for parsing).

The format should then become simply:
<Source-locale-id>.<Target-locale-id>
where both locale ids adhere to the RFC definition.
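
As a rough sketch of how the proposed format could be handled (in Python;
the names TransliterationId, parse_id, reverse_id and format_id are
hypothetical, and the locale ids are not validated against the RFC here),
parsing is a single split on '.', and reversing the transliteration is just
a swap of the two halves, with the variant staying attached to its own
locale id:

```python
from typing import NamedTuple


class TransliterationId(NamedTuple):
    source: str  # source locale id, e.g. "ru"
    target: str  # target locale id, e.g. "und-Latn-UNGEGN"


def parse_id(identifier: str) -> TransliterationId:
    """Split '<Source-locale-id>.<Target-locale-id>' at its single '.'."""
    source, sep, target = identifier.partition(".")
    if not sep or not source or not target or "." in target:
        raise ValueError(f"expected <source>.<target>, got {identifier!r}")
    return TransliterationId(source, target)


def reverse_id(tid: TransliterationId) -> TransliterationId:
    """Reverse the direction by swapping the two locale ids."""
    return TransliterationId(tid.target, tid.source)


def format_id(tid: TransliterationId) -> str:
    """Serialize back to the '<source>.<target>' form."""
    return f"{tid.source}.{tid.target}"


forward = parse_id("ru.und-Latn-UNGEGN")
print(forward)
print(format_id(reverse_id(forward)))
```

Applied to the first example above, the reversal yields the second one,
und-Latn-UNGEGN.ru, without any rule about where the variant should go.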

(Note that I suggest treating "." and "/" equivalently for the separator
between the two locales, like we should treat "_" and "-" equivalently as
tag separators within the locale id; this makes the format compatible with
existing locale id parsers, resource bundle parsers or resolvers where "/"
could cause problems with filesystems).
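
The equivalences suggested in this note could be sketched in Python as
follows. This is only an illustration under the assumptions stated here:
normalize_id and split_id are hypothetical names, '.' and '-' are picked
arbitrarily as the canonical spellings, and no case folding or locale id
validation is attempted.

```python
from typing import Tuple


def normalize_id(identifier: str) -> str:
    """Fold '/' into '.' (between the two locale ids) and '_' into '-'
    (between the tags inside each locale id)."""
    return identifier.replace("/", ".").replace("_", "-")


def split_id(identifier: str) -> Tuple[str, str]:
    """Return (source, target) locale ids from any accepted spelling."""
    source, sep, target = normalize_id(identifier).partition(".")
    if not sep or not source or not target or "." in target:
        raise ValueError(f"expected two locale ids, got {identifier!r}")
    return source, target


# All of these spellings denote the same transliteration.
for spelling in ("ru.und-Latn-UNGEGN", "ru/und-Latn-UNGEGN", "ru.und_Latn_UNGEGN"):
    print(split_id(spelling))
```

Because the '.' spelling contains no path separator, the normalized form can
also be used directly as a resource or file name.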


This archive was generated by hypermail 2.1.5 : Sat Jan 19 2008 - 22:32:02 CST