From: Mark Davis (mark.davis@icu-project.org)
Date: Sat Jan 26 2008 - 13:44:58 CST
1. Your remarks should go to the cldr-users group (you could also cc
them here if you want, but they should definitely go there).
2. If you are referring to a CLDR bug, give the number. When I search
for "transliteration" I see only 5 open bugs, none of which appear to be
what you are talking about.
Mark
On Jan 19, 2008 3:48 PM, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
> > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> > behalf of Rick McGowan
> > Sent: Saturday, January 19, 2008 17:58
> > To: unicode@unicode.org
> > Subject: Unicode Transliteration Guidelines released
> >
> > The Unicode CLDR committee has released
> > "Unicode Transliteration Guidelines":
> > http://www.unicode.org/cldr/transliteration_guidelines.html
>
> Note the following text:
> [quote]
> Even within particular languages, there can be variants according to
> different authorities, or even varying across time (if the authority
> changes its recommendation). The canonical identifier that CLDR uses
> for these has the form:
>
> source-target/variant
>
> The source (and target) can be a language or script, either using the
> English name or a locale code. The variant should specify the
> authority, and if necessary, the year. For example, the identifier for
> the Russian to Latin transliteration according to the UNGEGN would be
>
> ru-und_Latn/UNGEGN, or
> Russian-Latin/UNGEGN
> (...)
> [/quote]
>
> This description has had a CLDR bug associated with it for quite a long
> time, concerning the format of the identifier. The bug includes proposed
> changes and comments pointing out that the use of '-' and '_' is not
> coherent with existing practice for locale identifiers, where the two
> are treated as equivalent.
>
> Also, the placement of the variant is ambiguous if the transliteration
> is reversed.
>
> This bug was accepted by a CLDR committee member but deferred for later
> resolution. Apparently it is still in that state and has been forgotten.
>
> I have recently proposed a solution using another format, based purely
> on locale ids (because transliteration variants effectively create
> locale variants by defining an alternate orthography for the associated
> language):
> ru.und-Latn-UNGEGN
> und-Latn-UNGEGN.ru
> and dropping support for languages identified by full names, such as:
> Russian.Latin-UNGEGN
> (because most of these names are not part of the CLDR Root locale, and
> English language names are often ambiguous or could wreak havoc with
> language names that themselves contain the separators needed for
> parsing).
>
> The format would then simply become:
> <Source-locale-id>.<Target-locale-id>
> where both locale ids adhere to the RFC definition.
>
> (Note that I suggest treating "." and "/" as equivalent separators
> between the two locales, just as "_" and "-" should be treated as
> equivalent subtag separators within a locale id. This keeps the format
> compatible with existing locale id parsers and with resource bundle
> parsers or resolvers, for which "/" could cause problems with
> filesystems.)
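For illustration, here is a minimal Python sketch (not taken from CLDR, ICU, or either message) contrasting the two identifier formats discussed in the quoted text. The function names parse_cldr_style, candidate_splits, parse_proposed and reverse_proposed are hypothetical. The sketch assumes that in the guidelines' format the first '-' separates source from target, and that in the proposed format '/' may stand in for '.' and '_' for '-'. It shows why treating '_' and '-' as equivalent makes "ru-und_Latn/UNGEGN" ambiguous, and how the dotted form can be reversed by simply swapping its two locale ids.

```python
# Hypothetical helpers illustrating the two identifier formats discussed
# above. None of these functions is part of CLDR or ICU.


def parse_cldr_style(identifier):
    """Split 'source-target/variant' (guidelines format).

    Assumption: the first '-' separates source from target, and '_' only
    appears inside the target locale code.
    """
    body, _, variant = identifier.partition("/")
    source, _, target = body.partition("-")
    return source, target, variant or None


def candidate_splits(identifier):
    """List every possible source/target split once '_' == '-'.

    After normalizing '_' to '-', each '-' in the body is a possible
    boundary, so a single identifier can have several readings.
    """
    body, _, variant = identifier.partition("/")
    parts = body.replace("_", "-").split("-")
    return [
        ("-".join(parts[:i]), "-".join(parts[i:]), variant or None)
        for i in range(1, len(parts))
    ]


def parse_proposed(identifier):
    """Parse '<source-locale-id>.<target-locale-id>' (proposed format).

    Assumption: '/' is accepted as a synonym for '.', and '_' for '-'.
    """
    normalized = identifier.replace("/", ".").replace("_", "-")
    source, sep, target = normalized.partition(".")
    if not sep or not source or not target or "." in target:
        raise ValueError(f"expected exactly one '.' separator: {identifier!r}")
    return source, target


def reverse_proposed(identifier):
    """Reverse a proposed-format id; any variant stays with its locale."""
    source, target = parse_proposed(identifier)
    return f"{target}.{source}"


if __name__ == "__main__":
    print(parse_cldr_style("ru-und_Latn/UNGEGN"))
    print(candidate_splits("ru-und_Latn/UNGEGN"))
    print(parse_proposed("ru/und_Latn_UNGEGN"))
    print(reverse_proposed("ru.und-Latn-UNGEGN"))
```

For example, candidate_splits("ru-und_Latn/UNGEGN") yields two readings, ("ru", "und-Latn", "UNGEGN") and ("ru-und", "Latn", "UNGEGN"), while reverse_proposed("ru.und-Latn-UNGEGN") returns "und-Latn-UNGEGN.ru", keeping the variant attached to the Latin-script locale.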
-- Mark
This archive was generated by hypermail 2.1.5 : Sat Jan 26 2008 - 13:47:08 CST