Re: Unicode Transliteration Guidelines released

From: Mark Davis (mark.davis@icu-project.org)
Date: Sat Jan 26 2008 - 13:44:58 CST


       1. Your remarks should go to the cldr-users group (you could also cc
       them here if you want, but they should definitely go there).
       2. If you are referring to a CLDR bug, give the bug number. When I search
       for "transliteration" I only see 5 open bugs, none of which appear to be
       what you are talking about.

    Mark

    On Jan 19, 2008 3:48 PM, Philippe Verdy <verdy_p@wanadoo.fr> wrote:

    > > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > > On Behalf Of Rick McGowan
    > > Sent: Saturday, January 19, 2008 17:58
    > > To: unicode@unicode.org
    > > Subject: Unicode Transliteration Guidelines released
    > >
    > > The Unicode CLDR committee has released
    > > "Unicode Transliteration Guidelines":
    > > http://www.unicode.org/cldr/transliteration_guidelines.html
    >
    > Note the following text:
    > [quote]
    > Even within particular languages, there can be variants according to
    > different authorities, or even varying across time (if the authority
    > changes its recommendation). The canonical identifier that CLDR uses
    > for these has the form:
    >
    > source-target/variant
    >
    > The source (and target) can be a language or script, either using the
    > English name or a locale code. The variant should specify the
    > authority, and if necessary, the year. For example, the identifier for
    > the Russian to Latin transliteration according to the UNGEGN would be
    >
    > ru-und_Latn/UNGEGN, or
    > Russian-Latin/UNGEGN
    > (...)
    > [/quote]
    >
    > A CLDR bug about the format of this identifier has been open for quite
    > a long time. It includes proposed changes and comments pointing out that
    > the use of '-' and '_' is inconsistent with existing practice for locale
    > identifiers, where the two are treated equivalently.
    >
    > Also, the placement of the variant is ambiguous if the transliteration is
    > reversed.
    >
    > This bug was accepted by a CLDR committee member but deferred for later
    > resolution. Apparently it is still in that state and has been forgotten.
    >
    > I recently proposed a solution using another format, based purely on
    > locale IDs (because transliteration variants effectively create locale
    > variants by defining an alternate orthography for the associated
    > language):
    > ru.und-Latn-UNGEGN
    > und-Latn-UNGEGN.ru
    > It would drop support for full language names such as:
    > Russian.Latin-UNGEGN
    > (because most of these names are not part of the CLDR root locale, and
    > English language names are often ambiguous or could wreak havoc when a
    > name itself contains one of the separators needed for parsing).
    >
    > The format would then become simply:
    > <Source-locale-id>.<Target-locale-id>
    > where both locale IDs adhere to the RFC definition.
    >
    > (Note that I suggest treating "." and "/" equivalently as the separator
    > between the two locales, just as we should treat "_" and "-" equivalently
    > as tag separators within a locale ID. This keeps the format compatible
    > with existing locale ID parsers, resource bundle parsers, and resolvers,
    > where "/" could cause problems with filesystems.)
    >
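
    For illustration only, here is a minimal Python sketch of how an
    identifier in the format proposed in the quoted message
    (<Source-locale-id>.<Target-locale-id>) might be parsed and reversed. It
    is not CLDR or ICU code. The function names are hypothetical, and the
    sketch assumes each locale ID is a plain list of subtags; it does not
    validate them against any RFC.

```python
def parse_transform_id(identifier):
    """Split '<source>.<target>' into two tuples of subtags (hypothetical helper)."""
    # Treat "/" like "." as the separator between the two locale IDs.
    parts = identifier.replace("/", ".").split(".")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '.' or '/' separator: {identifier!r}")
    # Treat "_" like "-" as the subtag separator inside each locale ID.
    source, target = (tuple(p.replace("_", "-").split("-")) for p in parts)
    if not all(source) or not all(target):
        raise ValueError(f"empty subtag in {identifier!r}")
    return source, target


def format_transform_id(source, target):
    """Emit one spelling of the identifier, using '-' and '.' only."""
    return "-".join(source) + "." + "-".join(target)


def reverse_transform_id(identifier):
    """Swap source and target; any variant subtag stays with its own locale."""
    source, target = parse_transform_id(identifier)
    return format_transform_id(target, source)


if __name__ == "__main__":
    print(parse_transform_id("ru/und_Latn_UNGEGN"))
    print(reverse_transform_id("ru.und-Latn-UNGEGN"))
```

    Because the variant is just another subtag of the locale it qualifies,
    reversing the pair does not need to decide where the variant belongs,
    which is the ambiguity raised above for the source-target/variant form.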

    


    This archive was generated by hypermail 2.1.5 : Sat Jan 26 2008 - 13:47:08 CST