Re: CLDR errors that can't be corrected

From: Philippe Verdy
Date: Wed May 17 2006 - 23:33:56 CDT


    From: "Richard Wordingham"
    > Philippe Verdy wrote on Wednesday, May 17, 2006 at 11:33 PM
    >> In fact there are many other errors that can't be corrected. For example,
    >> the default French locale should not accept calendar entries that contain
    >> non-exemplar characters or other characters normally not used in French,
    >> for example:
    >> * Islamic months with the back-apostrophe, which should be accepted only
    >> in French variants that include this character, such as North-African
    >> French, where French is used with an extended set of additional
    >> characters for transcribing Arabic words...
    > It's hardly surprising that this sort of problem crops up. The locale data
    > include a lot of words that are unfamiliar and debatable - as well as the
    > familiar and debatable ones. (Unsurprisingly, I can't find my country of
    > birth in the list of territories.) I went through a language pick-list at
    > work the other day, and found about a dozen that were misspelt, including
    > 'Sudanese' for 'Sundanese' and 'Welch' for 'Welsh'. I'm reluctant to have
    > the supplier paid to fix it, though I'm tempted to play the racism card.
    > English has the same exemplar character issue with the 'turned comma' (but
    > with inherited Islamic months), and also in the 'Tonga Paʻanga'.

    The case of exemplar characters is not the most critical issue I want to comment on here (though it is true that I would prefer pedantic names to be located in another resource or in a specific sub-locale).
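To make the exemplar-character point concrete, here is a minimal sketch of the kind of check being discussed: it reports the characters of a display name that fall outside a locale's exemplar set. The set below is an illustrative stand-in and not the actual CLDR data for French, and the helper name, the list of tolerated punctuation, and the sample month spelling are all assumptions.

```python
# Illustrative stand-in for a French exemplar set (assumption, NOT real CLDR data).
FRENCH_EXEMPLARS = set("abcdefghijklmnopqrstuvwxyzàâæçéèêëîïôœùûüÿ")

# Characters tolerated in any name regardless of the exemplar set (assumption).
ALLOWED_EXTRA = set(" -'.0123456789")


def non_exemplar_chars(name, exemplars):
    """Return the sorted, de-duplicated characters of `name` outside the allowed set."""
    allowed = exemplars | ALLOWED_EXTRA
    return sorted({ch for ch in name.lower() if ch not in allowed})


if __name__ == "__main__":
    # Hypothetical month spelling using U+02BB (the "turned comma").
    print(non_exemplar_chars("rabi\u02bb al-awwal", FRENCH_EXEMPLARS))
    print(non_exemplar_chars("janvier", FRENCH_EXEMPLARS))
```

A check like this would only flag candidates; whether a flagged name belongs in the default locale or in a sub-locale remains an editorial decision.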

    Using the high turned comma to denote a glottal stop, a sound that does not even exist in French phonology, looks like a bad choice (it is not even pedantic, because it should be a true glottal stop instead of this ersatz). The sound is best approximated with a pharyngeal, which is generally written like the unvoiced aspirated 'h' in French (and probably in basic English and most West European languages as well).

    Noting long vowels for Japanese is also pedantic. French does have long vowels but does not denote them explicitly, so why should it use a macron, an old usage inherited from Roman Latin and long since abandoned in the Vulgar Latin from which all Romance languages originated? If one really wants to make a distinction, the common convention, which uses only French characters, is to replace the macron with a circumflex (a convention also used for transliterating the long vowels of almost all Indic scripts into French).
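The macron-to-circumflex convention can be sketched mechanically with Unicode normalization. This is only an illustration under stated assumptions: the function name is hypothetical, and it blindly swaps every combining macron, which may not be what a careful transliteration wants for every base letter.

```python
import unicodedata

MACRON = "\u0304"      # COMBINING MACRON
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def macron_to_circumflex(text):
    """Replace every macron with a circumflex, e.g. for romanized Japanese in French."""
    # Decompose so that precomposed letters such as U+014D expose their combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    swapped = decomposed.replace(MACRON, CIRCUMFLEX)
    # Recompose so the result uses precomposed letters where they exist.
    return unicodedata.normalize("NFC", swapped)


if __name__ == "__main__":
    print(macron_to_circumflex("T\u014dky\u014d"))
```

Note that the final NFC step normalizes the whole string, not just the letters that carried a macron.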

    When I spoke about errors, I meant true errors: those that cause conflicts between two resources that should be distinct but conflict by returning the same value.

    Those who confuse "Sundanese" ("Soundanais" in French) and "Sudanese" ("Soudanais" in French) are the same people who confuse "Guyana" and "French Guyane" (two bordering regions belonging to distinct countries), or "Northern Sotho" and "Southern Sotho", in the conflicting resources.
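Conflicts of this kind, where two distinct codes return the same display value, are easy to detect automatically. The following is a minimal sketch; the function name is hypothetical and the sample data is invented to be deliberately faulty, so it does not reproduce actual CLDR content.

```python
from collections import defaultdict


def find_name_collisions(names):
    """Map each display name shared by several codes to the sorted list of those codes."""
    by_name = defaultdict(list)
    for code, name in names.items():
        # Compare case-insensitively so "Guyane" and "guyane" count as the same value.
        by_name[name.casefold()].append(code)
    return {name: sorted(codes) for name, codes in by_name.items() if len(codes) > 1}


if __name__ == "__main__":
    # Hypothetical, deliberately faulty French names (assumption, not real data).
    languages_fr = {"nso": "sotho", "st": "sotho", "cy": "gallois"}
    territories_fr = {"GY": "Guyane", "GF": "Guyane", "FR": "France"}
    print(find_name_collisions(languages_fr))
    print(find_name_collisions(territories_fr))
```

Such a check catches only exact collisions; a misspelling like "Welch" that collides with nothing would need a separate review.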

    The case of "Welch" vs. "Welsh" (similar to its French translation, "Gallois" vs. "Galois") is much less dramatic, as it does not create confusion. This is just a typo that can be corrected (though it may be debatable in some English-speaking regions where an alternate orthography is commonly used that does not match the one used in Wales and the UK). In a default English locale, the best locale orthography should be used, and if regional variants of the language really accept and prefer an alternate orthography, it can be specified in sub-locales that override the inherited value of the resource.
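The override mechanism mentioned above can be illustrated with a small lookup that falls back from a sub-locale to its parent. This is a simplified sketch: it assumes plain truncation of the locale identifier down to "root", the resource key and the region code "en_XX" are hypothetical, and the data is invented for illustration.

```python
def parent_locale(locale):
    """Return the parent by truncation: "en_XX" -> "en" -> "root" -> None."""
    if locale == "root":
        return None
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def lookup(resources, locale, key):
    """Return the value for `key`, walking up the parent chain until one is found."""
    current = locale
    while current is not None:
        value = resources.get(current, {}).get(key)
        if value is not None:
            return value
        current = parent_locale(current)
    raise KeyError(key)


if __name__ == "__main__":
    # Hypothetical resources: the default English locale holds the standard spelling,
    # and an invented regional sub-locale "en_XX" overrides it.
    resources = {
        "root": {},
        "en": {"languages/cy": "Welsh"},
        "en_XX": {"languages/cy": "Welch"},
    }
    print(lookup(resources, "en_GB", "languages/cy"))  # inherited from "en"
    print(lookup(resources, "en_XX", "languages/cy"))  # overridden in the sub-locale
```

With this layout the default locale stays authoritative, and only regions that really prefer the alternate spelling need to carry an entry of their own.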
