From: Philippe Verdy (firstname.lastname@example.org)
Date: Wed May 30 2007 - 12:11:11 CDT
Some disputed entries became part of CLDR 1.4 by error: it was not even
possible to prevent their release, because the proposal period for CLDR 1.4
was extremely short (about one month), it was not clearly announced in
advance (a single message posted to the Unicode list, which was not
delivered to everyone), and the system was then inaccessible for part of
that period (many performance problems on the server, even during that
The result was that CLDR 1.4 contained many entries for which it was
later impossible to vote AGAINST their addition.
Notably, MANY entries were copied and pasted directly from the English root
locale, without any real search for the correct orthography (and vetters
could do nothing to prevent their publication, given that the CLDR 1.4
vetting period had too many technical problems, which blocked all access
most of the time).
These entries are still present today in the CLDR 1.5 vetting process.
Although enough proposals have been made to correct these incorrect
English imports, the fact that there are several (correct) alternatives is
blocking the correction of many entries, and the old incorrect CLDR 1.4 data
(with the blue star) is still there. I don’t understand why it still
remains the “preferred” entry (on green background), even though every
vetter voted AGAINST it by submitting other proposals.
Even when we agree on a common term, two votes by vetters, plus data from
other reference implementers (Apple, Google, Yahoo, Microsoft), are not
enough to get rid of those bogus 1.4 entries that NOBODY wants.
How can this be corrected?
I have made several comments on the CLDR forum about the other problems with
the system: lack of consistency between multiple entries is another
problem, notably for naming conventions:
* how to insert less-significant words for language names in lists (words
like “languages” used for the names of language families or groups, using
* how to consistently use singular or plural, use of neutral or feminine
* how to differentiate names for use in isolation within text, or in short
form (like in data tables), or in long lists for input forms (like
combo-boxes, where languages should be sorted)
* how to indicate language qualifiers (using parentheses, which may be
optionally removed, like dates)?
* how to indicate acceptable language name variants (there are multiple
names even in reference documents like ISO standards, and the Unicode
standard itself), and a “preferred” name that does not exclude the other
names as incorrect?
* how to identify special names (such as [mis]=“miscellaneous languages”, or
[root]=“Root”): should they be made easily distinguishable in lists?
Please join the CLDR, and at least vote AGAINST these old incorrect entries
with the blue star.
(For example, many incorrect English entries remain from CLDR 1.4 in the
French locale, and although proposals were made in due time to correct
them, there are still not enough votes to exclude those incorrect
entries, even though there are several possible alternatives to correct
Note that, unfortunately, alternate proposals that had been made were
recently DELETED from the CLDR if they had no votes when the proposal phase
ended. Some entries will not allow alternatives now. But the current entry
inherited from the bogus CLDR 1.4 data MUST be changed.
I still fear that there will not be enough votes to avoid publishing this
bogus data ONCE AGAIN, even if all sources today and all existing vetters
agree that these data are incorrect!
Really, if disputed entries remain, there should be a review at the end
by the CLDR committee, which would read the comments posted in the forum and
consider not only the votes but also the nature of the different
proposals and why they were made (a simple automatic voting system can’t
This should be done at least for ALL major languages (consider, for example,
the first 20 languages of Wikipedia as a hint about which languages to
consider first, because these errors will soon contaminate all other
And independently of the automated vetting process, which operates on an
entry-by-entry vote, there are some general consistency decisions to make
about the presentation of data (there is, for example, agreement on the
terminology, but there are different ways to represent the important terms).
Where disputes remain, many of them could be decided together as a single
group. There’s currently NO way to automate this decision for groups of
related disputed entries.
This can only be done with the help of the CLDR technical committee, by
human beings who are much smarter than an automatic script (whose behaviour
is still not perfect: it is constantly changed, corrected and restarted, it
generates errors, forgets votes, generates duplicate entries, or cannot
accept every valid proposal).
From: email@example.com [mailto:firstname.lastname@example.org] On
behalf of Mark Davis
Sent: Tuesday, 29 May 2007 18:36
To: CLDR list; email@example.com
Subject: Resolution process
The resolution process for CLDR 1.5 has been updated by the technical
committee; please see
http://unicode.org/cldr/process.html#data_vetting_process. This update
changes the way that the data is resolved, giving weight to guest translator
votes, and adding a new draft status.
The changes for the survey tool for vetting the data submitted during the
CLDR 1.5 data submission phase are nearing completion. While not yet
complete, feedback is welcome on the UI changes
(http://unicode.org/cldr/apps/survey).
* The prospective CLDR 1.5 value is shown in its own column; in green
if it would have the status "confirmed".
* The previous CLDR 1.4 item is shown with a star.
* Adding new values will only be allowed when:
    * The value is disputed (more than one alternative)
    * One of the alternatives has an error
    * The field is needed for minimal coverage
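
As a rough illustration only, the three conditions listed above for adding
new values can be read as a simple "any of these" test. The sketch below is
in Python; the SurveyField class and its attribute names are hypothetical
and are not taken from the survey tool itself, whose actual data model is
not described in this message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SurveyField:
    """Hypothetical model of one field in the survey tool (an assumption)."""
    alternatives: List[str] = field(default_factory=list)
    has_erroneous_alternative: bool = False
    needed_for_minimal_coverage: bool = False


def is_disputed(f: SurveyField) -> bool:
    # "Disputed" in the sense used above: more than one distinct alternative.
    return len(set(f.alternatives)) > 1


def may_add_value(f: SurveyField) -> bool:
    # A new value is allowed when at least one of the three conditions holds.
    return (
        is_disputed(f)
        or f.has_erroneous_alternative
        or f.needed_for_minimal_coverage
    )


if __name__ == "__main__":
    # One undisputed, error-free value on a field not needed for coverage.
    print(may_add_value(SurveyField(alternatives=["English"])))
    # Two competing proposals make the field disputed.
    print(may_add_value(SurveyField(alternatives=["anglais", "Anglais"])))
```

Under this reading, a field with a single undisputed value and no detected
error would not accept new proposals, which is the situation in which an
entry inherited from CLDR 1.4 can no longer receive alternatives.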
BTW, The previous process document is at
This archive was generated by hypermail 2.1.5 : Wed May 30 2007 - 12:16:26 CDT