RE: Resolution process

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 30 2007 - 12:11:11 CDT


    Some disputed entries became part of CLDR 1.4 by mistake. It was not even
    possible to keep them from being released, because the proposal period for
    CLDR 1.4 was extremely short (about one month), it was not clearly announced
    in advance (a single message posted to the Unicode list, which was not
    delivered to everyone), and the server was then inaccessible for part of
    that period (lots of technical and performance problems).

     

    The result was that CLDR 1.4 contained many entries whose addition it was
    later impossible to vote AGAINST.

     

    Notably, MANY entries were copied and pasted directly from the English root
    locale, without any real search for the correct orthography (and vetters
    could do nothing to prevent their publication, given that the CLDR 1.4
    vetting period had too many technical problems, which blocked all access
    most of the time).

     

    These entries still remain today in the CLDR 1.5 vetting process.

     

    But although enough proposals have been made to correct these incorrect
    English imports, the fact that there are several (correct) alternatives is
    blocking the correction of many entries, and the old incorrect CLDR 1.4 data
    (with the blue star) is still there. I don’t understand why it still
    remains the “preferred” entry (on a green background), even though every
    vetter voted AGAINST it by submitting other proposals.

     

    Even when we agree on a common term, two votes by vetters, plus data from
    other reference implementers (Apple, Google, Yahoo, Microsoft), are not
    enough to get rid of those bogus 1.4 entries that NOBODY wants.

     

    How can this be corrected?

     

    I have made several comments on the CLDR forum about other problems with
    the system. Lack of consistency between multiple entries is another
    problem, notably for naming conventions:

    * how to insert less-significant words for language names in lists (words
    like “languages” used for the names of language families or groups, using
    commas)?

    * how to consistently choose singular or plural, and neutral or feminine
    forms?

    * how to differentiate names for use in isolation within text, in short
    form (as in data tables), or in long lists for input forms (like
    combo-boxes, where languages should be sorted)?

    * how to indicate language qualifiers (using parentheses, which may be
    optionally removed, like dates)?

    * how to indicate acceptable language name variants (there are multiple
    names even in reference documents like ISO standards, and the Unicode
    standard itself), and a “preferred” name that does not exclude the other
    names as incorrect?

    * how to identify special names (such as [mis]=“miscellaneous languages”, or
    [root]=“Root”): should they be made easily distinguishable in lists?

     

    Please join the CLDR, and at least vote AGAINST these old incorrect entries
    with the blue star.

     

    (For example, many incorrect English entries remain from CLDR 1.4 in the
    French locale, and although proposals were made in due time to correct
    them, there are still not enough votes to exclude those incorrect entries,
    even though there are several possible alternatives to correct them.)

     

    Note that, unfortunately, alternate proposals that had been made were
    recently DELETED from the CLDR if they had no votes when the proposal phase
    ended. Some entries will not allow alternatives now. But the current entry
    inherited from the bogus CLDR 1.4 MUST be changed.

     

    I still fear that there will not be enough votes to avoid publishing this
    bogus data ONCE AGAIN, even though all sources today, and all existing
    vetters, agree that these data are incorrect!

     

    Really, if disputed entries remain, there should be a final review by the
    CLDR committee, which would read the comments posted in the forum and
    consider not only the votes but also the nature of the different proposals
    and why they were made (a simple automatic voting system can’t track that).

     

    This should be done at least for ALL major languages (consider, for example,
    the top 20 languages of Wikipedia as a hint about which languages to
    consider first, because these errors will soon contaminate all other
    locales).

     

    And independently of the automated vetting process, which operates on an
    entry-by-entry vote, there are some general consistency decisions to make
    about the presentation of data (there is, for example, agreement on the
    terminology, but different ways to represent the important terms). Where
    disputes remain, many of them could be decided together as a single group.
    There’s currently NO way to automate this decision for groups of related
    disputed entries.

     

    This can only be done with the help of the CLDR technical committee, by
    human beings who are much smarter than an automatic script (whose behaviour
    is still not perfect: it is constantly changed, corrected, and restarted,
    generates errors, forgets votes, generates duplicate entries, and may not
    accept every valid proposal).

     

      _____

    From: cldr-users-bounce@unicode.org [mailto:cldr-users-bounce@unicode.org]
    On Behalf Of Mark Davis
    Sent: Tuesday, May 29, 2007 18:36
    To: CLDR list; cldr-users@unicode.org
    Subject: Resolution process

     

    The resolution process for CLDR 1.5 has been updated by the technical
    committee; please see
    http://unicode.org/cldr/process.html#data_vetting_process. This update
    changes the way that the data is resolved, giving weight to guest translator
    votes, and adding a new draft status.

    The changes for the survey tool for vetting the data submitted during the
    CLDR 1.5 data submission phase are nearing completion. While not yet
    complete, feedback is welcome on the UI changes
    (http://unicode.org/cldr/apps/survey). In particular,

    * The prospective CLDR 1.5 value is shown in its own column; in green
      if it would have the status "confirmed".
    * The previous CLDR 1.4 item is shown with a star.
    * Adding new values will only be allowed when:
        * The value is disputed (more than one alternative)
        * One of the alternatives has an error
        * The field is needed for minimal coverage

    BTW, the previous process document is at
    http://unicode.org/repository/*checkout*/cldr/docs/web/process.html?rev=1.29

    -- 
    Mark 
    

