Re: CLDR plural handling info?

From: Mark Davis (mark.davis@jtcsv.com)
Date: Mon Jul 11 2005 - 13:02:52 CDT


    > But that
    > doesn't make sense, it's the message generating system which selects the
    > appropriate message for the given plural form.

    Yes, agreed.

    > In GNU gettext there are
    > 8 rules (at least several years ago there were 8) for determining the
    > plural form for a given language (see below).

    CLDR uses a mechanism that is more flexible than what you describe, since it
    can match any number range. See
    http://www.unicode.org/reports/tr35/#Choice_Patterns. But it currently uses
    that only for currency formats.
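The range matching that a choice pattern allows can be sketched as follows. This is a minimal sketch, not the TR35 definition: it assumes a Java ChoiceFormat-style syntax in which `limit#text` (or `limit≤text`) matches values greater than or equal to `limit`, `limit<text` matches values strictly greater than `limit`, and choices are listed in ascending order of limit. The names `parse_choice_pattern` and `select_choice` are hypothetical.

```python
import re
from typing import List, Tuple

# One choice is "<number><relation><text>"; relation '#' or '≤' means value >= limit,
# '<' means value > limit. Assumed syntax, modeled on Java's ChoiceFormat.
_CHOICE = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*([#<\u2264])(.*)", re.DOTALL)


def parse_choice_pattern(pattern: str) -> List[Tuple[float, bool, str]]:
    """Split 'limit#text|limit<text|...' into (limit, strict, text) triples."""
    choices = []
    for part in pattern.split("|"):
        m = _CHOICE.match(part)
        if m is None:
            raise ValueError(f"malformed choice: {part!r}")
        limit, relation, text = m.groups()
        choices.append((float(limit), relation == "<", text))
    return choices


def select_choice(pattern: str, value: float) -> str:
    """Return the text of the last choice whose lower bound `value` satisfies."""
    choices = parse_choice_pattern(pattern)
    selected = choices[0][2]  # values below every limit fall back to the first choice
    for limit, strict, text in choices:
        if value > limit or (not strict and value == limit):
            selected = text
        else:
            break  # limits are ascending, so no later choice can match
    return selected


files = "0#no files|1#one file|1<{0} files"
print(select_choice(files, 0))            # no files
print(select_choice(files, 1))            # one file
print(select_choice(files, 7).format(7))  # 7 files
```

Because each choice is a lower bound on a numeric range, one pattern can cover arbitrary ranges, which is the extra flexibility over a fixed set of per-language rules.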

    > I do think the interface between a system for selecting user interface
    > messages and the CLDR should be seamless. Therefore I believe the CLDR
    > project should also provide a storage format for message catalogs and
    > a format specification for the messages themselves. However the GNU
    > gettext people will probably disagree.

    And I suspect that the CLDR committee would also disagree. The goal of CLDR
    is to provide common data, not an implementation mechanism. It does have to
    get into formatting, where that affects the result -- such as with months in
    date formats -- but not general text formatting or language analysis.

    >
    > BTW, are there any plans to include the 140 more or less standardized
    > color names in the CLDR?

    You can see for yourself which topics are under discussion in CLDR by
    looking at http://www.jtcsv.com/cgibin/locale-bugs. For example, in the box
    marked "Regular Expression", enter 'color', then Select. You'll see two
    entries, neither of which pertains. You are welcome to file feature requests
    or bugs there -- however, note that every new type of data involves a
    considerable amount of work, so you'd need a good case to justify why this
    is needed in the common repository.

    Mark

    ----- Original Message -----
    From: "Theo Veenker" <Theo.Veenker@let.uu.nl>
    To: "Mark Davis" <mark.davis@jtcsv.com>
    Cc: "unicode" <unicode@unicode.org>
    Sent: Monday, July 11, 2005 01:05
    Subject: Re: CLDR plural handling info?

    > Mark Davis wrote:
    > > What CLDR does have is provision for some particular cases: for the use of
    > > different forms of months, depending on context: either stand-alone ("July")
    > > or within formats ("July 1, 1942"); for the use of currency formats that
    > > change according to the value of the number (see the formats for INR); and
    > > for exceptional titlecasing of language display names.
    > >
    > > It would be extremely complex to describe all the mechanisms used for
    > > forming arbitrary noun plurals in any particular language; and that does not
    > > speak to more complex declensions of nouns or adjectives. What is it
    > > exactly that you are looking for -- can you give an example?
    >
    > I meant simple user interface messages like "Copied 1 file." or "Copied
    > 2 files.". As I said in my reply to my own question, I mistakenly thought
    > the mechanism for selecting the appropriate message (depending on the
    > plurality) should be controlled by information from the CLDR. But that
    > doesn't make sense, it's the message generating system which selects the
    > appropriate message for the given plural form. In GNU gettext there are
    > 8 rules (at least several years ago there were 8) for determining the
    > plural form for a given language (see below). So for each locale/language
    > one needs to know which rule should be used to select the appropriate
    > plural form. This is just one field, but it doesn't need to be in the
    > CLDR per se.
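Theo's point, that each locale needs just one extra field (which rule to apply) while the message system picks the form, can be sketched as a toy catalog. This is a hypothetical structure, not the GNU gettext API or file format: the `Catalog` class and its `ngettext` method are illustrative names, and the Polish strings are example translations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Catalog:
    """Hypothetical per-locale catalog: one plural rule plus the translated forms."""
    plural_rule: Callable[[int], int]  # the single per-locale field: count -> form index
    messages: Dict[str, List[str]] = field(default_factory=dict)

    def ngettext(self, msgid: str, n: int) -> str:
        """Pick the translated form for count n and substitute the count."""
        forms = self.messages[msgid]
        return forms[self.plural_rule(n)].format(n=n)


english = Catalog(
    plural_rule=lambda n: 0 if n == 1 else 1,  # gettext "Rule 1"
    messages={"copied": ["Copied {n} file.", "Copied {n} files."]},
)

polish = Catalog(
    # gettext "Rule 7": one / few (last digit 2-4, except 12-14) / many
    plural_rule=lambda n: 0 if n == 1
    else 1 if n % 10 in (2, 3, 4) and not 10 <= n % 100 < 20
    else 2,
    messages={"copied": ["Skopiowano {n} plik.",
                         "Skopiowano {n} pliki.",
                         "Skopiowano {n} plików."]},
)

for n in (1, 2, 5, 22):
    print(english.ngettext("copied", n), "|", polish.ngettext("copied", n))
```

The calling code is identical for both languages; only the rule and the number of stored forms differ per locale.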
    >
    > I do think the interface between a system for selecting user interface
    > messages and the CLDR should be seamless. Therefore I believe the CLDR
    > project should also provide a storage format for message catalogs and
    > a format specification for the messages themselves. However the GNU
    > gettext people will probably disagree.
    >
    > BTW, are there any plans to include the 140 more or less standardized
    > color names in the CLDR?
    >
    > Regards,
    > Theo
    >
    > Plural form rules and the languages to which they apply
    > (source GNU gettext, not up to date):
    > Rule 0
    > one form:
    >   any n -> plural form 0
    > applies to:
    >   Finno-Ugric family: Hungarian
    >   Asian family: Japanese, Korean
    >   Turkic/Altaic family: Turkish
    >
    > Rule 1
    > two forms:
    >   n==1 -> plural form 0
    >   otherwise -> plural form 1
    > applies to:
    >   Germanic family: Danish, Dutch, English, German, Norwegian, Swedish
    >   Finno-Ugric family: Estonian, Finnish
    >   Latin/Greek family: Greek
    >   Semitic family: Hebrew
    >   Romanic family: Italian, Portuguese, Spanish
    >   Artificial: Esperanto
    >
    > Rule 2
    > two forms:
    >   n==0 || n==1 -> plural form 0
    >   otherwise -> plural form 1
    > applies to:
    >   Romanic family: French, Brazilian Portuguese
    >
    > Rule 3
    > three forms:
    >   n%10==1 && n%100!=11 -> plural form 0
    >   n!=0 -> plural form 1
    >   otherwise -> plural form 2
    > applies to:
    >   Baltic family: Latvian
    >
    > Rule 4
    > three forms:
    >   n==1 -> plural form 0
    >   n==2 -> plural form 1
    >   otherwise -> plural form 2
    > applies to:
    >   Celtic: Gaeilge (Irish)
    >
    > Rule 5
    > three forms:
    >   n%10==1 && n%100!=11 -> plural form 0
    >   n%10>=2 && (n%100<10 || n%100>=20) -> plural form 1
    >   otherwise -> plural form 2
    > applies to:
    >   Baltic family: Lithuanian
    >
    > Rule 6
    > three forms:
    >   n%10==1 && n%100!=11 -> plural form 0
    >   n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) -> plural form 1
    >   otherwise -> plural form 2
    > applies to:
    >   Slavic family: Croatian, Czech, Russian, Slovak, Ukrainian
    >
    > Rule 7
    > three forms:
    >   n==1 -> plural form 0
    >   n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) -> plural form 1
    >   otherwise -> plural form 2
    > applies to:
    >   Slavic family: Polish
    >
    > Rule 8
    > four forms:
    >   n%100==1 -> plural form 0
    >   n%100==2 -> plural form 1
    >   n%100==3 || n%100==4 -> plural form 2
    >   otherwise -> plural form 3
    > applies to:
    >   Slavic family: Slovenian
    >
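The rule table above can be encoded directly as a lookup from language to rule. A sketch under these assumptions: counts are non-negative integers; within each rule the conditions are tried top to bottom and the first match wins (as in gettext's C-style `plural=` expressions, which nest `?:`); the language-code table (ISO 639-1 codes plus `pt_BR`) is hypothetical; and `plural_form` is an illustrative name, not a gettext or CLDR API.

```python
from typing import Callable, Dict, Tuple

# Rule number -> (number of forms, function mapping a count n >= 0 to a form index).
# Each function mirrors the conditions in the table, tested top to bottom.
PLURAL_RULES: Dict[int, Tuple[int, Callable[[int], int]]] = {
    0: (1, lambda n: 0),
    1: (2, lambda n: 0 if n == 1 else 1),
    2: (2, lambda n: 0 if n in (0, 1) else 1),
    3: (3, lambda n: 0 if n % 10 == 1 and n % 100 != 11
        else 1 if n != 0
        else 2),
    4: (3, lambda n: 0 if n == 1 else 1 if n == 2 else 2),
    5: (3, lambda n: 0 if n % 10 == 1 and n % 100 != 11
        else 1 if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20)
        else 2),
    6: (3, lambda n: 0 if n % 10 == 1 and n % 100 != 11
        else 1 if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20)
        else 2),
    7: (3, lambda n: 0 if n == 1
        else 1 if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20)
        else 2),
    8: (4, lambda n: 0 if n % 100 == 1
        else 1 if n % 100 == 2
        else 2 if n % 100 in (3, 4)
        else 3),
}

# Hypothetical language-code table built from the "applies to" lists above.
LANGUAGE_RULE: Dict[str, int] = {
    **dict.fromkeys(["hu", "ja", "ko", "tr"], 0),
    **dict.fromkeys(["da", "nl", "en", "de", "no", "sv", "et", "fi",
                     "el", "he", "it", "pt", "es", "eo"], 1),
    **dict.fromkeys(["fr", "pt_BR"], 2),
    "lv": 3,
    "ga": 4,
    "lt": 5,
    **dict.fromkeys(["hr", "cs", "ru", "sk", "uk"], 6),
    "pl": 7,
    "sl": 8,
}


def plural_form(language: str, n: int) -> int:
    """Return the plural-form index for count n in the given language."""
    if n < 0:
        raise ValueError("counts are assumed to be non-negative")
    _, rule = PLURAL_RULES[LANGUAGE_RULE[language]]
    return rule(n)


for lang in ("en", "fr", "ru", "pl", "sl"):
    print(lang, [plural_form(lang, n) for n in (0, 1, 2, 5, 11, 21, 22, 102)])
```

Keeping the form count next to each rule makes it easy to check that a translator supplied exactly as many message variants as the locale's rule can select.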
    >
    > > ----- Original Message -----
    > > From: "Theo Veenker" <Theo.Veenker@let.uu.nl>
    > > To: "unicode" <unicode@unicode.org>
    > > Sent: Friday, July 08, 2005 07:49
    > > Subject: CLDR plural handling info?
    > >
    > >>Hi,
    > >>
    > >>In the CLDR 1.3 data there is no field describing which plural
    > >>handling method should be used when generating messages for the
    > >>given locale. Why?
    > >>
    > >>Is it because there are no user interface messages in the CLDR?
    > >>IMO it would be a good idea if the CLDR project, in addition to
    > >>the LDML format, also provided a message catalog format.
    > >>But then of course plural handling data should be included in
    > >>the CLDR.
    > >>
    > >>Theo
    > >>
    >



    This archive was generated by hypermail 2.1.5 : Mon Jul 11 2005 - 15:11:53 CDT