Re: CLDR plural handling info?

From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Mon Jul 11 2005 - 03:05:34 CDT

    Mark Davis wrote:
    > What CLDR does have is provision for some particular cases: for the use of
    > different forms of months, depending on context: either stand-alone ("July")
    > or within formats ("July 1, 1942"); for the use of currency formats that
    > change according to the value of the number (see the formats for INR); and
    > for exceptional titlecasing of language display names.
    >
    > It would be extremely complex to describe all the mechanisms used for
    > forming arbitrary noun plurals in any particular language; and that does not
    > speak to more complex declensions of nouns or adjectives. What is it
    > exactly that you are looking for -- can you give an example?

    I meant simple user interface messages like "Copied 1 file." or "Copied
    2 files." As I said in my reply to my own question, I mistakenly thought
    the mechanism for selecting the appropriate message (depending on the
    plurality) should be controlled by information from the CLDR. But that
    doesn't make sense: it is the message-generating system that selects the
    appropriate message for the given plural form. In GNU gettext there are
    8 rules (at least several years ago there were 8) for determining the
    plural form for a given language (see below). So for each locale/language
    one needs to know which rule should be used to select the appropriate
    plural form. This is just one field, but it doesn't need to be in the
    CLDR per se.
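
    A minimal sketch of that division of labour, in Python: the per-language
    data is a single rule number, and the message system uses it to pick a
    form from its catalog. The names RULE_FOR_LANGUAGE, PLURAL_RULES, CATALOG
    and format_message are hypothetical, only Rule 1 and Rule 2 from the list
    below are implemented, and this is not how GNU gettext itself stores or
    evaluates its rules.

```python
# Hypothetical per-language field: which plural rule applies.
RULE_FOR_LANGUAGE = {"en": 1, "fr": 2}

# Rule number -> function mapping a count n to a plural form index.
PLURAL_RULES = {
    1: lambda n: 0 if n == 1 else 1,       # n==1 -> form 0, otherwise form 1
    2: lambda n: 0 if n in (0, 1) else 1,  # n==0 || n==1 -> form 0, otherwise form 1
}

# Hypothetical message catalog: one template per plural form.
CATALOG = {
    "en": ["Copied {n} file.", "Copied {n} files."],
    "fr": ["{n} fichier copié.", "{n} fichiers copiés."],
}


def format_message(language, n):
    """Pick the template for n via the language's plural rule and fill it in."""
    rule = PLURAL_RULES[RULE_FOR_LANGUAGE[language]]
    return CATALOG[language][rule(n)].format(n=n)


print(format_message("en", 1))
print(format_message("en", 2))
print(format_message("fr", 0))
```

    The only locale-specific input is the rule number; the templates and the
    selection logic live entirely in the message system.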

    I do think the interface between a system for selecting user interface
    messages and the CLDR should be seamless. Therefore I believe the CLDR
    project should also provide a storage format for message catalogs and
    a format specification for the messages themselves. However, the GNU
    gettext people will probably disagree.

    BTW, are there any plans to include the 140 more or less standardized
    color names in the CLDR?

    Regards,
    Theo

    Plural form rules and the languages to which they apply
    (source: GNU gettext, not up to date):
    Rule 0
       one form:
         n==arbitrary -> plural form 0
       applies to:
         Finno-Ugric family
           Hungarian
         Asian family
           Japanese, Korean
         Turkic/Altaic family
           Turkish

    Rule 1
       two forms:
         n==1 -> plural form 0
         otherwise -> plural form 1
       applies to:
         Germanic family
           Danish, Dutch, English, German, Norwegian, Swedish
         Finno-Ugric family
           Estonian, Finnish
         Latin/Greek family
           Greek
         Semitic family
           Hebrew
         Romanic family
           Italian, Portuguese, Spanish
         Artificial
           Esperanto

    Rule 2
       two forms:
         n==0 || n==1 -> plural form 0
         otherwise -> plural form 1
       applies to:
         Romanic family
           French, Brazilian Portuguese

    Rule 3
       three forms:
         n%10==1 && n%100!=11 -> plural form 0
         n!=0 -> plural form 1
         otherwise -> plural form 2
       applies to:
         Baltic family
           Latvian

    Rule 4
       three forms:
         n==1 -> plural form 0
         n==2 -> plural form 1
         otherwise -> plural form 2
       applies to:
         Celtic
           Gaeilge

    Rule 5
       three forms:
         n%10==1 && n%100!=11 -> plural form 0
         n%10>=2 && (n%100<10 || n%100>=20) -> plural form 1
         otherwise -> plural form 2
       applies to:
         Baltic family
           Lithuanian

    Rule 6
       three forms:
         n%10==1 && n%100!=11 -> plural form 0
         n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) -> plural form 1
         otherwise -> plural form 2
       applies to:
         Slavic family
           Croatian, Czech, Russian, Slovak, Ukrainian

    Rule 7
       three forms:
         n==1 -> plural form 0
         n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) -> plural form 1
         otherwise -> plural form 2
       applies to:
         Slavic family
           Polish

    Rule 8
       four forms:
         n%100==1 -> plural form 0
         n%100==2 -> plural form 1
         n%100==3 || n%100==4 -> plural form 2
         otherwise -> plural form 3
       applies to:
         Slavic family
           Slovenian
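
    The rules above can be transcribed directly into code. The following is
    a sketch in Python, assuming n is a non-negative integer; the function
    name plural_form and the idea of dispatching on a rule number are
    illustrative assumptions, not part of GNU gettext, which expresses each
    rule as a C-style expression in the catalog header.

```python
def plural_form(rule, n):
    """Return the plural form index for count n under rule 0..8 as listed above.

    Assumes n is a non-negative integer.
    """
    if rule == 0:  # one form
        return 0
    if rule == 1:  # n==1 -> 0, otherwise 1
        return 0 if n == 1 else 1
    if rule == 2:  # n==0 || n==1 -> 0, otherwise 1
        return 0 if n in (0, 1) else 1
    if rule == 3:  # Latvian
        if n % 10 == 1 and n % 100 != 11:
            return 0
        return 1 if n != 0 else 2
    if rule == 4:  # Gaeilge
        if n == 1:
            return 0
        return 1 if n == 2 else 2
    if rule == 5:  # Lithuanian
        if n % 10 == 1 and n % 100 != 11:
            return 0
        if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20):
            return 1
        return 2
    if rule == 6:  # Croatian, Czech, Russian, Slovak, Ukrainian (per the list)
        if n % 10 == 1 and n % 100 != 11:
            return 0
        if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
            return 1
        return 2
    if rule == 7:  # Polish
        if n == 1:
            return 0
        if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
            return 1
        return 2
    if rule == 8:  # Slovenian
        if n % 100 == 1:
            return 0
        if n % 100 == 2:
            return 1
        return 2 if n % 100 in (3, 4) else 3
    raise ValueError(f"unknown plural rule: {rule}")
```

    For example, plural_form(6, 21) gives 0 and plural_form(7, 21) gives 2,
    which is exactly where Rule 6 and Rule 7 differ: Rule 7 reserves form 0
    for n==1 only.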

    > ----- Original Message -----
    > From: "Theo Veenker" <Theo.Veenker@let.uu.nl>
    > To: "unicode" <unicode@unicode.org>
    > Sent: Friday, July 08, 2005 07:49
    > Subject: CLDR plural handling info?
    >
    >
    >
    >>Hi,
    >>
    >>In the CLDR 1.3 data there is no field describing which plural
    >>handling method should be used when generating messages for the
    >>given locale. Why?
    >>
    >>Is it because there are no user interface messages in the CLDR?
    >>IMO it would be a good idea if the CLDR project in addition to
    >>the LDML format would also provide a message catalog format.
    >>But then of course plural handling data should be included in
    >>the CLDR.
    >>
    >>Theo
    >>


