Re: CLDR plural handling info?

From: Theo Veenker ([email protected])
Date: Mon Jul 11 2005 - 03:05:34 CDT

Next message: Erkki Kolehmainen: "Re: Arabic encoding model (alas, static!)"

Previous message: Asmus Freytag: "Re: Proofreading fonts"
In reply to: Mark Davis: "Re: CLDR plural handling info?"
Next in thread: Mark Davis: "Re: CLDR plural handling info?"
Reply: Mark Davis: "Re: CLDR plural handling info?"
Reply: Patrick Andries: "Re: CLDR plural handling info?"

Mark Davis wrote:
> What CLDR does have is provision for some particular cases: for the use of
> different forms of months, depending on context: either stand-alone ("July")
> or within formats ("July 1, 1942"); for the use of currency formats that
> change according to the value of the number (see the formats for INR); and
> for exceptional titlecasing of language display names.
>
> It would be extremely complex to describe all the mechanisms used for
> forming arbitrary noun plurals in any particular language; and that does not
> speak to more complex declensions of nouns or adjectives. What is it
> exactly that you are looking for -- can you give an example?

I meant simple user interface messages like "Copied 1 file." or "Copied
2 files." As I said in my reply to my own question, I mistakenly thought
the mechanism for selecting the appropriate message (depending on the
plurality) should be controlled by information from the CLDR. But that
doesn't make sense: it's the message-generating system that selects the
appropriate message for the given plural form. In GNU gettext there are
8 rules (at least several years ago there were 8) for determining the
plural form for a given language (see below). So for each locale/language
one needs to know which rule should be used to select the appropriate
plural form. This is just one field, but it doesn't need to be in the
CLDR per se.
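
A rough sketch of that division of labour, in Python. Everything named
here (CATALOG, copied_message, the rule_N functions) is made up for
illustration and is not gettext or CLDR API; the French strings are only
sample translations. The point is that the per-language "field" is just
a reference to a rule, and the message system indexes into the
translated forms with the number that rule returns.

```python
from typing import Callable, Dict, Sequence, Tuple


def rule_1(n: int) -> int:
    """Two forms: form 0 for n == 1 only, form 1 otherwise."""
    return 0 if n == 1 else 1


def rule_2(n: int) -> int:
    """Two forms: form 0 for n == 0 or n == 1, form 1 otherwise."""
    return 0 if n in (0, 1) else 1


# Hypothetical catalog: for each language, the plural rule to apply and
# one translated message per plural form, in form-index order.
CATALOG: Dict[str, Tuple[Callable[[int], int], Sequence[str]]] = {
    "en": (rule_1, ("Copied {n} file.", "Copied {n} files.")),
    "fr": (rule_2, ("{n} fichier copié.", "{n} fichiers copiés.")),
}


def copied_message(lang: str, n: int) -> str:
    """Pick the message whose plural form matches n, then fill in n."""
    rule, forms = CATALOG[lang]
    return forms[rule(n)].format(n=n)
```

With this, copied_message("en", 0) gives "Copied 0 files." while
copied_message("fr", 0) gives "0 fichier copié.", which is exactly the
difference between the two two-form rules.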

I do think the interface between a system for selecting user interface
messages and the CLDR should be seamless. Therefore I believe the CLDR
project should also provide a storage format for message catalogs and
a format specification for the messages themselves. However, the GNU
gettext people will probably disagree.

BTW, are there any plans to include the 140 more or less standardized
color names in the CLDR?

Regards,
Theo

Plural form rules and the languages to which they apply
(source: GNU gettext, not up to date):
Rule 0
   one form:
     n==arbitrary -> plural form 0
   applies to:
     Finno-Ugric family
       Hungarian
     Asian family
       Japanese, Korean
     Turkic/Altaic family
       Turkish

Rule 1
   two forms:
     n==1 -> plural form 0
     otherwise -> plural form 1
   applies to:
     Germanic family
       Danish, Dutch, English, German, Norwegian, Swedish
     Finno-Ugric family
       Estonian, Finnish
     Latin/Greek family
       Greek
     Semitic family
       Hebrew
     Romanic family
       Italian, Portuguese, Spanish
     Artificial
       Esperanto

Rule 2
   two forms:
     n==0 || n==1 -> plural form 0
     otherwise -> plural form 1
   applies to:
     Romanic family
       French, Brazilian Portuguese

Rule 3
   three forms:
     n%10==1 && n%100!=11 -> plural form 0
     n!=0 -> plural form 1
     otherwise -> plural form 2
   applies to:
     Baltic family
       Latvian

Rule 4
   three forms:
     n==1 -> plural form 0
     n==2 -> plural form 1
     otherwise -> plural form 2
   applies to:
     Celtic
       Gaeilge

Rule 5
   three forms:
     n%10==1 && n%100!=11 -> plural form 0
     n%10>=2 && (n%100<10 || n%100>=20) -> plural form 1
     otherwise -> plural form 2
   applies to:
     Baltic family
       Lithuanian

Rule 6
   three forms:
     n%10==1 && n%100!=11 -> plural form 0
     n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) -> plural form 1
     otherwise -> plural form 2
   applies to:
     Slavic family
       Croatian, Czech, Russian, Slovak, Ukrainian

Rule 7
   three forms:
     n==1 -> plural form 0
     n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) -> plural form 1
     otherwise -> plural form 2
   applies to:
     Slavic family
       Polish

Rule 8
   four forms:
     n%100==1 -> plural form 0
     n%100==2 -> plural form 1
     n%100==3 || n%100==4 -> plural form 2
     otherwise -> plural form 3
   applies to:
     Slavic family
       Slovenian
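
For anyone who wants to experiment, the rules above can be transcribed
more or less literally. This is only a sketch in Python following the
list as written in this message (so it inherits whatever is out of date
in it); the names rule_0 .. rule_8 and PLURAL_RULES are my own and are
not part of gettext.

```python
from typing import Callable, List


def rule_0(n: int) -> int:
    return 0  # one form for every n


def rule_1(n: int) -> int:
    return 0 if n == 1 else 1


def rule_2(n: int) -> int:
    return 0 if n in (0, 1) else 1


def rule_3(n: int) -> int:  # Latvian
    if n % 10 == 1 and n % 100 != 11:
        return 0
    return 1 if n != 0 else 2


def rule_4(n: int) -> int:  # Gaeilge
    if n == 1:
        return 0
    return 1 if n == 2 else 2


def rule_5(n: int) -> int:  # Lithuanian
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def rule_6(n: int) -> int:  # Russian etc.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def rule_7(n: int) -> int:  # Polish
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def rule_8(n: int) -> int:  # Slovenian
    r = n % 100
    if r == 1:
        return 0
    if r == 2:
        return 1
    return 2 if r in (3, 4) else 3


# Index in this list == rule number used in the list above.
PLURAL_RULES: List[Callable[[int], int]] = [
    rule_0, rule_1, rule_2, rule_3, rule_4, rule_5, rule_6, rule_7, rule_8,
]
```

For example, [PLURAL_RULES[6](n) for n in (1, 2, 5, 11, 21)] evaluates
to [0, 1, 2, 2, 0]: 21 takes the same form as 1, while 11 falls through
to the "otherwise" form.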

> ----- Original Message -----
> From: "Theo Veenker" <[email protected]>
> To: "unicode" <[email protected]>
> Sent: Friday, July 08, 2005 07:49
> Subject: CLDR plural handling info?
>
>
>
>>Hi,
>>
>>In the CLDR 1.3 data there is no field describing which plural
>>handling method should be used when generating messages for the
>>given locale. Why?
>>
>>Is it because there are no user interface messages in the CLDR?
>>IMO it would be a good idea if the CLDR project in addition to
>>the LDML format would also provide a message catalog format.
>>But then of course plural handling data should be included in
>>the CLDR.
>>
>>Theo
>>


This archive was generated by hypermail 2.1.5 : Mon Jul 11 2005 - 03:07:06 CDT