From rick@unicode.org Thu Jan 10 11:23:49 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 11:27:40 -0600 (CST) Received: from izanami (c-71-202-247-55.hsd1.ca.comcast.net [71.202.247.55]) by unicode.org (8.12.11/8.12.11) with SMTP id m0AHNe44028363; Thu, 10 Jan 2008 11:23:40 -0600 Message-Id: <200801101723.m0AHNe44028363@unicode.org> To: unicode@unicode.org Subject: Unicode CLDR Release 1.5.1 now available Date: Thu, 10 Jan 2008 09:23:39 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 301 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: cldr-users The Unicode(R) Consortium has announced the release of the new version of the Unicode Common Locale Data Repository (Unicode CLDR 1.5.1), providing key building blocks for software to support the world's languages. Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; transliterating different alphabets; and many others. CLDR 1.5.1 is an update release, with no new translations. The main changes are a significant revision to the data and process for computing timezone names, and additional data for finding default script or country given a language, or the converse. The structure has also been updated for the latest version of BCP 47, and new currency codes. 
For more information, see http://unicode.org/cldr/ From asmodai@in-nomine.org Thu Jan 10 13:35:46 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 13:35:46 -0600 (CST) Received: from nexus.in-nomine.org (chronias-old.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0AJZjFS030839; Thu, 10 Jan 2008 13:35:46 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id 061A2C16C; Thu, 10 Jan 2008 20:35:45 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id weCgfB6a7c8z; Thu, 10 Jan 2008 20:35:44 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id 1161DC170; Thu, 10 Jan 2008 20:35:44 +0100 (CET) Date: Thu, 10 Jan 2008 20:35:44 +0100 From: Jeroen Ruigrok van der Werven To: Rick McGowan Cc: cldr-users@unicode.org Subject: Re: Unicode CLDR Release 1.5.1 now available Message-ID: <20080110193544.GZ75977@nexus.in-nomine.org> References: <200801101723.m0AHNe44028363@unicode.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <200801101723.m0AHNe44028363@unicode.org> Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 302 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users Rick, -On [20080110 18:34], Rick McGowan (rick@unicode.org) wrote: >CLDR 1.5.1 is an update release, with no new translations. The main >changes are a significant revision to the data and process for computing >timezone names, and additional data for finding default script or country >given a language, or the converse. 
The structure has also been updated for >the latest version of BCP 47, and new currency codes. For more information, >see http://unicode.org/cldr/ http://unicode.org/cldr/version/1.5.1.html is Likely Subtags supposed to link to http://unicode.org/cldr/version/1.5.1.html#Likely_Subtags ? I think http://www.unicode.org/reports/tr35/tr35-9.html#Likely_Subtags was intended. -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ Seize from every moment its unique novelty and do not prepare your joys... From asmodai@in-nomine.org Thu Jan 10 13:38:52 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 13:38:52 -0600 (CST) Received: from nexus.in-nomine.org (chronias-old.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0AJcqZn031962; Thu, 10 Jan 2008 13:38:52 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id 543C1C1C0; Thu, 10 Jan 2008 20:38:51 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0WdyX2MUC5HO; Thu, 10 Jan 2008 20:38:50 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id BFA69C170; Thu, 10 Jan 2008 20:38:50 +0100 (CET) Date: Thu, 10 Jan 2008 20:38:50 +0100 From: Jeroen Ruigrok van der Werven To: Rick McGowan Cc: cldr-users@unicode.org Subject: Re: Unicode CLDR Release 1.5.1 now available Message-ID: <20080110193850.GA75977@nexus.in-nomine.org> References: <200801101723.m0AHNe44028363@unicode.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <200801101723.m0AHNe44028363@unicode.org> Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 303 X-ecartis-version: Ecartis 
v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users Rick, I also assume http://www.unicode.org/reports/tr35/ will be updated to reflect v9 is the latest version? Right now it is only v8 with no apparent link to v9 (aside from the CLDR 1.5.1 @ http://unicode.org/cldr/version/1.5.1.html). -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ Once sent from the Golden Hall... From mark.edward.davis@gmail.com Thu Jan 10 15:42:53 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 15:42:57 -0600 (CST) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.236]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0ALgqZj016443 for ; Thu, 10 Jan 2008 15:42:53 -0600 Received: by nz-out-0506.google.com with SMTP id x3so491442nzd.6 for ; Thu, 10 Jan 2008 13:42:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; bh=qm5A2tPx+8e+1ORC9cQHMKo8CoTF5A/PW5aPcZwE710=; b=BGO33v+AcO8A0PX/B+dKGk7xVW4aw4fR8A/Voaz26rSEB8khAhVtwGUzFmeVMZ9Lxt+v5qpMgWvuM30dEieofNgNhQBQLqJ0s3KBDcLMsnFrb77aF40TfI43pwj9pXqWi1cvy/TeZFJBENeey2+w+g8mS+kDe+zcwixKIaUvpsg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=KxciFBylpV7+UdnvN4ZNSghGx7CPsaFhP/AjJcuvSZIe9CwTSJFfI3M+lnOQEnjYyR0X3SEBTLOI5doK7Ro4IE32uu84Bzg03ZzfYCqx7YAQrycxbsY+bv4An/bvkep+gJEHGO2uJiPUgYrNcnM9/ChQXsIxHneDqc5KHWJAS+w= Received: by 10.142.132.2 with SMTP id f2mr1365728wfd.221.1200001368665; Thu, 10 Jan 2008 13:42:48 -0800 (PST) Received: by 10.143.172.5 with HTTP; Thu, 10 Jan 2008 13:42:48 -0800 (PST) Message-ID: 
<30b660a20801101342g58fe8bb0hc852f73f4dbeae24@mail.gmail.com> Date: Thu, 10 Jan 2008 13:42:48 -0800 From: "Mark Davis" To: "Jeroen Ruigrok van der Werven" Subject: Re: Unicode CLDR Release 1.5.1 now available Cc: "Rick McGowan" , cldr-users@unicode.org In-Reply-To: <20080110193850.GA75977@nexus.in-nomine.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit References: <200801101723.m0AHNe44028363@unicode.org> <20080110193850.GA75977@nexus.in-nomine.org> X-Google-Sender-Auth: 0ec987e508ad34bc X-archive-position: 304 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk X-list: cldr-users Thanks for the comments -- those were oversights. Mark On Jan 10, 2008 11:38 AM, Jeroen Ruigrok van der Werven <asmodai@in-nomine.org> wrote: > Rick, > > I also assume http://www.unicode.org/reports/tr35/ will be updated to reflect > v9 is the latest version? Right now it is only v8 with no apparent link to v9 > (aside from the CLDR 1.5.1 @ http://unicode.org/cldr/version/1.5.1.html). > > -- > Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai > イェルーン ラウフロック ヴァン デル ウェルヴェン > http://www.in-nomine.org/ | http://www.rangaku.org/ > Once sent from the Golden Hall... -- Mark
From verdy_p@wanadoo.fr Thu Jan 10 17:58:12 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 17:58:12 -0600 (CST) Received: from smtp25.orange.fr (smtp25.orange.fr [193.252.22.23]) by unicode.org (8.12.11/8.12.11) with ESMTP id
m0ANwBn7018604 for ; Thu, 10 Jan 2008 17:58:12 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2513.orange.fr (SMTP Server) with ESMTP id 3ED971C00087 for ; Fri, 11 Jan 2008 00:58:06 +0100 (CET) Received: from HARNON (APoitiers-258-1-135-180.w90-50.abo.wanadoo.fr [90.50.230.180]) by mwinf2513.orange.fr (SMTP Server) with ESMTP id DB5F91C00084; Fri, 11 Jan 2008 00:58:05 +0100 (CET) X-ME-UUID: 20080110235805899.DB5F91C00084@mwinf2513.orange.fr Reply-To: From: "Philippe Verdy" To: "'Rick McGowan'" Cc: References: <200801101723.m0AHNe44028363@unicode.org> <20080110193544.GZ75977@nexus.in-nomine.org> Subject: Beta Survey application bug: coverage Date: Fri, 11 Jan 2008 00:59:59 +0100 Organization: Ordinateur Personnel Message-ID: <006401c853e4$e8bbf4a0$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <20080110193544.GZ75977@nexus.in-nomine.org> Thread-Index: AchTwmrDnDJ2lj5wRwyicgOasDA/EQAIiTcw X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-archive-position: 305 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users Something to fix for the coming new beta of the survey (starting on February 1st?) http://unicode.org/cldr/apps/survey?_=fr Possible problems with locale: (null) : Error: Internal error in org.unicode.cldr.test.CheckCoverage. 
Exception: java.lang.NullPointerException, Message: java.lang.NullPointerException, Trace: [] From verdy_p@wanadoo.fr Thu Jan 10 18:11:44 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 18:11:44 -0600 (CST) Received: from smtp25.orange.fr (smtp25.orange.fr [193.252.22.23]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0B0Bhi8021522 for ; Thu, 10 Jan 2008 18:11:44 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2513.orange.fr (SMTP Server) with ESMTP id 439571C0008A for ; Fri, 11 Jan 2008 01:11:38 +0100 (CET) Received: from HARNON (APoitiers-258-1-135-180.w90-50.abo.wanadoo.fr [90.50.230.180]) by mwinf2513.orange.fr (SMTP Server) with ESMTP id 00A311C00086; Fri, 11 Jan 2008 01:11:37 +0100 (CET) X-ME-UUID: 20080111001138272.00A311C00086@mwinf2513.orange.fr Reply-To: From: "Philippe Verdy" To: "'Rick McGowan'" Cc: References: <200801101723.m0AHNe44028363@unicode.org> <20080110193544.GZ75977@nexus.in-nomine.org> Subject: 1.5.1 change: bug in survey (currency: ROL/symbol) Date: Fri, 11 Jan 2008 01:13:31 +0100 Organization: Ordinateur Personnel Message-ID: <006501c853e6$ccc2c880$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: Thread-Index: AchTwmrDnDJ2lj5wRwyicgOasDA/EQAIiTcwAAApYKA= X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-archive-position: 306 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users Bug related to the change announced in the CLDR 1.5.1 release: http://unicode.org/cldr/apps/survey?_=fr&x=currencies The page displays "internal error" in the example for the (unconfirmed) ROL/symbol item (the root data is "ROL", but the English data "=0#Old lei|1#Old leu|1" fails).
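To make the report above easier to reproduce, here is a minimal sketch of a checker for choice-style patterns. It assumes the value follows a java.text.ChoiceFormat-like syntax (segments separated by "|", each written as a numeric limit, then "#", "≤" or "<", then the text). That is an assumption: the report does not say which parser the survey tool uses or why it fails, and the helper name check_choice_pattern is hypothetical.

```python
# Sketch only: a checker for ChoiceFormat-like patterns such as
# "0#Old lei|1#Old leu". The syntax is an assumption; this is not the
# survey tool's own code.

SEPARATORS = "#<\u2264"  # '#', '<' and '≤' separate the limit from the text


def check_choice_pattern(pattern):
    """Return (segments, errors) for a 'limit#text|limit<text' pattern.

    segments is a list of (limit, separator, text) tuples for the
    well-formed parts; errors is a list of human-readable messages.
    """
    segments, errors = [], []
    for index, raw in enumerate(pattern.split("|")):
        # Position of the first separator character, or -1 if there is none.
        pos = next((i for i, ch in enumerate(raw) if ch in SEPARATORS), -1)
        if pos < 0:
            errors.append(f"segment {index} ({raw!r}): no separator found")
            continue
        limit, sep, text = raw[:pos], raw[pos], raw[pos + 1:]
        try:
            value = float(limit)
        except ValueError:
            errors.append(f"segment {index} ({raw!r}): limit {limit!r} is not a number")
            continue
        segments.append((value, sep, text))
    return segments, errors
```

Calling check_choice_pattern on the English value exactly as quoted in the report flags two segments under this reading: the first, because "=0" is not a number, and the last, because "1" carries no separator or text. Whether either of these is what actually trips the survey tool is not known.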
---- Consider also disambiguating "Guyana" in the English locale for timezones, as this is not an exemplar city and the name is easily confused with French Guiana. Proposed solution: use "Georgetown" as the exemplar city and add the country name "(Guyana)" as a suffix, because the city is also easily confused with other cities of the Caribbean region. Note, however, that a country suffix is only added when there are multiple timezones in a country, not when the same exemplar city name is also the capital of another country/region with its own timezone, where the same city name would be the exemplar. There may be some languages where there is only a minor orthographic difference, and such a difference is easily overlooked, as in "Guyana" vs. "Guiana" in English, "Guyana" vs. "Guyane" in French, or "Georgetown" vs. "Georgestown"... What would be the best solution to avoid such confusion? From verdy_p@wanadoo.fr Thu Jan 10 23:24:52 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 10 Jan 2008 23:24:52 -0600 (CST) Received: from smtp25.orange.fr (smtp25.orange.fr [193.252.22.23]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0B5OprG020983 for ; Thu, 10 Jan 2008 23:24:51 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2553.orange.fr (SMTP Server) with ESMTP id 85C2A1C00087 for ; Fri, 11 Jan 2008 06:24:45 +0100 (CET) Received: from HARNON (APoitiers-258-1-135-180.w90-50.abo.wanadoo.fr [90.50.230.180]) by mwinf2553.orange.fr (SMTP Server) with ESMTP id 11C3C1C00082; Fri, 11 Jan 2008 06:24:45 +0100 (CET) X-ME-UUID: 20080111052445728.11C3C1C00082@mwinf2553.orange.fr Reply-To: From: "Philippe Verdy" To: "'Jeroen Ruigrok van der Werven'" , "'Rick McGowan'" Cc: References: <200801101723.m0AHNe44028363@unicode.org> <20080110193544.GZ75977@nexus.in-nomine.org> Subject: RE: Unicode CLDR Release 1.5.1 now available Date: Fri, 11 Jan 2008 06:26:38 +0100 Organization: Ordinateur Personnel Message-ID: <006f01c85412$8aa14b30$0a01a8c0@HARNON> MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <20080110193544.GZ75977@nexus.in-nomine.org> Thread-Index: AchTwmrDnDJ2lj5wRwyicgOasDA/EQAS9L7A X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-archive-position: 307 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users On [20080110 18:34], Rick McGowan (rick@unicode.org) wrote: >CLDR 1.5.1 is an update release, with no new translations. The main >changes are a significant revision to the data and process for computing >timezone names The current data is still defective, because it now focuses on exemplar cities instead of countries as the primary selective name, and it qualifies the city with the country name only when a country has several timezones rather than whenever that is needed, even though the exemplar cities of such a country are not necessarily ambiguous in the existing timezone data. For example, "Georgetown" is the capital and exemplar city of several countries that don't have multiple timezones, so the city name is not qualified with the country name. This caveat comes precisely from the fact that the selective name was reversed in CLDR 1.5 (and this is still not corrected in 1.5.1). > and additional data for finding default script or country > given a language, or the converse. There's still a problem for the converse. Try applying it to "zh-SG"; you'll get:
* the maximized locale id "zh-Hans-SG" (keeping any existing variants in a separate variable that gets appended at the end of the converse resolution).
* looking for "zh-Hans" will return nothing
* looking for "zh" will return "zh-Hans-CN", and substituting "SG" for "CN" will return "zh-Hans-SG".
But if the last step had returned "zh-Hant-CN" (i.e.
China used the traditional orthography by default), the conversion would have returned the wrong default orthography for Singapore: "zh-Hant-SG". The problem may arise whenever the same language uses a different default script in one country than in another. I am thinking here of the orthography of Serbian, which is now Latin by default in Serbia and still Cyrillic in Bosnia-Herzegovina. From "sr" you would most probably expect "sr-Latn-RS" by default, and the same would be obtained from "sr-RS". From "sr-Cyrl" you would certainly get "sr-Cyrl-RS" by default, implying that Serbian is most probably for Serbia, where it is mostly used. But from "sr-BA" you would expect "sr-Cyrl-BA", not "sr-Latn-BA", for political reasons, because "sr-Latn" would be too easily confused with "hr-Latn" (or "bs-Latn", the less politically oriented local version; but a "Bosnian" language is still rejected by Serbians in the Serbian autonomous region in north-eastern Bosnia, who still refer to their language as "Serbian" and want to maintain the Cyrillic script as a strong cultural difference from Bosnian). These defaults may easily change again or could be disputed: what is the preferred script now in Montenegro? And for Serbians in Kosovo? This area (Bosnia, Serbia, Montenegro, Kosovo) is still considered to use two conflicting scripts, and little can be chosen arbitrarily because of ethnic and political preferences, even if the war is now over (I think this also applies, though with less critical issues, to the FYRO Macedonia, where there is also an active ethnic Albanian community that prefers Latin, or may sometimes still use Arabic for religious purposes). When there is such a mosaic of ethnic groups in a small area, this conflict will often carry over into their languages, notably if there are multiple scripts and the languages are easily mixed. I'm not sure that the situation in Central Africa or India is any simpler, with all their many languages.
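The concern above can be sketched in a few lines. This is only an illustration under stated assumptions: the table rows are hypothetical and mirror the scenario in this message rather than real CLDR likely-subtags data, the most-specific-first lookup order is an assumption that may differ from what tr35-9 specifies, and the names parse and maximize are made up for the example.

```python
# Sketch only: fill in missing script/region subtags from a lookup table.
# The rows below are HYPOTHETICAL and mirror the scenario in the message;
# they are not copied from CLDR. "RS" is the ISO 3166 code for Serbia.
LIKELY_SUBTAGS = {
    "zh": "zh_Hans_CN",
    "sr": "sr_Latn_RS",
    "sr_BA": "sr_Cyrl_BA",  # per-region override of the default script
}


def parse(tag):
    """Split a locale id into (language, script, region); variants are ignored."""
    parts = tag.replace("-", "_").split("_")
    lang, script, region = parts[0].lower(), "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and not script:
            script = part.title()
        elif not region and ((len(part) == 2 and part.isalpha())
                             or (len(part) == 3 and part.isdigit())):
            region = part.upper()
    return lang, script, region


def maximize(tag, table=LIKELY_SUBTAGS):
    """Return the fully specified id for tag, or None if nothing matches."""
    lang, script, region = parse(tag)
    # Assumed lookup order: most specific key first, bare language last.
    candidates = [(lang, script, region), (lang, "", region),
                  (lang, script, ""), (lang, "", "")]
    for candidate in candidates:
        key = "_".join(field for field in candidate if field)
        if key in table:
            _, found_script, found_region = parse(table[key])
            # Keep what the caller supplied; fill only the missing fields.
            return "_".join([lang, script or found_script, region or found_region])
    return None
```

With these rows, maximize("zh-SG") falls back to the "zh" row and keeps the region, giving "zh_Hans_SG", while maximize("sr-BA") hits the per-region row and gives "sr_Cyrl_BA". If the "sr_BA" row is left out, the same call falls back to "sr" and yields "sr_Latn_BA", which is exactly the wrong-script outcome described above; a language-plus-region row is one way to avoid it.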
From eik@iki.fi Fri Jan 11 02:26:10 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 11 Jan 2008 02:26:10 -0600 (CST) Received: from smtp5.pp.htv.fi (smtp5.pp.htv.fi [213.243.153.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0B8Q9H6016751 for ; Fri, 11 Jan 2008 02:26:10 -0600 Received: from Raahattava (cs181253188.pp.htv.fi [82.181.253.188]) by smtp5.pp.htv.fi (Postfix) with ESMTP id 0805D5BC02A for ; Fri, 11 Jan 2008 10:26:09 +0200 (EET) From: "Erkki I. Kolehmainen" To: Subject: Open Workshop on Multilingual Extensions to Current Latin Keyboards Date: Fri, 11 Jan 2008 10:26:04 +0200 Message-ID: <000501c8542b$9b6c6bc0$0200a8c0@Raahattava> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id m0B8Q9H6016751 X-archive-position: 308 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: eik@iki.fi Precedence: bulk X-list: cldr-users Dear All, An open workshop to produce a document on considerations and possible guidance on how to expand the existing keyboard layouts to support multilingualism will be kicked-off on 25 January 2008 at CEN, the European standards organization. The urgency of such support stems from the practical and legal requirements that result from the free movement of people and goods within the European Union, but the findings will in no way be limited to the EU with its many official, regional and minority languages. The participation is also free to any interested party anywhere in the world, although the workshop will be organized under the auspices of CEN. 
The workshop, which has been fostered by CEN/ISSS CDFG (the Cultural Diversity Focus Group of the CEN Information Society Standardization System, a Unicode Liaison member), takes a highly pragmatic approach: the users of the existing keyboards should be able to continue their current way of operation with the extended layouts, and the new functionality should be intuitively recognizable to the extent possible. Thus, the workshop has no intention to define any specific keyboard layout, let alone a Pan-European one. Liaison is sought with, e.g., ISO/IEC JTC1/SC35 to minimize the risk of further divergence between actual implementations and formal standards (ISO/IEC 9995 series). Information on this workshop (WS/MEEK), including the draft business plan and how to participate (with or without physical presence), is available via the CEN main page at http://www.cen.eu. If interested, please join! Sincerely, Erkki I. Kolehmainen Tilkankatu 12 A 3, FI-00300 Helsinki, Finland Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943 P.S. I apologise to those of you who will receive multiple copies of this announcement, since I'll distribute it to the Unicode, Unicore, CLDR, and CLDR-Users lists.
From duerst@it.aoyama.ac.jp Sat Jan 12 01:32:57 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Sat, 12 Jan 2008 01:32:57 -0600 (CST) Received: from scmailgw2.scop.aoyama.ac.jp (scmailgw2.scop.aoyama.ac.jp [133.2.251.195]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0C7WtXU022260 for ; Sat, 12 Jan 2008 01:32:57 -0600 Received: from scmse1.scbb.aoyama.ac.jp (scmse1 [133.2.253.16]) by scmailgw2.scop.aoyama.ac.jp (secret/secret) with SMTP id m0C7Whdo022681 for ; Sat, 12 Jan 2008 16:32:44 +0900 (JST) Received: from (133.2.206.133) by scmse1.scbb.aoyama.ac.jp via smtp id 7f5b_903aee58_c0e0_11dc_8879_0014221fa3c9; Sat, 12 Jan 2008 16:32:43 +0900 Received: from Tanzawa.it.aoyama.ac.jp ([133.2.210.1]:39745) by itmail.it.aoyama.ac.jp with [XMail 1.22 ESMTP Server] id for from ; Sat, 12 Jan 2008 16:28:32 +0900 Message-Id: <6.0.0.20.2.20080112161239.075fd430@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Version 6J Date: Sat, 12 Jan 2008 16:14:09 +0900 To: "Erkki I. Kolehmainen" , From: Martin Duerst Subject: Re: Open Workshop on Multilingual Extensions to Current Latin Keyboards In-Reply-To: <000501c8542b$9b6c6bc0$0200a8c0@Raahattava> References: <000501c8542b$9b6c6bc0$0200a8c0@Raahattava> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-archive-position: 309 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: duerst@it.aoyama.ac.jp Precedence: bulk X-list: cldr-users Hello Erkki, Unfortunately, I'm very sure I won't be able to participate, but it would be great if this workshop would also consider solutions not only for (in a wide sense) QWERTY-based keyboards, but also for alternative layouts such as Dvorak. Regards, Martin. At 17:26 08/01/11, Erkki I.
Kolehmainen wrote: >Dear All, > >An open workshop to produce a document on considerations and possible >guidance on how to expand the existing keyboard layouts to support >multilingualism will be kicked-off on 25 January 2008 at CEN, the European >standards organization. The urgency of such support stems from the >practical and legal requirements that result from the free movement of >people and goods within the European Union, but the findings will in no way >be limited to the EU with its many official, regional and minority >languages. The participation is also free to any interested party anywhere >in the world, although the workshop will be organized under the auspices of CEN. > >The workshop that has been fostered by CEN/ISSS CDFG (the Cultural >Diversity Focus Group of the CEN Information Society Standardization >System, a Unicode Liaison member) has a highly pragmatic approach: The >users of the existing keyboards should be able to continue their current >way of operation with the extended layouts, and the new functionality >should be intuitively recognizable to the extent possible. Thus, the >workshop has no intention to define any specific, let alone a Pan-European >keyboard layout. Liaison is sought with e.g., ISO/IEC JTC1/SC35 to minimize >the risk of further divergence between actual implementations and formal >standards (ISO/IEC 9995 series). > >Information on this workshop (WS/MEEK), including the draft business plan >and how to participate (with or without physical presence) is available via >the CEN main page at http://www.cen.eu. > >If interested, please join! > >Sincerely, > >Erkki I. Kolehmainen >Tilkankatu 12 A 3, FI-00300 Helsinki, Finland >Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943 > >P.S. I apologise for those of you who will receive multiple copies of this >announcement, since I'll distribute it to the Unicde, Unicore, CLDR, and >CLDR-Users lists. #-#-# Martin J. Du"rst, Assoc. 
Professor, Aoyama Gakuin University #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp From verdy_p@wanadoo.fr Sat Jan 12 02:46:53 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Sat, 12 Jan 2008 02:46:54 -0600 (CST) Received: from smtp25.orange.fr (smtp25.orange.fr [193.252.22.24]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0C8kqIg011927 for ; Sat, 12 Jan 2008 02:46:53 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2546.orange.fr (SMTP Server) with ESMTP id A2BCC1C0008A for ; Sat, 12 Jan 2008 09:46:46 +0100 (CET) Received: from HARNON (APoitiers-258-1-135-180.w90-50.abo.wanadoo.fr [90.50.230.180]) by mwinf2546.orange.fr (SMTP Server) with ESMTP id 131FB1C00086; Sat, 12 Jan 2008 09:46:46 +0100 (CET) X-ME-UUID: 20080112084646784.131FB1C00086@mwinf2546.orange.fr Reply-To: From: "Philippe Verdy" To: "'Martin Duerst'" , "'Erkki I. Kolehmainen'" , References: <000501c8542b$9b6c6bc0$0200a8c0@Raahattava> <6.0.0.20.2.20080112161239.075fd430@localhost> Subject: RE: Open Workshop on Multilingual Extensions to Current Latin Keyboards Date: Sat, 12 Jan 2008 09:46:44 +0100 Organization: Ordinateur Personnel Message-ID: <00c401c854f7$a97c4c90$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <6.0.0.20.2.20080112161239.075fd430@localhost> Thread-Index: AchU773mEpFzHNjiT/CAGaBFVeYKUwABchiQ X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-archive-position: 310 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users Martin Duerst wrote: > Hello Erkki, > > Unfortunately, I'm very sure I won't be able to participate, > but it would be great if this workshop also would consider > solutions not only for (in a wide sense) QWERTY-based keyboards, > but also for alternative layouts such as
Dvorak. Apparently, the CEN workshop announcement says that it will not focus on developing new standard layouts (probably leaving that to each national standards body). But it wants proposals for guidelines on how to extend the existing features of keyboards, and probably demonstrations of new input methods for composing texts in multilingual environments, in a way that is both easy for users to learn and understand and not too costly to deploy massively to the general public. So what is expected is probably:
* Guidelines for supporting the correct composition of the various languages in Europe
* Guidelines and recommendations for software developers and apparatus providers
* Solutions for disabled people
* Hardware add-ons and helpers
* Possibility of composing rare letters directly with a finger on touchpads, touchscreens, or similar sensitive devices
* Dictionary-based helpers and/or correctors
* Common GUI features for applications, both for usability and correctness of text input.
* Methods for properly handling user input preferences, regarding their preferred input methods, with interoperability with existing software and good support for this integration by the OS
...
Extending the existing standard layouts is not the only thing expected, but some recommendations may be given about the placement of additional characters to share between various layouts (remember the discussions that occurred before the launch of the Euro in 1999). Also, PC manufacturers should listen to these recommendations, so that any PC becomes usable independently of the language for which it was initially localized. This mostly concerns notebooks and handheld devices, which are rarely adaptable after purchase, or would benefit from better multilingual capabilities (for example at work or in public places, in shared environments where you cannot presume which language will be preferred by users).
It could also facilitate commerce by improving competition between providers from various countries that use distinct national keyboard layouts. Some recommendations should also be given so that consumers get a real choice about their keyboard layout, notably for notebooks, where a keyboard is not easily replaceable and is rarely an option: why not have an additional row of keys that are freely configurable by user preferences as a standard feature, and not as an option? From naz@mira.net Tue Jan 15 05:14:36 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 15 Jan 2008 10:47:57 -0600 (CST) Received: from outbound.icp-qv1-irony-out3.iinet.net.au (outbound.icp-qv1-irony-out3.iinet.net.au [203.59.1.148]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0FBEYjT003575 for ; Tue, 15 Jan 2008 05:14:36 -0600 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ao8CAAskjEd8qEKz/2dsb2JhbACrBw X-IronPort-AV: E=Sophos;i="4.24,286,1196607600"; d="scan'208";a="214311559" Received: from unknown (HELO [192.168.0.6]) ([124.168.66.179]) by outbound.icp-qv1-irony-out3.iinet.net.au with ESMTP; 15 Jan 2008 20:14:32 +0900 Message-ID: <478C9593.9010806@mira.net> Date: Tue, 15 Jan 2008 22:14:27 +1100 From: Naz Gassiep User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: cldr-users@unicode.org Subject: Data set relevance Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 311 X-Approved-By: root@unicode.org X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: naz@mira.net Precedence: bulk X-list: cldr-users I am a relatively new user to the CLDR datasets. I was wondering if someone can explain to me how to determine which datafiles are relevant to a given locale selection. For example, if I would like to select Chinese, in Hong Kong for the Traditional Han script, which files would I use?
Do I only use zh_Hant_HK.xml or do I have to merge in zh.xml with that file? Also, if a language is written in multiple scripts, then surely each script/language combination constitutes a whole locale, distinct from other scripts. I will use Serbian as an example here: If a user wants to get the locale relating to Serbian in Montenegro using the Latin script, how do I know if the file sr_ME has any relevance? How do I know if the data in it pertains to the Latin or Cyrillic scripts? Can I assume that data in the sr_ME is relevant to *both* script variants of Serbian? Some pointers on this would be great. Thanks in advance, - Naz. From mark.edward.davis@gmail.com Tue Jan 15 12:01:47 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 15 Jan 2008 12:01:48 -0600 (CST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.229]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0FI1lY6024654 for ; Tue, 15 Jan 2008 12:01:47 -0600 Received: by wr-out-0506.google.com with SMTP id 69so935091wri.15 for ; Tue, 15 Jan 2008 10:01:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; bh=8eWqwan0xPvDgL0kwp7rFTaGYKx9+gnWqWDJMYfIoxo=; b=W03WwOcWnFdJIWDeL6W+lkBsRhzD1XB2ki1WRCT+P4ZNNHEJrSGEMBX7tUdpn2BHufTXwMjmnrCKReYtW43tyBaKn4/qT3JFksQL0VsIh6RlACucOgmM3Wja6Cq10Kg7flDRq9/p1GKW7NE01fi63EM7Vm7flby7Cx08AY783VY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=AtN1ojAH2abmC2UMr40mwGI3gdaQHfq2fbcma/j6DaRYTHZ3s5DpXunLiE3uS5Su/jEyIcHmaX9e8f5mVZr7w/phdq2zgddyUoP3VJv6dRR4j84fA9wbczSLbhSUbh4OrLzadxZ6Avpu+2LvSB6BYSvGrzDYpQ61h5rd50kMTt4= Received: by 10.142.90.8 with SMTP id n8mr3299524wfb.84.1200420101197; Tue, 15 Jan 2008 10:01:41 -0800 (PST) Received: 
by 10.143.196.9 with HTTP; Tue, 15 Jan 2008 10:01:40 -0800 (PST) Message-ID: <30b660a20801151001m3b4d5474i35d429e68eb657d5@mail.gmail.com> Date: Tue, 15 Jan 2008 10:01:40 -0800 From: "Mark Davis" To: verdy_p@wanadoo.fr Subject: Re: Unicode CLDR Release 1.5.1 now available Cc: "Jeroen Ruigrok van der Werven" , "Rick McGowan" , cldr-users@unicode.org In-Reply-To: <006f01c85412$8aa14b30$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_2798_28359291.1200420101195" References: <200801101723.m0AHNe44028363@unicode.org> <20080110193544.GZ75977@nexus.in-nomine.org> <006f01c85412$8aa14b30$0a01a8c0@HARNON> X-Google-Sender-Auth: a54275981900e402 X-archive-position: 312 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk X-list: cldr-users ------=_Part_2798_28359291.1200420101195 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline If you think these are bugs, you should file them as such. (I don't think they are, but if they are filed as bugs we'll look into them). Mark On Jan 10, 2008 9:26 PM, Philippe Verdy wrote: > On [20080110 18:34], Rick McGowan (rick@unicode.org) wrote: > >CLDR 1.5.1 is an update release, with no new translations. The main > >changes are a significant revision to the data and process for computing > >timezone names > > Still defecting in the current data, because now it focuses on examplar > cities instead of countries as the primary selective name, and does not > disambiguate this city completely with the country name when needed, but > only when a country has several timezones, despite each examplar cities > are > not necessarily ambiguous in the exising timezone data. 
For example > "Georgetown" is the capital and examplar city of several countries that > don't have multiple timezones, so the city name is not qualified with the > country name. This caveat comes exactly from the fact that the selective > name has been reversed in CLDR 1.5 (and this is still not corrected in > 1.5.1). > > > and additional data for finding default script or country > > given a language, or the converse. > > There's still a problem for the converse. > > Try applying it for "zh-SG", you'll get: > * the maximized locale id as "zh-Hans-SG" (keeping possible existing > variants on a separate variable that will get appended at end of the > converse resolution). > * looking for "zh-Hans" will return nothing > * looking for "zh" will return "zh-Hans-CN", and substituting "SG" for > "CN" > will return "zh_Hans_SG". > > But if the latest step had returned "zh-Hant-CN" (i.e. China used the > traditional orthography by default), the conversion would have returned > the > wrong default orthography for Singapore: "zh-Hant-SG". > > The problem may happen if the same language uses a different default > script > in one country from the default script used in another. I am thinking here > about the orthography of Serbian which is now Latin by default in Serbia, > and still Cyrillic in Bosnia-Herzegovina. > > From "sr" you would most probably expect "sr-Latn-SR" by default, and the > same would be obtained from "sr-SR". > From "sr-Cyrl", you would get certainly "sr-Cyrl-SR" by default, implying > that Serbian is most probably for Serbia where it is mostly used. 
> But from "sr-BA" you would expect to find "sr-Cyrl-BA", not "sr-Latn-BA", > for political reasons where "sr-Latn" would be too much confusable with > "hr-Latn" (or "bs-Latn", the local version less politically oriented, but > a > "Bosnian" language is still rejected by Serbians in the Serbian autonomous > region in Northern-Eastern Bosnia, that still refer to their language as > "Serbian", and that want to maintain the cyrilic script as a strong > cultural > difference from Bosnian). These defaults may easily change again or could > be > disputed: what is the preferred script now in Montenegro? And for Serbians > in Kosovo? This area (Bosnia, Serbia, Montenegro, Kosovo) is still > considered as using two conflicting scripts, and little can be arbitrarily > chosen due to ethnic and political preferences, even if the war is now > over > (I think this also applies, however with less critical issues, in the FYRO > Macedonia, where there's also an active ethnic Albanian community that > prefers Latin, or may sometime still use Arabic for religious purposes). > > When there's such a mosaic of ethnic peoples in a small area, this > conflict > will often translate into their language, notably if there are multiple > scripts and languages are easily mixed. I'm not sure that the situation in > Central Africa or India is even simpler with all their many languages. > > > > -- Mark ------=_Part_2798_28359291.1200420101195 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline If you think these are bugs, you should file them as such. (I don't think they are, but if they are filed as bugs we'll look into them).

Mark

--
Mark ------=_Part_2798_28359291.1200420101195-- From mark.edward.davis@gmail.com Tue Jan 15 12:02:41 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 15 Jan 2008 12:02:41 -0600 (CST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.228]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0FI2f78024930 for ; Tue, 15 Jan 2008 12:02:41 -0600 Received: by wr-out-0506.google.com with SMTP id 69so935548wri.15 for ; Tue, 15 Jan 2008 10:02:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; bh=s38xtuKE7uuo3XhmZIBcRVggUq2VIycRCAlv2CXR4Rw=; b=GPkQcL2cXrY4Izg86eAVXdK9LOgCZyc4qBGMO9Lb2j5vYR8q6p5EPnfSSbpOxJjQIJQF2XHYo5HGTmyX5rdqqcNjBEjjCsErniZKSSAn9jgTYiYKfsHwUqKuXZ1IPldtzkiAGiTeOUiuYCaHEFgD2D9TOwVnl9EJe0nz7WucwYM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=R83e8KUuaj2u3dTsioGKndKk/JIQsTkf038rOqMDBksUaqVU2z+NI88T0Rp7ecq4BecIyZy1zc5dxeETrfeVU8TxXAomoUExSF0/JPu9SfjeC9hlsnydflLSyv1bdTOb+JWwm9Dn/FtraNVGr+bG2RCOkAfIsCr206a/1Gvmaws= Received: by 10.142.97.20 with SMTP id u20mr3302674wfb.203.1200420155273; Tue, 15 Jan 2008 10:02:35 -0800 (PST) Received: by 10.143.196.9 with HTTP; Tue, 15 Jan 2008 10:02:35 -0800 (PST) Message-ID: <30b660a20801151002l1a771715ya36732e455758e4e@mail.gmail.com> Date: Tue, 15 Jan 2008 10:02:35 -0800 From: "Mark Davis" To: verdy_p@wanadoo.fr Subject: Re: Beta Survey application bug: coverage Cc: "Rick McGowan" , cldr-users@unicode.org In-Reply-To: <006401c853e4$e8bbf4a0$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_2807_14108905.1200420155301" References: <200801101723.m0AHNe44028363@unicode.org> <20080110193544.GZ75977@nexus.in-nomine.org> 
<006401c853e4$e8bbf4a0$0a01a8c0@HARNON> X-Google-Sender-Auth: 77aa8f1f25d6468d X-archive-position: 313 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk X-list: cldr-users ------=_Part_2807_14108905.1200420155301 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline Thanks. Can you file this as a bug? Mark On Jan 10, 2008 3:59 PM, Philippe Verdy wrote: > Something to fix for the coming new beta of the survey (starting on > February > 1st?) > > http://unicode.org/cldr/apps/survey?_=fr > > Possible problems with locale: > (null) : Error: Internal error in org.unicode.cldr.test.CheckCoverage. > Exception: java.lang.NullPointerException, Message: > java.lang.NullPointerException, Trace: [] > > > > > -- Mark ------=_Part_2807_14108905.1200420155301 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline Thanks. Can you file this as a bug?

Mark

--
Mark ------=_Part_2807_14108905.1200420155301-- From verdy_p@wanadoo.fr Tue Jan 15 16:09:58 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 15 Jan 2008 16:09:58 -0600 (CST) Received: from smtp25.orange.fr (smtp25.orange.fr [193.252.22.23]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0FM9vB0031912 for ; Tue, 15 Jan 2008 16:09:58 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2557.orange.fr (SMTP Server) with ESMTP id C9A591C0008D for ; Tue, 15 Jan 2008 23:09:51 +0100 (CET) Received: from HARNON (APoitiers-258-1-79-184.w90-45.abo.wanadoo.fr [90.45.238.184]) by mwinf2557.orange.fr (SMTP Server) with ESMTP id 712FB1C00086; Tue, 15 Jan 2008 23:09:51 +0100 (CET) X-ME-UUID: 20080115220951463.712FB1C00086@mwinf2557.orange.fr Reply-To: From: "Philippe Verdy" To: "'Naz Gassiep'" , References: <478C9593.9010806@mira.net> Subject: RE: Data set relevance Date: Tue, 15 Jan 2008 23:09:46 +0100 Organization: Ordinateur Personnel Message-ID: <017801c857c3$56d6b510$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <478C9593.9010806@mira.net> Thread-Index: AchXm5+lcfdKS3UeQQq4WrsXJXmsugAJncEg X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id m0FM9vB0031912 X-archive-position: 314 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users No merge is necessary to use the CLDR database, however the lookup will possibly require multiple files, if a resource can't be found in the current file. The resolution algorithm will possibly look first in zh_Hant_HK.xml, then zh_Hant.xml, then zh_HK.xml, then zh.xml, then Root.xml, and possibly some others if there are links for some resources. 
Generally speaking, it's probably a bad idea to try merging files, but if you do:
* you should use the same resolution algorithm as documented;
* the total size of the CLDR files will explode if you apply it to each supported locale;
* user or application tailoring will no longer work as expected;
* and you won't be able to determine whether a resource was actually translated or was inherited from another locale.

Look at the CLDR documentation for the exact resolution order (which is quite tricky in some cases when there are links to other locales).

You are not required to support all locales, but if you support some of them, you should also include the XML files for the locales with shorter IDs that they inherit from, and you should keep the root.xml file, which provides default values for many resources in most locales.

> -----Original Message-----
> From: cldr-users-bounce@unicode.org [mailto:cldr-users-bounce@unicode.org]
> On behalf of Naz Gassiep
> Sent: Tuesday, 15 January 2008 12:14
> To: cldr-users@unicode.org
> Subject: Data set relevance
>
> I am a relatively new user to the CLDR datasets.
>
> I was wondering if someone can explain to me how to determine which
> datafiles are relevant to a given locale selection. For example, if I
> would like to select Chinese, in Hong Kong for the Traditional Han
> script, which files would I use? Do I only use zh_Hant_HK.xml or do I
> have to merge in zh.xml with that file?
>
> Also, if a language is written in multiple scripts, then surely each
> script/language combination constitutes a whole locale, distinct from
> other scripts. I will use Serbian as an example here:
>
> If a user wants to get the locale relating to Serbian in Montenegro
> using the Latin script, how do I know if the file sr_ME has any
> relevance? How do I know if the data in it pertains to the Latin or
> Cyrillic scripts? Can I assume that data in the sr_ME is relevant to
> *both* script variants of Serbian?
> > Some pointers on this would be great. > Thanks in advance, From naz@mira.net Wed Jan 16 04:14:19 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 16 Jan 2008 04:14:19 -0600 (CST) Received: from outbound.icp-qv1-irony-out3.iinet.net.au (outbound.icp-qv1-irony-out3.iinet.net.au [203.59.1.148]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0GAEHcZ005405 for ; Wed, 16 Jan 2008 04:14:18 -0600 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ao8CAHtnjUd8qAxc/2dsb2JhbACRWpwg X-IronPort-AV: E=Sophos;i="4.24,292,1196607600"; d="scan'208";a="214682570" Received: from unknown (HELO [192.168.0.6]) ([124.168.12.92]) by outbound.icp-qv1-irony-out3.iinet.net.au with ESMTP; 16 Jan 2008 19:14:04 +0900 Message-ID: <478DD8D5.50402@mira.net> Date: Wed, 16 Jan 2008 21:13:41 +1100 From: Naz Gassiep User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: cldr-users@unicode.org Subject: Re: Data set relevance References: <478C9593.9010806@mira.net> <017801c857c3$56d6b510$0a01a8c0@HARNON> In-Reply-To: <017801c857c3$56d6b510$0a01a8c0@HARNON> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 315 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: naz@mira.net Precedence: bulk X-list: cldr-users Please excuse me if I am asking simple questions. Where are the specs for the resolution order? I am having trouble determining how the lookup algorithm should determine which files to search for a given locale, and what order to search them in. Regards, - Naz. Philippe Verdy wrote: > No merge is necessary to use the CLDR database, however the lookup will > possibly require multiple files, if a resource can't be found in the current > file. 
> > The resolution algorithm will possibly look first in zh_Hant_HK.xml, then > zh_Hant.xml, then zh_HK.xml, then zh.xml, then Root.xml, and possibly some > others if there are links for some resources. > > Generally speaking, it's probably a bad idea to try merging files, but if > you do: > * you should use the same resolution algorithm as documented; > * the total size of the CLDR files will explode and will become very large > if you apply it to each supported locale. > * user or application tailoring will no more work as expected; > * and you won't be able to determine if a resource was actually translated > or if it was inherited from another locale. > > Look at the CLDR documentation for the exact resolution order (which is > quite tricky in some cases when there are links to other locales). > > You are not required to support all locales, but if you support some of > them, you should include the XML files for locales that have shorter Ids and > should keep the Root.xml file that provides lots of default values for many > resources in most locales. > > >> -----Message d'origine----- >> De : cldr-users-bounce@unicode.org [mailto:cldr-users-bounce@unicode.org] >> De la part de Naz Gassiep >> Envoyé : mardi 15 janvier 2008 12:14 >> À : cldr-users@unicode.org >> Objet : Data set relevance >> >> I am a relatively new user to the CLDR datasets. >> >> I was wondering if someone can explain to me how to determine which >> datafiles are relevant to a given locale selection. For example, if I >> would like to select Chinese, in Hong Kong for the Traditional Han >> script, which files would I use? Do I only use zh_Hant_HK.xml or do I >> have to merge in zh.xml with that file? >> >> Also, if a language is written in multiple scripts, then surely each >> script/language combination constitutes a whole locale, distinct from >> other scripts. 
I will use Serbian as an example here: >> >> If a user wants to get the locale relating to Serbian in Montenegro >> using the Latin script, how do I know if the file sr_ME has any >> relevance? How do I know if the data in it pertains to the Latin or >> Cyrillic scripts? Can I assume that data in the sr_ME is relevant to >> *both* script variants of Serbian? >> >> Some pointers on this would be great. >> Thanks in advance, >> > > > > > > >

From rick@unicode.org Thu Jan 17 19:40:20 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 17 Jan 2008 19:44:32 -0600 (CST) Received: from izanami (c-71-202-247-55.hsd1.ca.comcast.net [71.202.247.55]) by unicode.org (8.12.11/8.12.11) with SMTP id m0I1e7ug020692; Thu, 17 Jan 2008 19:40:07 -0600 Message-Id: <200801180140.m0I1e7ug020692@unicode.org> To: unicode@unicode.org Subject: Unicode 5.1.0 beta period ends soon! Date: Thu, 17 Jan 2008 17:40:06 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 316 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: cldr-users

The Unicode Consortium would like to remind everyone of the deadline for review of the content and data for the pending release of Unicode 5.1. The Unicode Technical Committee meeting on February 4-8, 2008 will make the final decisions on the content of the release, based on public review feedback and member submissions.

Over the past months, there have been a number of changes to the text of 5.1.0, the text of the Unicode 5.1.0 Standard Annexes, and the UCD data files to reflect decisions of the previous UTC meetings. These cover such areas as Bidi, Line breaking, Normalization, Segmentation, Identifiers, and others.
For a description of what to review and how to provide feedback, see: http://www.unicode.org/versions/beta.html There have been a number of changes on that page, in the section "Notable Issues for Beta Testers", that should help focus your review on important changes. Regards, Rick McGowan Unicode, Inc. From naz@mira.net Fri Jan 18 02:21:11 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 02:21:11 -0600 (CST) Received: from outbound.icp-qv1-irony-out3.iinet.net.au (outbound.icp-qv1-irony-out3.iinet.net.au [203.59.1.148]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0I8L5rU019900 for ; Fri, 18 Jan 2008 02:21:10 -0600 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAALnwj0d8qAxc/2dsb2JhbAAIkWCcAA X-IronPort-AV: E=Sophos;i="4.25,215,1199631600"; d="scan'208";a="215502496" Received: from unknown (HELO [192.168.0.6]) ([124.168.12.92]) by outbound.icp-qv1-irony-out3.iinet.net.au with ESMTP; 18 Jan 2008 17:21:00 +0900 Message-ID: <47906163.30308@mira.net> Date: Fri, 18 Jan 2008 19:20:51 +1100 From: Naz Gassiep User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: cldr-users@unicode.org Subject: Checking Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 317 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: naz@mira.net Precedence: bulk X-list: cldr-users Hello, Please let me know if the queries I am posting here are inappropriate. I am unable to find another list to post these questions to, but if they are not welcome here I shall refrain from sending them here. I currently have an algorithm for determining valid locales and their search paths that works like this: 1. Get the list of files, except for root.xml 2. Discard files where //ldml/alias returns a value (alias files are unnecessary) 3. Break down the filename, and return all filenames that are shorter. 4. 
Add root.xml to the search path

This results, for example, in paths like this:

Serbian (Latin script, in Serbia):
   sr_Latn_RS.xml
   sr_Latn.xml
   sr.xml
   root.xml

Chinese (Traditional script, Hong Kong):
   zh_Hant_HK.xml
   zh_Hant.xml
   zh.xml
   root.xml

Is this procedure sufficient for handling search paths? I have not yet dealt with multiple inheritance of resources, but I think that will best be handled after the file search path has been built. Is this correct or am I doing this wrong?
Regards,
- Naz.

From mark.edward.davis@gmail.com Fri Jan 18 03:16:15 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 03:16:16 -0600 (CST) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.237]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0I9GFFP007219 for ; Fri, 18 Jan 2008 03:16:15 -0600 Received: by nz-out-0506.google.com with SMTP id x3so705700nzd.6 for ; Fri, 18 Jan 2008 01:16:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; bh=UJZtIyU6aqDtFVSuq7wZwp1MO0NGAXcnzJxWCyVO6aw=; b=CjVPegpae4ufeB1p2SFyhESNKjBtECbbDRCGzOA+WBAhzkM7HNsUCQ9SuYdxmDtKpeKnPmp771zhPoisNN2ygztfGfSJx2GbMt24E4IJ+BGgpBbL5A88uBb4EWqlP0vMzb4t26oa9+V6cimVE+4lwskuYyw7lX6NV8pXV8S3ogE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=V8SRw2vrG7rt6bWiruP4Xf1Xzmp42IN0Dyh3VH/YMxhm6PK5GAzuQWfgbDxf94skAaL3l3O0mqILx1+ygzZEciKBal29nrgYnpJYo+Sj9ekjqVqGICheRbgPX30UUmSzSBB0JTaq2t9B1ZipxhVzWlQneYZfwCXFgYIqp1Gdxhk= Received: by 10.142.125.5 with SMTP id x5mr1815656wfc.191.1200647774609; Fri, 18 Jan 2008 01:16:14 -0800 (PST) Received: by 10.143.196.9 with HTTP; Fri, 18 Jan 2008 01:16:14 -0800 (PST) Message-ID: <30b660a20801180116g4a6eb49oc4ec58655e366c01@mail.gmail.com>
Date: Fri, 18 Jan 2008 01:16:14 -0800 From: "Mark Davis" To: "Naz Gassiep" Subject: Re: Checking Cc: cldr-users@unicode.org In-Reply-To: <47906163.30308@mira.net> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_3974_13013353.1200647774627" References: <47906163.30308@mira.net> X-Google-Sender-Auth: 630f073945f6865a X-archive-position: 318 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk X-list: cldr-users ------=_Part_3974_13013353.1200647774627 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline The normal search paths for resource lookup are to truncate from the right, and that matches your examples. The complication comes in with aliases. There's a description of how they work in the spec, and there is (Java) code that does the lookup, in CLDRFile. Mark On Jan 18, 2008 12:20 AM, Naz Gassiep wrote: > Hello, > Please let me know if the queries I am posting here are > inappropriate. I am unable to find another list to post these questions > to, but if they are not welcome here I shall refrain from sending them > here. > > I currently have an algorithm for determining valid locales and their > search paths that works like this: > > 1. Get the list of files, except for root.xml > 2. Discard files where //ldml/alias returns a value (alias files are > unnecessary) > 3. Break down the filename, and return all filenames that are shorter. > 4. Add root.xml to the search path > > This results, for example, with paths like this: > > Serbian (Latin script, in Serbia): > sr_Latn_RS.xml > sr_Latn.xml > sr.xml > root.xml > > Chinese (Simplified script, Hong Kong): > zh_Hant_HK.xml > zh_Hant.xml > zh.xml > root.xml > > Is this procedure sufficient for handling search paths? 
I have not yet > dealt with multiple inheritance of resources, but I think that will best > be handled after the file search path has been built. Is this correct or > am I doing this wrong? > Regards, > - Naz. > > -- Mark ------=_Part_3974_13013353.1200647774627 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline The normal search paths for resource lookup are to truncate from the right, and that matches your examples.

The complication comes in with aliases. There's a description of how they work in the spec, and there is (Java) code that does the lookup, in CLDRFile.
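
The "truncate from the right" rule could be sketched roughly as follows. This is only an illustration in Python, not the CLDRFile code: the function name `fallback_chain` is made up here, it assumes locale IDs use underscores and map directly to file names, and it deliberately ignores aliases, which is exactly the complication mentioned above. Note that in zh_Hant_HK the `Hant` subtag denotes the Traditional Han script.

```python
def fallback_chain(locale_id):
    """Return candidate file names, most specific first, ending with root.xml.

    Hypothetical helper: it only truncates subtags from the right and does
    not resolve <alias> elements or any other kind of link between locales.
    """
    if locale_id == "root":
        return ["root.xml"]
    parts = locale_id.split("_")
    # Drop one trailing subtag at a time: sr_Latn_RS, sr_Latn, sr.
    chain = ["_".join(parts[:n]) + ".xml" for n in range(len(parts), 0, -1)]
    chain.append("root.xml")
    return chain


if __name__ == "__main__":
    for locale_id in ("sr_Latn_RS", "zh_Hant_HK"):
        print(locale_id, "->", ", ".join(fallback_chain(locale_id)))
```

A lookup would then try each file in order and stop at the first one that defines the requested resource; any alias found along the way would redirect the search as described in the spec.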

Mark

--
Mark ------=_Part_3974_13013353.1200647774627--

From naz@mira.net Fri Jan 18 10:32:31 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 10:32:31 -0600 (CST) Received: from outbound.icp-qv1-irony-out3.iinet.net.au (outbound.icp-qv1-irony-out3.iinet.net.au [203.59.1.148]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0IGWTDc000542 for ; Fri, 18 Jan 2008 10:32:30 -0600 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAE5jkEd8qAxc/2dsb2JhbAAIrWk X-IronPort-AV: E=Sophos;i="4.25,217,1199631600"; d="scan'208";a="215649540" Received: from unknown (HELO [192.168.0.6]) ([124.168.12.92]) by outbound.icp-qv1-irony-out3.iinet.net.au with ESMTP; 19 Jan 2008 01:32:24 +0900 Message-ID: <4790D491.2010309@mira.net> Date: Sat, 19 Jan 2008 03:32:17 +1100 From: Naz Gassiep User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: cldr-users@unicode.org Subject: Translation of numbers Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 319 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: naz@mira.net Precedence: bulk X-list: cldr-users

Is there a CLDR/Unicode specified method for the translation of numbers? For example, to translate a number from the Latin numerals to Arabic or Thai would be trivial, as both numbering systems are syntactically identical to the Latin numeral system, just with different glyphs 0-9.

However translating a Hebrew number would be more of a challenge, as the system, while decimal, uses a different numeral set, with 22 (or 27 if you consider the extended glyphs) numerals rather than 10.

If there is no Unicode method for this I will have to write my own, but I thought I'd check if such a standard existed first. Does the CLDR specify at least when it is OK to do a simple glyph substitution? Any direction in this area would be greatly appreciated.
Best regards, - Naz. From mark.edward.davis@gmail.com Fri Jan 18 10:47:07 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 10:47:07 -0600 (CST) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.232]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0IGl7sG005009 for ; Fri, 18 Jan 2008 10:47:07 -0600 Received: by wx-out-0506.google.com with SMTP id h27so732442wxd.3 for ; Fri, 18 Jan 2008 08:47:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=3kXwj//EwQe2VNxxL212m8tBqUvRWDsBHgix9DsCU38=; b=WZWxR6w+Vl2BGN8t4Ra3L0epcZPOCCIbYmoyIEC5Hkb5IyjyXWeTBoWvp+xWWK3Ex1EkGhW/bj/Sk7yIllucBAzNH9giQc2WiFJ7FPSK+kwXTxlhmzydERZw3ExHboMCNmNIsO7zr3ktCjXCxx2YcHjsjse+HxWR8WRQVT+22ok= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=pjpR7n+d0cEssatv6ieX9822wqVUBMsRVKZLYHnLDtoN0Vc3oa3zgEcwfXSUeHUnieJ3yiG8wnztOX7L9Jkozv2ay1sZoi0MHtPY8WOMypdQf7MeS4c+Yqmpu+pB6sz8hyvBHALWWrE8KUDIMtgsWYu87Jph4iw1iocjh0VkWXE= Received: by 10.142.163.14 with SMTP id l14mr412228wfe.230.1200674821229; Fri, 18 Jan 2008 08:47:01 -0800 (PST) Received: by 10.143.196.9 with HTTP; Fri, 18 Jan 2008 08:47:01 -0800 (PST) Message-ID: <30b660a20801180847s3089fb95r4aa7ec852e919434@mail.gmail.com> Date: Fri, 18 Jan 2008 08:47:01 -0800 From: "Mark Davis" To: "Naz Gassiep" Subject: Re: Translation of numbers Cc: cldr-users@unicode.org, UTC In-Reply-To: <4790D491.2010309@mira.net> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <4790D491.2010309@mira.net> X-Google-Sender-Auth: 49b5d865131ef38b 
X-archive-position: 320 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk X-list: cldr-users Unicode does specify the numeric value of each digit, and whether or not it is a (positional) decimal digit. More complex number systems are not specified. George Iftah has a great book on numbering systems. Some libraries, like ICU, also offer some degree of non-decimal number support. Mark On Jan 18, 2008 8:32 AM, Naz Gassiep wrote: > Is there a CLDR/Unicode specified method for the translation of numbers? > For example, to translate a number from the Latin numerals to Arabic or > Thai would be trivial, as both numbering systems are syntactically > identical to the Latin numeral system, just with different glyphs 0-9. > > However translating a Hebrew number would be more of a challenge, as the > system, while decimal, uses a different numeral set, with 22 (or 27 if > you consider the extended glyphs) numerals rather than 10. > > If there is no Unicode method for this I will have to write my own, but > I thought I'd check if such a standard existed first. Does the CLDR > specify at least when it is OK to so a simple glyph substitution? Any > direction in this area would be greatly appreciated. > > Best regards, > - Naz. 
> > -- Mark From mark.edward.davis@gmail.com Fri Jan 18 11:41:10 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 11:41:11 -0600 (CST) Received: from el-out-1112.google.com (el-out-1112.google.com [209.85.162.182]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0IHfA4Z030542 for ; Fri, 18 Jan 2008 11:41:10 -0600 Received: by el-out-1112.google.com with SMTP id m34so40052ele.11 for ; Fri, 18 Jan 2008 09:41:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=pYIRSw6s2o3XRbXVSkVvVyw37vh+wWAZykrfB6Xdstc=; b=DzCgrkuUvQcQY24KQJWIliamvBOu40D6ECTb8lpZqH0eY9PrbbRMVWy4hpEulzzkLcw44uRBRNLwEFYXz1gvoIyMKP3zhPPLv79RoO8TX8WoNwi6v/XJWEp1hC22AE2nMN82ehM/JRuWaBHnYcItRAhY18PFW2sQ7UR+up61i1M= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=CIZybByYUCT9WX8Tv6c/iwHhjLL2uLjPfCdK1C+FAloH2bOkICOHtvYzaEoCVE2py9QHDZYykxtMUcOjQXZdh4+mzSZmhOM1SBMdOfyDcbfarIRFmjPi+/mtRCsjnAhTDpfT0+fr2ayaYm6OTSjL4QS6V/mwOJjFvRXfp4zMowQ= Received: by 10.142.221.19 with SMTP id t19mr2232613wfg.100.1200678066023; Fri, 18 Jan 2008 09:41:06 -0800 (PST) Received: by 10.143.196.9 with HTTP; Fri, 18 Jan 2008 09:41:05 -0800 (PST) Message-ID: <30b660a20801180941l981d639ybb499b4cce9bc612@mail.gmail.com> Date: Fri, 18 Jan 2008 09:41:05 -0800 From: "Mark Davis" To: "Dave Opstad" Subject: Re: Translation of numbers Cc: "Naz Gassiep" , cldr-users@unicode.org, "Unicode list" In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <30b660a20801180847s3089fb95r4aa7ec852e919434@mail.gmail.com> 
X-Google-Sender-Auth: 6535bfbfab3ff3ac X-archive-position: 321 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk X-list: cldr-users Right, thanks Dave. On Jan 18, 2008 9:08 AM, Dave Opstad wrote: > Mark Davis wrote: > > > George Iftah has a great book on numbering systems. Some libraries, > > like ICU, also offer some degree of non-decimal number support. > > A minor correction: it's Georges Ifrah, not George Iftah. > > Dave Opstad > > -- Mark From patrick.andries@xcential.com Fri Jan 18 12:28:48 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 12:28:48 -0600 (CST) Received: from skywalker.myinternetwebhost.com (skywalker.myinternetwebhost.com [69.90.236.45]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0IISl4T015247 for ; Fri, 18 Jan 2008 12:28:47 -0600 Received: from adsl-64-164-34-54.dsl.scrm01.pacbell.net [64.164.34.54] by skywalker.myinternetwebhost.com with SMTP; Fri, 18 Jan 2008 10:31:32 -0800 Message-ID: <4790EFC0.4040609@xcential.com> Date: Fri, 18 Jan 2008 10:28:16 -0800 From: Patrick Andries User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: cldr-users@unicode.org CC: =?ISO-8859-1?Q?Alain_LaBont=E9?= , =?ISO-8859-1?Q?Fran=E7ois_Yergeau?= Subject: Timezone in CLDR and corresponding ICU4J 3.6.1 & 3.8.1 References: <478E516A.3010806@hapax.qc.ca> <478EB987.3070102@hapax.qc.ca> In-Reply-To: <478EB987.3070102@hapax.qc.ca> Content-Type: multipart/alternative; boundary="------------000500090305080307010305" X-archive-position: 322 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: patrick.andries@xcential.com Precedence: bulk X-list: cldr-users This is a multi-part message in MIME format. 
--------------000500090305080307010305 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit

I'm not sure where this would best be raised: on one of the ICU lists or on the CLDR list. Since the data ultimately comes from CLDR, I decided after some hesitation to send it here. Please let me know if the queries I am posting here are inappropriate.

Out of curiosity, I briefly tested the DateFormat found in ICU4J 3.6.1 (built on CLDR 1.4.0) and in ICU4J 3.8.1 (built on CLDR 1.5.1, if I'm right).

First, a word of appreciation. I noticed an improvement over 3.6.1/CLDR 1.4.0, where the FULL time was printed, for instance, as

   23 h 42 min 42 s HNP (ÉUA)

using a fr_CA locale. The "ÉUA" (USA) was really not necessary for Canadians.

In 3.8.1 this is now:

   23 h 45 min 28 s HP

where the absence of "(ÉUA)" is appreciated.

Second, I have a small question about the FULL vs LONG format (I suppose this is rather an ICU question, but I'm not too sure).

See the way the same time is printed using the FULL and LONG formats:

LONG :     17:07:12 HNP
FULL :      17 h 07 min 12 s HP

What is the logic behind the time zone name in the FULL time format being shorter than the one in the corresponding LONG format? I would intuitively have expected the opposite (see the Java documentation at http://java.sun.com/javase/6/docs/api/java/text/DateFormat.html, where LONG < FULL).

Third, when the time is given as GMT ± an offset, I was a bit surprised to read "HMG" in French.

At least in Canada, this is not at all common; TUC (or UTC, supposedly an "international" form) is used instead...

http://inms-ienm.nrc-cnrc.gc.ca/time_services/leap_second_f.html  (government link).

«Si la vitesse et le temps sont coordonnés à l'aide de comparaisons internationales organisées sous l'égide la Convention du mètre, on obtient le **TUC** ou Temps universel coordonné qui est l'application moderne du GMT et constitue la base de temps officielle dans le monde.»

http://www.nrc-cnrc.gc.ca/aboutUs/nrc90/achievements/atomicclock_f.html

«Le TUC a remplacé le temps universel en 1972 pour devenir le fondement du temps officiel dans chaque pays. Les fuseaux horaires qui divisent la planète sont désormais exprimés en écart positif ou négatif par rapport au TUC. Ainsi, l'Heure normale de l'Est correspond au TUC moins cinq heures. On l'écrit donc **TUC-5**.»

Maybe someone has more information on the use of HMG in French versus TUC or UTC in French...

Patrick l

--------------000500090305080307010305 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit


--------------000500090305080307010305-- From verdy_p@wanadoo.fr Fri Jan 18 15:59:45 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 18 Jan 2008 15:59:46 -0600 (CST) Received: from smtp20.orange.fr (smtp20.orange.fr [193.252.22.29]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0ILxfkQ024239 for ; Fri, 18 Jan 2008 15:59:45 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2017.orange.fr (SMTP Server) with ESMTP id 92F751C000BF for ; Fri, 18 Jan 2008 22:59:35 +0100 (CET) Received: from HARNON (APoitiers-258-1-102-51.w86-217.abo.wanadoo.fr [86.217.245.51]) by mwinf2017.orange.fr (SMTP Server) with ESMTP id 460381C000AE; Fri, 18 Jan 2008 22:59:35 +0100 (CET) X-ME-UUID: 20080118215935286.460381C000AE@mwinf2017.orange.fr Reply-To: From: "Philippe Verdy" To: "'Naz Gassiep'" , References: <4790D491.2010309@mira.net> Subject: RE: Translation of numbers Date: Fri, 18 Jan 2008 22:59:34 +0100 Organization: Ordinateur Personnel Message-ID: <026101c85a1d$69aa50a0$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <4790D491.2010309@mira.net> Thread-Index: AchZ80Rt9cCYdwTrQgGzZbSgHWsLbwAKLGew X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-archive-position: 323 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users Naz Gassiep wrote: > Is there a CLDR/Unicode specified method for the translation of numbers? > For example, to translate a number from the Latin numerals to Arabic or > Thai would be trivial, as both numbering systems are syntactically > identical to the Latin numeral system, just with different glyphs 0-9. 
> > However translating a Hebrew number would be more of a challenge, as the > system, while decimal, uses a different numeral set, with 22 (or 27 if > you consider the extended glyphs) numerals rather than 10.

Your problem is not one of "translation" but of conversion from one numeral system to another. Such a conversion is independent of the script used to write the numeral system and also independent of the language/locale: it is a numerical transform, specified by mathematical rules. Only after this numeric conversion can you map the numeric digit values to characters according to script and/or language/locale preferences.

Your problem has exactly the same nature as the conversion between the commonly used positional decimal system and the Roman numeral system (here too, the mapping of Roman numeral digits to characters is script- and locale-dependent, and it is perfectly safe to write them using normal Latin letters instead of digits within some locale contexts, even though other contexts will want to use other characters to maintain a visual distinction).

There are in fact many legacy numeral systems that used letters to denote numbers. They did not use the positional decimal system, but a system in which each character has a unique, non-positional value. (This is not completely true of the Roman system, where a value may be subtractive depending on context; however, the absolute value of each character remains the same, as the Romans did not have the concept of negative numbers and considered only the absolute value significant.)

What Unicode has standardized are the digits to be used within the positional decimal system, for which conversion is possible between the multiple scripts that use this system. But no properties are defined, for now, for other numeral systems (except possibly the hexadecimal system, defined only for the European digits and the Latin letters).
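The two cases discussed in this thread can be sketched in Python. This is only an illustration under stated assumptions, not a CLDR- or Unicode-specified algorithm: the names `ZERO`, `convert_digits`, `HEBREW_VALUES` and `to_hebrew` are hypothetical, the Hebrew part covers only 1 to 999, and it leaves out geresh/gershayim punctuation and final letter forms. The first function is plain glyph substitution, which is valid only for characters that are positional decimal digits (checked with the standard `unicodedata.decimal`); the second is a numeric conversion into an additive, non-positional system.

```python
import unicodedata

# Code point of digit zero in a few positional decimal digit sets.
# The dictionary keys are labels invented for this sketch.
ZERO = {"latin": 0x0030, "arabic-indic": 0x0660, "thai": 0x0E50}

def convert_digits(text, target):
    """Glyph substitution: remap every positional decimal digit in text."""
    zero = ZERO[target]
    out = []
    for ch in text:
        # unicodedata.decimal returns the digit value, or the default (None)
        # when the character is not a positional decimal digit.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else chr(zero + value))
    return "".join(out)

# Additive values of the 22 Hebrew letters, largest first (tav=400 ... alef=1).
HEBREW_VALUES = [
    (400, "\u05EA"), (300, "\u05E9"), (200, "\u05E8"), (100, "\u05E7"),
    (90, "\u05E6"), (80, "\u05E4"), (70, "\u05E2"), (60, "\u05E1"),
    (50, "\u05E0"), (40, "\u05DE"), (30, "\u05DC"), (20, "\u05DB"),
    (10, "\u05D9"), (9, "\u05D8"), (8, "\u05D7"), (7, "\u05D6"),
    (6, "\u05D5"), (5, "\u05D4"), (4, "\u05D3"), (3, "\u05D2"),
    (2, "\u05D1"), (1, "\u05D0"),
]

def to_hebrew(n):
    """Numeric conversion of 1..999 into additive Hebrew letter numerals."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles 1..999")
    out = []
    for value, letter in HEBREW_VALUES:
        if n in (15, 16):
            # By convention 15 and 16 are written 9+6 and 9+7, not 10+5 and 10+6.
            out.append("\u05D8" + ("\u05D5" if n == 15 else "\u05D6"))
            break
        while n >= value:
            out.append(letter)
            n -= value
    return "".join(out)

print(convert_digits("2008", "arabic-indic"))
print(to_hebrew(248))
```

The split mirrors the distinction made above: `convert_digits` never looks at the number as a whole, while `to_hebrew` needs the integer value first and only then chooses characters.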
From rick@unicode.org Sat Jan 19 10:58:11 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Sat, 19 Jan 2008 11:04:35 -0600 (CST) Received: from izanami (c-71-202-247-55.hsd1.ca.comcast.net [71.202.247.55]) by unicode.org (8.12.11/8.12.11) with SMTP id m0JGvq7h018041; Sat, 19 Jan 2008 10:57:52 -0600 Message-Id: <200801191657.m0JGvq7h018041@unicode.org> To: unicode@unicode.org Subject: Unicode Transliteration Guidelines released Date: Sat, 19 Jan 2008 08:57:52 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 324 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: cldr-users The Unicode CLDR committee has released "Unicode Transliteration Guidelines": http://www.unicode.org/cldr/transliteration_guidelines.html Regards, Rick McGowan Unicode, Inc. From asmodai@in-nomine.org Sun Jan 20 10:24:01 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Sun, 20 Jan 2008 10:24:01 -0600 (CST) Received: from nexus.in-nomine.org (dhammapada.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0KGO0mm005913 for ; Sun, 20 Jan 2008 10:24:01 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id 05580C12E for ; Sun, 20 Jan 2008 17:23:58 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id U5p+8DO9BWCL for ; Sun, 20 Jan 2008 17:23:57 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id 783D5C11A; Sun, 20 Jan 2008 17:23:57 +0100 (CET) Date: Sun, 20 Jan 2008 17:23:57 +0100 From: Jeroen Ruigrok van der Werven To: cldr-users@unicode.org Subject: Why tabs? 
Message-ID: <20080120162357.GC61556@nexus.in-nomine.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 325 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users Is there a reason the CLDR XML files use tabs instead of the more standard 2-space indents for XML? -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ As one lamp serves to dispel a thousand years of darkness, so one flash of wisdom destroys ten thousand years of ignorance... From asmodai@in-nomine.org Sun Jan 20 10:48:07 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Sun, 20 Jan 2008 10:48:07 -0600 (CST) Received: from nexus.in-nomine.org (dhammapada.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0KGlw5f009725 for ; Sun, 20 Jan 2008 10:48:07 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id CC47AC12E for ; Sun, 20 Jan 2008 17:47:56 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 29eNwaE4M5QR for ; Sun, 20 Jan 2008 17:47:56 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id EDB84C11A; Sun, 20 Jan 2008 17:47:55 +0100 (CET) Date: Sun, 20 Jan 2008 17:47:55 +0100 From: Jeroen Ruigrok van der Werven To: cldr-users@unicode.org Subject: Voting is weird? 
Message-ID: <20080120164755.GE61556@nexus.in-nomine.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 326 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users What I do not understand (as I am trying to trace a bug inside CLDR) is how voting works. I got one part of a locale here that has a score of 5 versus a score of 4 and the score of 4 won. Now, perhaps I am really misunderstanding something, but last time I checked 5 still beat 4. Is it due to being included in 1.4 that it gets preference over the weightier vote? (Which is a pity since it pulled a mistake over to 1.5.x.) -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ To conquer fear is the beginning of wisdom... 
From cfynn@gmx.net Sun Jan 20 23:01:17 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Sun, 20 Jan 2008 23:01:18 -0600 (CST) Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by unicode.org (8.12.11/8.12.11) with SMTP id m0L51G8A028286 for ; Sun, 20 Jan 2008 23:01:17 -0600 Received: (qmail invoked by alias); 21 Jan 2008 05:01:08 -0000 Received: from h1b4.cyberstar.com (EHLO [127.0.0.1]) [202.174.9.180] by mail.gmx.net (mp019) with SMTP; 21 Jan 2008 06:01:08 +0100 X-Authenticated: #9568751 X-Provags-ID: V01U2FsdGVkX186KEc3A3k6JB+Oo28viS/PTndAmt+coRawCkq6V6 ODA7WoWrhm1nm4 Message-ID: <4794270C.2060603@gmx.net> Date: Mon, 21 Jan 2008 11:01:00 +0600 From: Christopher Fynn Reply-To: cfynn@gmx.net User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: cldr-users@unicode.org, David Germano Subject: Re: Unicode Transliteration Guidelines released References: <200801191657.m0JGvq7h018041@unicode.org> In-Reply-To: <200801191657.m0JGvq7h018041@unicode.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Antivirus: avast! (VPS 080120-1, 20/01/2008), Outbound message X-Antivirus-Status: Clean X-Y-GMX-Trusted: 0 X-archive-position: 327 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: cfynn@gmx.net Precedence: bulk X-list: cldr-users Just a suggestion.. IMO for Tibetan it would probably be a good idea for CLDR simply to adopt THDL's EWTS and THDL Simplified Phonetic Transcription as standards for Tibetan transliteration and transcription in CLDR. There are of course innumerable other variants of "Wylie" transliteration and many other phonetic transcription schemes - but the two above are generally based on best practice, have wide acceptance, and are well documented. They are if you like the closest things there are to an "existing standard". 
One could quibble about various details in both - but once you start opening that up the arguments and nit picking could become endless. EWTS is also the basis for several Tibetan input methods. THDL seems to have a very good working relationship with both the western academic community and with China - including with Tibet University in Lhasa so adopting the schemes they endorse should not be controversial. The official Chinese (Pinyin) Romanization of Tibetan would of course have to be included as a third - but that scheme is not at all used outside of China. BTW There is also an "official" romanization (phonetic transcription) scheme for Dzongkha - if you want I can get you details of this. best regards - Chris National Library Thimphu, Bhutan Rick McGowan wrote: > The Unicode CLDR committee has released > "Unicode Transliteration Guidelines": > http://www.unicode.org/cldr/transliteration_guidelines.html > > Regards, > Rick McGowan > Unicode, Inc. > From verdy_p@wanadoo.fr Mon Jan 21 05:14:18 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 05:14:18 -0600 (CST) Received: from smtp20.orange.fr (smtp20.orange.fr [80.12.242.27]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LBEH1d022540 for ; Mon, 21 Jan 2008 05:14:18 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2011.orange.fr (SMTP Server) with ESMTP id 56F691C0023D for ; Mon, 21 Jan 2008 12:14:11 +0100 (CET) Received: from HARNON (APoitiers-258-1-102-51.w86-217.abo.wanadoo.fr [86.217.245.51]) by mwinf2011.orange.fr (SMTP Server) with ESMTP id 47C781C0020D; Mon, 21 Jan 2008 12:13:49 +0100 (CET) X-ME-UUID: 20080121111349294.47C781C0020D@mwinf2011.orange.fr Reply-To: From: "Philippe Verdy" To: "'Jeroen Ruigrok van der Werven'" , References: <20080120162357.GC61556@nexus.in-nomine.org> Subject: RE: Why tabs? 
Date: Mon, 21 Jan 2008 12:13:46 +0100 Organization: Ordinateur Personnel Message-ID: <02e901c85c1e$b18ea8f0$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <20080120162357.GC61556@nexus.in-nomine.org> Thread-Index: AchbgtZX3YjvOJvmQiGS6/0ORqLqRAAmgCLA X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id m0LBEH1d022540 X-archive-position: 328 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users Jeroen Ruigrok van der Werven wrote: > Sent: Sunday, 20 January 2008 17:24 > To: cldr-users@unicode.org > Subject: Why tabs? > > Is there a reason the CLDR XML files use tabs instead of the more standard > 2-space indents for XML?

Spaces are not more standard than tabs. They behave identically with regard to the semantics and behaviour of the xml:space attribute.

2-spaces are probably easier to read in an editor (due to the multiple embedding levels where tabs are often rendered to produce too large left margins); however, tabs can be set to whatever width you want in your editor, provided that you use a decent editor for XML that at least supports easy indentation of blocks of lines. XML being a markup language and not a human language, using a programmer's editor instead of a basic text editor will provide this support, as well as syntax highlighting to help prevent syntax errors when editing, such as a missing closing quote, a missing closing tag, a missing / at the end of a self-closing element tag, or other unmatched punctuation pairs in CDATA sections.
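The claim that tabs and spaces behave the same for a parser can be checked with a small Python sketch. It assumes the standard `xml.etree.ElementTree` parser and a tiny LDML-like fragment made up for illustration (it is not copied from a CLDR file); `shape` is a hypothetical helper that ignores whitespace-only text so that only element structure, attributes and real text are compared.

```python
import xml.etree.ElementTree as ET

# The same made-up fragment, indented once with tabs and once with 2 spaces.
TABBED = '<ldml>\n\t<identity>\n\t\t<language type="nl"/>\n\t</identity>\n</ldml>'
SPACED = '<ldml>\n  <identity>\n    <language type="nl"/>\n  </identity>\n</ldml>'

def shape(elem):
    """Return tag, attributes, stripped text and children, ignoring indentation."""
    text = (elem.text or "").strip()
    return (elem.tag, sorted(elem.attrib.items()), text,
            [shape(child) for child in elem])

# Both variants parse to the same logical tree.
same_tree = shape(ET.fromstring(TABBED)) == shape(ET.fromstring(SPACED))

# Size difference between the two encodings of the same content.
extra_bytes = len(SPACED.encode("utf-8")) - len(TABBED.encode("utf-8"))

print(same_tree, extra_bytes)
```

For this fragment the two-space version is only a few bytes larger; how much that matters for a full locale file depends on its nesting depth and line count.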
One tab character is still more compact than two spaces, and saves storage space and decoding time, for CLDR files that are meant to be used primarily by automated tools; note also that other users prefer 3-space or 4-space tabs. Tabs are preferable in this case, because editors will immediately be able to render the file using various tab-width settings without modification in the source file. Anyway, XML files could also drop the indentation completely (including all linefeeds and carriage returns) for faster processing and better compaction. Using tabs is not a problem for either reading or processing the file. Philippe. > > -- > Jeroen Ruigrok van der Werven / asmodai > イェルーン ラウフロック ヴァン デル ウェルヴェン > http://www.in-nomine.org/ | http://www.rangaku.org/ > As one lamp serves to dispel a thousand years of darkness, so one flash of > wisdom destroys ten thousand years of ignorance... > > From asmodai@in-nomine.org Mon Jan 21 07:43:16 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 07:43:16 -0600 (CST) Received: from nexus.in-nomine.org (dhammapada.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LDhFvl008580 for ; Mon, 21 Jan 2008 07:43:16 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id AA8A2C12E; Mon, 21 Jan 2008 14:43:14 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id f397hlgEzODM; Mon, 21 Jan 2008 14:43:13 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id B938EC11A; Mon, 21 Jan 2008 14:43:13 +0100 (CET) Date: Mon, 21 Jan 2008 14:43:13 +0100 From: Jeroen Ruigrok van der Werven To: Philippe Verdy Cc: cldr-users@unicode.org Subject: Re: Why tabs? 
Message-ID: <20080121134313.GP61556@nexus.in-nomine.org> References: <20080120162357.GC61556@nexus.in-nomine.org> <02e901c85c1e$b18ea8f0$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <02e901c85c1e$b18ea8f0$0a01a8c0@HARNON> Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 329 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users -On [20080121 12:14], Philippe Verdy (verdy_p@wanadoo.fr) wrote: >2-spaces are probably easier to read in an editor (due to the multiple >embedding levels where tabs are often rendered to produce too large left >margins); [snip] >One tab character is still more compact than two spaces, and saves storage >space and decoding time, for CLDR files that are meant to be used primarily >by automated tools; note also that other users prefer 3-space or 4-space >tabs. Sorry Philippe, but 1 character versus 2 character processing time and storage advantages are marginal advantages at best. In a parser tab and spaces would both be handled by a white space match. So the only marginal advantage would be storage and then we're talking about a few KiB maximum, so I consider that a moot point. 3-space and 4-space _tabs_ are something entirely different. I am merely talking about 2 space _indents_, so not depending on any tab character at all. >Tabs are preferable in this case, because editors will immediately be able to >render the file using various tab-width settings without modification in the >source file. I learnt a lesson a long time ago: never mess with tab settings. That way lies madness, especially due to everybody liking a different indentation level and changing the tab width to her/his desired width. Oh well, I should have known better than to raise this 'issue'. 
Some day I'll learn. -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ Born from the Dark, in the black Cloak of Night... From SRS0=ZsbE7c=SL=oracle.com=christine.hill@srs.bis.na.blackberry.com Mon Jan 21 10:02:14 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 10:02:14 -0600 (CST) Received: from smtp03.bis.na.blackberry.com (smtp03.bis.na.blackberry.com [216.9.248.50]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LG2ARc019567 for ; Mon, 21 Jan 2008 10:02:14 -0600 Received: from bda001.bis.na.blackberry.com (bda229.bisx.prod.on.blackberry [172.20.228.129]) by srs.bis.na.blackberry.com (8.13.7 TEAMON/8.13.7) with ESMTP id m0LG22TH018463; Mon, 21 Jan 2008 16:02:02 GMT Received: from bda229-cell02.bisx.prod.on.blackberry (localhost.localdomain [127.0.0.1]) by bda001.bis.na.blackberry.com (8.13.4 TEAMON/8.13.4) with ESMTP id m0LG20la031155; Mon, 21 Jan 2008 16:02:00 GMT X-rim-org-msg-ref-id: 431817939 Message-ID: <431817939-1200931320-cardhu_decombobulator_blackberry.rim.net-1883279368-@bxe122.bisx.prod.on.blackberry> Reply-To: Christine.hill@oracle.com X-Priority: Normal References: <200801191657.m0JGvq7h018041@unicode.org><4794270C.2060603@gmx.net> In-Reply-To: <4794270C.2060603@gmx.net> Sensitivity: Normal Importance: Normal To: cfynn@gmx.net, cldr-users@unicode.org, "David Germano" Subject: Re: Unicode Transliteration Guidelines released From: "=?utf-8?B?Q2hyaXN0aW5l?=" Date: Mon, 21 Jan 2008 16:01:53 +0000 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by unicode.org id m0LG2ARc019567 X-archive-position: 330 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: Christine.hill@oracle.com Precedence: bulk X-list: cldr-users Thanks ! 
Sent via BlackBerry by AT&T -----Original Message----- From: Christopher Fynn Date: Mon, 21 Jan 2008 11:01:00 To:cldr-users@unicode.org, David Germano Subject: Re: Unicode Transliteration Guidelines released Just a suggestion.. IMO for Tibetan it would probably be a good idea for CLDR simply to adopt THDL's EWTS and THDL Simplified Phonetic Transcription as standards for Tibetan transliteration and transcription in CLDR. There are of course innumerable other variants of "Wylie" transliteration and many other phonetic transcription schemes - but the two above are generally based on best practice, have wide acceptance, and are well documented. They are if you like the closest things there are to an "existing standard". One could quibble about various details in both - but once you start opening that up the arguments and nit picking could become endless. EWTS is also the basis for several Tibetan input methods. THDL seems to have a very good working relationship with both the western academic community and with China - including with Tibet University in Lhasa so adopting the schemes they endorse should not be controversial. The official Chinese (Pinyin) Romanization of Tibetan would of course have to be included as a third - but that scheme is not at all used outside of China. BTW There is also an "official" romanization (phonetic transcription) scheme for Dzongkha - if you want I can get you details of this. best regards - Chris National Library Thimphu, Bhutan Rick McGowan wrote: > The Unicode CLDR committee has released > "Unicode Transliteration Guidelines": > http://www.unicode.org/cldr/transliteration_guidelines.html > > Regards, > Rick McGowan > Unicode, Inc. 
> From srl@icu-project.org Mon Jan 21 11:39:35 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 11:39:35 -0600 (CST) Received: from mail.monkey.sbay.org (monkey.sbay.org [216.27.178.44]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LHdZw2032532 for ; Mon, 21 Jan 2008 11:39:35 -0600 Received: from tintin.priv ([10.0.0.119]) by mail.monkey.sbay.org with esmtp (Exim 4.50) id 1JH0cO-0003to-84; Mon, 21 Jan 2008 09:39:32 -0800 Message-ID: <4794D8D1.8080501@icu-project.org> Date: Mon, 21 Jan 2008 09:39:29 -0800 From: "Steven R. Loomis" User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Jeroen Ruigrok van der Werven CC: cldr-users@unicode.org Subject: Re: Voting is weird? References: <20080120164755.GE61556@nexus.in-nomine.org> In-Reply-To: <20080120164755.GE61556@nexus.in-nomine.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 331 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: srl@icu-project.org Precedence: bulk X-list: cldr-users Voting is not currently active. Perhaps you are trying to compare the voting situation to the XML data? Otherwise, please include the URLs of what you are referring to. -s Jeroen Ruigrok van der Werven wrote: > What I do not understand (as I am trying to trace a bug inside CLDR) is how > voting works. > > I got one part of a locale here that has a score of 5 versus a score of 4 and > the score of 4 won. Now, perhaps I am really misunderstanding something, but > last time I checked 5 still beat 4. Is it due to being included in 1.4 that it > gets preference over the weightier vote? (Which is a pity since it pulled a > mistake over to 1.5.x.) 
> > From asmodai@in-nomine.org Mon Jan 21 11:53:28 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 11:53:28 -0600 (CST) Received: from nexus.in-nomine.org (dhammapada.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LHrOMY025262 for ; Mon, 21 Jan 2008 11:53:28 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id 8F30DC12E; Mon, 21 Jan 2008 18:53:22 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vene7j8He2Jv; Mon, 21 Jan 2008 18:53:21 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id E6354C11A; Mon, 21 Jan 2008 18:53:21 +0100 (CET) Date: Mon, 21 Jan 2008 18:53:21 +0100 From: Jeroen Ruigrok van der Werven To: "Steven R. Loomis" Cc: cldr-users@unicode.org Subject: Re: Voting is weird? Message-ID: <20080121175321.GQ61556@nexus.in-nomine.org> References: <20080120164755.GE61556@nexus.in-nomine.org> <4794D8D1.8080501@icu-project.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4794D8D1.8080501@icu-project.org> Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 332 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users -On [20080121 18:39], Steven R. Loomis (srl@icu-project.org) wrote: >Voting is not currently active. Perhaps you are trying to compare the >voting situation to the XML data? I was looking at historical data for the voting. >Otherwise, please include the URLs of what you are referring to. 
http://unicode.org/cldr/apps/survey?_=nb&forum=nb&xpath=84189 (Ties in with the bug I reported in #1592.) -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ Everything comes to those who wait... From srl@icu-project.org Mon Jan 21 12:00:15 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 12:00:15 -0600 (CST) Received: from k2smtpout02-01.prod.mesa1.secureserver.net (k2smtpout02-01.prod.mesa1.secureserver.net [64.202.189.90]) by unicode.org (8.12.11/8.12.11) with SMTP id m0LI0BE1029496 for ; Mon, 21 Jan 2008 12:00:15 -0600 Received: (qmail 16120 invoked from network); 21 Jan 2008 18:00:11 -0000 Received: from unknown (HELO ssl.icu-project.org) (208.109.248.225) by k2smtpout02-01.prod.mesa1.secureserver.net (64.202.189.90) with ESMTP; 21 Jan 2008 18:00:11 -0000 Received: from monkey.sbay.org ([216.27.178.44] helo=tintin.priv) by ssl.icu-project.org with esmtpsa (SSLv3:AES256-SHA:256) (Exim 4.62) (envelope-from ) id 1JH0wN-0002Ac-0p; Mon, 21 Jan 2008 10:00:11 -0800 Message-ID: <4794DDAA.4040807@icu-project.org> Date: Mon, 21 Jan 2008 10:00:10 -0800 From: "Steven R. Loomis" User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Jeroen Ruigrok van der Werven CC: cldr-users@unicode.org Subject: Re: Voting is weird? References: <20080120164755.GE61556@nexus.in-nomine.org> <4794D8D1.8080501@icu-project.org> <20080121175321.GQ61556@nexus.in-nomine.org> In-Reply-To: <20080121175321.GQ61556@nexus.in-nomine.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 333 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: srl@icu-project.org Precedence: bulk X-list: cldr-users A score of 5 is not enough to overturn the previous release's value (marked with a blue star) of 4. The "MMM." 
was the previous value, and had one vote with a value of 4. The "MMM" had two votes, values of 4+1 = 5. What was supposed to happen is that the vetters would discuss this conflict and resolve it. The "sep._._" is clearly visible above. As you noted there is an interaction between the format and the value of the individual month. The voting calculation itself is working as it was supposed to. -s Jeroen Ruigrok van der Werven wrote: > -On [20080121 18:39], Steven R. Loomis (srl@icu-project.org) wrote: > >> Voting is not currently active. Perhaps you are trying to compare the >> voting situation to the XML data? >> > > I was looking at historical data for the voting. > > >> Otherwise, please include the URLs of what you are referring to. >> > > http://unicode.org/cldr/apps/survey?_=nb&forum=nb&xpath=84189 > > (Ties in with the bug I reported in #1592.) > > From verdy_p@wanadoo.fr Mon Jan 21 12:20:55 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 12:20:55 -0600 (CST) Received: from smtp20.orange.fr (smtp20.orange.fr [80.12.242.26]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LIKmwt003723 for ; Mon, 21 Jan 2008 12:20:55 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2026.orange.fr (SMTP Server) with ESMTP id 0A4201C002AE for ; Mon, 21 Jan 2008 19:20:43 +0100 (CET) Received: from HARNON (APoitiers-258-1-102-51.w86-217.abo.wanadoo.fr [86.217.245.51]) by mwinf2026.orange.fr (SMTP Server) with ESMTP id A51701C002A8; Mon, 21 Jan 2008 19:20:42 +0100 (CET) X-ME-UUID: 20080121182042676.A51701C002A8@mwinf2026.orange.fr Reply-To: From: "Philippe Verdy" To: "'Steven R. Loomis'" , "'Jeroen Ruigrok van der Werven'" Cc: References: <20080120164755.GE61556@nexus.in-nomine.org> <4794D8D1.8080501@icu-project.org> Subject: RE: Voting is weird? 
Date: Mon, 21 Jan 2008 19:20:40 +0100 Organization: Ordinateur Personnel Message-ID: <02fd01c85c5a$542b6040$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <4794D8D1.8080501@icu-project.org> Thread-Index: AchcVeXzK2O6+othSnGj5PcWZLaK7AAAx2DQ X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 X-archive-position: 334 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: cldr-users Steven R. Loomis wrote: > Voting is not currently active. Jeroen visibly knows that. > Perhaps you are trying to compare the > voting situation to the XML data? He is referring to the pages giving the results of the previous vetting process before CLDR 1.5. There's been no vetting between 1.5 and 1.5.1, even though there were a few elements added (and some corrected for which there was no prior agreement or insufficient quota during the last vettings). Now he can see that the vetting scores are quite troubling given the resulting data that is now included. He comes a bit late; this should have been documented as a bug after the completion of CLDR 1.5 data vetting and beta submission. But anyway, he can file a bug with the URL to the 1.5 vetting results page to ask why some items with higher scores were not retained. Maybe some votes came later after the public vote completion, or some data was extracted from the vetting database, where it forgot some votes that came in the latest period at end of vetting. He should indicate in which locale this happens, and for which resources. Maybe this could be corrected, if there were already enough votes for the required changes he suggests. It's important, if you're a vetter, to follow the process up to its completion, including after the vetting period is closed, waiting for the beta. 
You can also look at the proposed changes in the next CLDR version: look at new resources, how the past data interacts with the new formatting options. This can help identify bugs in the CLDR code, before the next vetting period reopens. From srl@icu-project.org Mon Jan 21 12:36:55 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 12:36:56 -0600 (CST) Received: from k2smtpout02-02.prod.mesa1.secureserver.net (k2smtpout02-02.prod.mesa1.secureserver.net [64.202.189.91]) by unicode.org (8.12.11/8.12.11) with SMTP id m0LIatdk013415 for ; Mon, 21 Jan 2008 12:36:55 -0600 Received: (qmail 9479 invoked from network); 21 Jan 2008 18:36:54 -0000 Received: from unknown (HELO ssl.icu-project.org) (208.109.248.225) by k2smtpout02-02.prod.mesa1.secureserver.net (64.202.189.91) with ESMTP; 21 Jan 2008 18:36:54 -0000 Received: from monkey.sbay.org ([216.27.178.44] helo=tintin.priv) by ssl.icu-project.org with esmtpsa (SSLv3:AES256-SHA:256) (Exim 4.62) (envelope-from ) id 1JH1Vu-0002LR-8h; Mon, 21 Jan 2008 10:36:54 -0800 Message-ID: <4794E645.4000909@icu-project.org> Date: Mon, 21 Jan 2008 10:36:53 -0800 From: "Steven R. Loomis" User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: verdy_p@wanadoo.fr CC: "'Jeroen Ruigrok van der Werven'" , cldr-users@unicode.org Subject: Re: Voting is weird? 
References: <20080120164755.GE61556@nexus.in-nomine.org> <4794D8D1.8080501@icu-project.org> <02fd01c85c5a$542b6040$0a01a8c0@HARNON> In-Reply-To: <02fd01c85c5a$542b6040$0a01a8c0@HARNON> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 335 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: srl@icu-project.org Precedence: bulk X-list: cldr-users According to http://www.unicode.org/cldr/data/docs/web/process.html#resolution_procedure under optimal field value, the "MMM" (O) needed to have twice the score of the next highest vote getter (N), in other words, a score of 8 instead of 5, to overturn the existing value. One additional regular vetter (4) would have sufficed. * It's not a bug that the "higher score was not retained". It may be a "bug" that the process was confusing, not explained well enough, etc. Or, that there are too many pages for vetters to wade through to see what is really happening. Or, that somehow the vetters did not cooperate to resolve this. One of the vetters involved in this case did post to the forum, which would have notified the other two. -s Philippe Verdy wrote: > Steven R. Loomis wrote: > >> Voting is not currently active. >> > > Jeroen visibly knows that. > >> Perhaps you are trying to compare the >> voting situation to the XML data? >> > > He is referring to the pages giving the results of the previous vetting > process before CLDR 1.5. There's been no vetting between 1.5 and 1.5.1, even > though there were a few elements added (and some corrected for which there > was no prior agreement or insufficient quota during the last vettings). > > Now he can see that the vetting scores are quite troubling given the > resulting data that is now included. He comes a bit late; this should have > been documented as a bug after the completion of CLDR 1.5 data vetting and > beta submission. 
> > But anyway, he can file a bug with the URL to the 1.5 vetting results page > to ask why some items with higher scores were not retained. Maybe some > votes came later after the public vote completion, or some data was > extracted from the vetting database, where it forgot some votes that came in > the latest period at end of vetting. > > He should indicate in which locale this happens, and for which resources. > Maybe this could be corrected, if there were already enough votes for the > required changes he suggests. > > It's important, if you're a vetter, to follow the process up to its > completion, including after the vetting period is closed, waiting for the > beta. You can also look at the proposed changes in the next CLDR version: > look at new resources, how the past data interacts with the new formatting > options. This can help identify bugs in the CLDR code, before the next > vetting period reopens. > > > > > From asmodai@in-nomine.org Mon Jan 21 13:01:05 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 13:01:05 -0600 (CST) Received: from nexus.in-nomine.org (dhammapada.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LJ143W020473 for ; Mon, 21 Jan 2008 13:01:05 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id 5D7B5C12E; Mon, 21 Jan 2008 20:01:02 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id u6f130Ecyabw; Mon, 21 Jan 2008 20:01:01 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id 35409C11A; Mon, 21 Jan 2008 20:01:01 +0100 (CET) Date: Mon, 21 Jan 2008 20:01:01 +0100 From: Jeroen Ruigrok van der Werven To: Philippe Verdy Cc: "'Steven R. Loomis'" , cldr-users@unicode.org Subject: Re: Voting is weird? 
Message-ID: <20080121190101.GR61556@nexus.in-nomine.org> References: <20080120164755.GE61556@nexus.in-nomine.org> <4794D8D1.8080501@icu-project.org> <02fd01c85c5a$542b6040$0a01a8c0@HARNON> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <02fd01c85c5a$542b6040$0a01a8c0@HARNON> Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 336 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users -On [20080121 19:20], Philippe Verdy (verdy_p@wanadoo.fr) wrote: >He is referring to the pages giving the results of the previous vetting >process before CLDR 1.5. There's been no vetting between 1.5 and 1.5.1, even >though there were a few elements added (and some corrected for which there >was no prior agreement or insufficient quota during the last vettings). I came late on the scene; I think 1.5 had just been released, since that's what we started using in Babel. >Now he can see that the vetting scores are quite troubling given the >resulting data that is now included. He comes a bit late; this should have >been documented as a bug after the completion of CLDR 1.5 data vetting and >beta submission. The voting results are a bit hard to follow. You have three voting parties voting on a proposal. Two of the three vote for one proposal; the other party votes for the form that was present in 1.4. Now, going by ordinary numeric values, you would flag such an entry for closer inspection instead of defaulting it to a pass because it was present in 1.4. Call it Ockham's Razor for all I care; I believe in the simplest approach, and the way you have to interpret the results is not straightforward. I readily admit I have no in-depth knowledge of the voting procedure, but it defies the principle of least astonishment thus far. 
Glancing over http://unicode.org/cldr/process.html is not inspiring either to be honest, but that may well be just my opinion. >But anyway, he can file a bug with the URL to the 1.5 vetting results page >to ask why some items with higher scores were not retained. Yes, I did, sort of. See #1592. >It's important, if you're a vetter, to follow the process up to its >completion, including after the vetting period is closed, waiting for the >beta. You can also look at the proposed changes in the next CLDR version: >look at new resources, how the past data interacts with the new formatting >options. This can help identify bugs in the CLDR code, before the next >vetting period reopens. Such is very obvious right now to me. I am not sure if I like the automated system on a number of fronts; that will require more thought and reflection. But right now I am leaning towards a "it's not clear how or why a certain decision was reached"-stance. (With all due respect to the good intentions of the people involved.) -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ The focused mind can pierce through stone... 
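For readers following the thread, the resolution rule as Steven describes it earlier (a challenger must score at least twice the next highest vote getter to displace the previous release's value) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in his message, not the Survey Tool's actual code; the function name `resolve` and the dictionary of per-value scores are assumptions made for this sketch.

```python
def resolve(previous_value, scores):
    """Pick the winning value under the rule described in the thread.

    previous_value: the value shipped in the previous release.
    scores: dict mapping each candidate value to its summed vote score.
    (Both the name and the data layout are hypothetical.)
    """
    if not scores:
        return previous_value

    # Rank candidates from highest to lowest total score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_value, top_score = ranked[0]

    # If the previous release's value already leads, nothing changes.
    if top_value == previous_value:
        return previous_value

    # Assumption taken from the message: the challenger needs at least
    # twice the score of the next highest vote getter to overturn.
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0
    if top_score >= 2 * runner_up_score:
        return top_value
    return previous_value


# The nb case from the thread: "MMM." had one vote (4), "MMM" had 4 + 1 = 5.
print(resolve("MMM.", {"MMM.": 4, "MMM": 5}))   # MMM.
# With one additional regular vetter (4) the challenger reaches 9.
print(resolve("MMM.", {"MMM.": 4, "MMM": 9}))   # MMM
```

With the scores quoted in the thread, 5 is below the required 8, so the previous value "MMM." is kept; a further regular vetter vote would have brought the challenger to 9 and overturned it.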
From asmodai@in-nomine.org Mon Jan 21 13:08:17 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 13:08:18 -0600 (CST) Received: from nexus.in-nomine.org (dhammapada.xs4all.nl [82.95.168.248]) by unicode.org (8.12.11/8.12.11) with ESMTP id m0LJ8Hkw021513 for ; Mon, 21 Jan 2008 13:08:17 -0600 Received: from localhost (localhost.domini.in-nomine.org [127.0.0.1]) by nexus.in-nomine.org (Postfix) with ESMTP id 54A30C12E; Mon, 21 Jan 2008 20:08:15 +0100 (CET) X-Virus-Scanned: by amavisd-new using ClamAV at in-nomine.org Received: from nexus.in-nomine.org ([127.0.0.1]) by localhost (nexus.domini.in-nomine.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dTH4hacukTWk; Mon, 21 Jan 2008 20:08:14 +0100 (CET) Received: by nexus.in-nomine.org (Postfix, from userid 1000) id 6AF91C11A; Mon, 21 Jan 2008 20:08:14 +0100 (CET) Date: Mon, 21 Jan 2008 20:08:14 +0100 From: Jeroen Ruigrok van der Werven To: "Steven R. Loomis" Cc: verdy_p@wanadoo.fr, cldr-users@unicode.org Subject: Re: Voting is weird? Message-ID: <20080121190814.GS61556@nexus.in-nomine.org> References: <20080120164755.GE61556@nexus.in-nomine.org> <4794D8D1.8080501@icu-project.org> <02fd01c85c5a$542b6040$0a01a8c0@HARNON> <4794E645.4000909@icu-project.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4794E645.4000909@icu-project.org> Organisation: Ninth Circle Enterprises User-Agent: Mutt/1.5.17 (2007-11-01) X-archive-position: 337 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmodai@in-nomine.org Precedence: bulk X-list: cldr-users -On [20080121 19:36], Steven R. 
Loomis (srl@icu-project.org) wrote: >According to >http://www.unicode.org/cldr/data/docs/web/process.html#resolution_procedure > under optimal field value, the "MMM" (O) needed to have twice the score of >the next highest vote getter (N), in other words, a score of 8 instead of >5, to overturn the existing value. One additional regular vetter (4) >would have sufficed. > >* It's not a bug that the "higher score was not retained". Based on the rules for the voting/vetting procedure, no. I do consider it a huge flaw in the system if established items that get another proposal raised against them can be automatically overturned due to a simple mathematical formula. Call or consider me pedantic, zealous or a perfectionist, but if such stuff gets recorded I'd personally review such data before releasing. Once again, not meant as a jibe or anything on the effort put in --I value such a repository and the work put in immensely in my i18n/l10n work-- but I do consider at least part of the procedure and methodology flawed. So the question becomes: how to resolve this particular flaw in subsequent releases? -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ The riddle master himself lost the key to his own riddles one day, and found it again at the bottom of his heart. From eflarup@yahoo.com Mon Jan 21 13:51:40 2008 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 21 Jan 2008 13:51:40 -0600 (CST) Received: from web50705.mail.re2.yahoo.com (web50705.mail.re2.yahoo.com [206.190.38.103]) by unicode.org (8.12.11/8.12.11) with SMTP id m0LJpdSt029029 for ; Mon, 21 Jan 2008 13:51:39 -0600 Received: (qmail 26333 invoked by uid 60001); 21 Jan 2008 19:51:34