From v-magdad@microsoft.com Mon Mar 12 09:47:57 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 12 Mar 2007 09:48:40 -0600 (CST) Received: from smtp.microsoft.com (mailc.microsoft.com [131.107.115.214]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2CFlvYr007614; Mon, 12 Mar 2007 09:47:57 -0600 Received: from tk5-exhub-c104.redmond.corp.microsoft.com (157.54.70.185) by TK5-EXGWY-E803.partners.extranet.microsoft.com (10.251.56.169) with Microsoft SMTP Server (TLS) id 8.0.685.24; Mon, 12 Mar 2007 08:47:51 -0700 Received: from NA-EXMSG-C134.redmond.corp.microsoft.com ([157.54.62.176]) by tk5-exhub-c104.redmond.corp.microsoft.com ([157.54.70.185]) with mapi; Mon, 12 Mar 2007 08:47:51 -0700 From: "Magda Danish (Unicode)" To: " (unicode@unicode.org)" Date: Mon, 12 Mar 2007 08:47:50 -0700 Subject: Unicode Announces Start of Submission Period for Common Locale Data Repository, Version 1.5 Thread-Topic: Unicode Announces Start of Submission Period for Common Locale Data Repository, Version 1.5 Thread-Index: AQHHZL3JyK5nN+XzykeyVl/Wu0/wzQ== Message-ID: <8DBD16074283AE41A3A54A5926704DC90BC80558DC@NA-EXMSG-C134.redmond.corp.microsoft.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="iso-8859-1" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l2CFlvYr007614 X-archive-position: 32 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: v-magdad@microsoft.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Unicode Announces Start of Submission Period for Common Locale Data Repository, Version 1.5 Mountain View, CA, March 12, 2007 -- The Unicode® Consortium today announced the start of data submission for the next release of the Unicode Common Locale Data Repository (CLDR), Version 1.5. 
The Unicode CLDR is the largest and most extensive standard repository of locale data in the industry today; it is completely based on Unicode and conformant to the latest version of the Unicode Standard, Unicode 5.0. The Unicode CLDR is widely used by companies such as Adobe, Apple, Google, IBM, and Sun, and organizations such as openoffice.org; it provides key building blocks for software to seamlessly support the world's languages. During the data submission period, from March 12th to April 29th, 2007, contributors from Unicode Consortium members, other organizations and the public at large are invited to review the data for their languages and locations, and propose new translations of terms or modifications, including language translations entirely new to the repository. New structure has been added to the repository to improve representation of time zones, ranges of dates and the usage of languages. The data can be viewed using the online tool found at http://unicode.org/cldr/survey_tool.html. This tool has been substantially enhanced for this release and a forum has been added to allow communication between translators. For more information about the Unicode CLDR project, see http://www.unicode.org/cldr/. For more information about the Unicode Standard and the latest Version 5.0, see http://www.unicode.org/versions/Unicode5.0.0/. About the Unicode Consortium The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. 
The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry: Adobe Systems, Apple, Basis Technology, Denic e.G., Google, Government of India - Ministry of Information Technology, Government of Pakistan - National Language Authority, HP, IBM, Justsystem, Microsoft, Monotype Imaging, Oracle, SAP, Sun Microsystems, Sybase, The University of California at Berkeley, Yahoo, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium http://www.unicode.org/. From rick@unicode.org Mon Mar 12 13:09:06 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 12 Mar 2007 13:13:46 -0600 (CST) Received: from izanami (c-67-188-204-169.hsd1.ca.comcast.net [67.188.204.169]) by unicode.org (8.13.4/8.12.11) with SMTP id l2CJ8qno017224; Mon, 12 Mar 2007 13:08:53 -0600 Message-Id: <200703121908.l2CJ8qno017224@unicode.org> To: unicode@unicode.org Subject: New Public Review Issue: Proposed Update UAX #14 Date: Mon, 12 Mar 2007 11:08:47 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 33 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on May 8, 2007. Please see the page for links to discussion and relevant documents. Briefly, the new issue is: PRI #105 Proposed Update to UAX #14: Line Breaking Properties This proposed update for UAX#14 updates the description of linebreak classes with the line break properties in the beta version of the Unicode Character Database, version 5.0.1. 
The rules were updated to support the sequence for languages such as Polish and Portuguese. The conformance clause was updated to propose additional language on permissible higher level protocols. The entire text has been reviewed, and improved in a number of places, to make it easier to normatively reference this UAX from other specifications. Owners of other specifications (higher level protocols) are particularly encouraged to review this proposed update. Note: the line breaking rules for Ethiopic are under separate investigation. If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html Regards, Rick McGowan Unicode, Inc. 
From mark.edward.davis@gmail.com Wed Mar 14 18:23:42 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 14 Mar 2007 18:23:46 -0600 (CST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.227]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2F0NfVV009143 for ; Wed, 14 Mar 2007 18:23:41 -0600 Received: by wr-out-0506.google.com with SMTP id 71so393869wri for ; Wed, 14 Mar 2007 17:23:41 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=jZycxbIErFlPfoM9VrVRH73sSHLgaMEIG4k9MIMnUVVvKyt4NFC4x2JbPOKEzIyM9t8GrGHUOBRTiYN2x/sB120Uzvg5/BuFiCGsgNGCxf0sODWHl0nzV7Uo9qLCEWa90hIOHa1zEzgqeA8EO9FfY8v8DSenzGqkOUjxv+f+x+k= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=IF+5ajB2T4klI9GQmQCI/UAWmDiMf4pj6mct9MJ8sJCAPv3gCjByONaZ8AYeTrW+4GDxzlYfqJ6bRGVQn68QerQIme4Kw85ZJjPbltX2jo2Jsne9vdnBzPAJ0w9HVUGSAft4PUl0B3l3roXsQE8XQ/s9sFJfnmuRhu4MnoiR60Y= Received: by 10.115.108.1 with SMTP id k1mr3242580wam.1173918219476; Wed, 14 Mar 2007 17:23:39 -0700 (PDT) Received: by 10.114.196.2 with HTTP; Wed, 14 Mar 2007 17:23:39 -0700 (PDT) Message-ID: <30b660a20703141723l6f060b9fi8f887c67dd76a513@mail.gmail.com> Date: Wed, 14 Mar 2007 17:23:39 -0700 From: "Mark Davis" To: "sukhjinder_sidhu@hotmail.com" Subject: Re: [Ltru] Punjabi Cc: "John Cowan" , "Sarmad Hussain, Dr." 
, "LTRU Working Group" , "Nayyara Karamat" , iso639-2@loc.gov, rick@unicode.org, iso639-3@sil.org, "Abbas Malik" , cldr-users@unicode.org In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_69371_24562262.1173918219410" References: <20070314222951.GQ1509@mercury.ccil.org> X-Google-Sender-Auth: b7c283cceb2f8bb5 X-archive-position: 34 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users ------=_Part_69371_24562262.1173918219410 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline

A few points of clarification, since this arose with regard to CLDR. The CLDR data is divided into linguistic data and non-linguistic data, and what are called "locales" are only used for linguistic data; so in that sense CLDR locales are really collections of language data, not locale data in a POSIX sense. We use inheritance in the data model, so we only need to add script subtags when necessary (when the language is customarily written in 2 scripts), and only need to add region subtags when the language+script is in common use in more than one country AND some of the linguistic data differs between them. (Although sometimes for compatibility we add empty country locales.) So what we would need at a minimum right now is:

1. pa (containing data appropriate for pa-Guru-IN)

If in the future we get data submitted for pa written in Arabic, we would add

2. pa-Arab (containing data appropriate for pa-Arab-PK)
3. pa-Guru (containing data appropriate for pa-Guru-IN)

Our policy is to have each parent locale's data be what is appropriate for the most populous (measured as literate, first or second language speakers) child locale. So what we'd need to have at that point is the population of pa users (as opposed to lah) in Pakistan vs India, which would determine whether we change the contents of pa at that point. It sounds like, from what you say, that the bulk of the population of pa users (measured as literate, first or second language speakers) would be in India, while Pakistan would be more lah users than pa users. Is that the case?
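
The inheritance model described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not CLDR's actual implementation: the function names, the "script" key and the relative weights are hypothetical, and the parent of a locale is assumed to be found by simply dropping the last subtag until "root" is reached.

```python
# Hypothetical sketch: locale IDs are taken from the message above, but the
# data values and the relative weights below are placeholders, not CLDR data.

def fallback_chain(locale_id):
    """Return locale_id followed by its parents, ending at 'root'.

    Assumes the parent is obtained by dropping the last subtag.
    """
    parts = locale_id.split("-")
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain

def lookup(data, locale_id, key):
    """Return (locale, value) from the nearest locale in the chain defining key."""
    for candidate in fallback_chain(locale_id):
        if key in data.get(candidate, {}):
            return candidate, data[candidate][key]
    raise KeyError(key)

def default_child(weights):
    """Pick the child locale with the largest weight; the parent carries its data."""
    return max(weights, key=weights.get)

data = {
    "root": {"script": "unknown"},
    "pa": {"script": "Guru"},       # content appropriate for pa-Guru-IN
    "pa-Arab": {"script": "Arab"},  # content appropriate for pa-Arab-PK
}

print(fallback_chain("pa-Guru-IN"))          # ['pa-Guru-IN', 'pa-Guru', 'pa', 'root']
print(lookup(data, "pa-Guru-IN", "script"))  # ('pa', 'Guru')
print(lookup(data, "pa-Arab-PK", "script"))  # ('pa-Arab', 'Arab')

# Made-up relative weights, standing in for real speaker counts.
print(default_child({"pa-Guru-IN": 2, "pa-Arab-PK": 1}))  # pa-Guru-IN
```

Because pa-Guru-IN and pa-Guru define nothing of their own in this sketch, a lookup for them falls through to pa, which is why only pa and pa-Arab need to carry data.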

Mark

On 3/14/07, sukhjinder_sidhu@hotmail.com <sukhjinder_sidhu@hotmail.com> wrote:
> > The current plan is to map "Eastern Panjabi" onto "Panjabi", and all
> > the others onto "Lahnda".  It sounds like you are proposing to map
> > both "Eastern Panjabi" and "Western Panjabi" onto "Panjabi"
> > and the others onto "Lahnda".  Is that correct?
>
> No, I'm saying that a significant number (i.e. millions) of people speak
> "Eastern" Punjabi in Pakistan.  The only major difference between this
> spoken language and that spoken in Indian Punjab is the written script (i.e.
> Shahmukhi/Gurmukhi).  The sources that say that Western Punjabi and Eastern
> Punjabi suddenly stop at the border are simply wrong.  The Maajhi dialect is
> centred on the border - leading to significant portions of the population
> speaking it on either side.  Hence the need for both 'pa-IN' and 'pa-PK'.
>
> "Grierson [in the Linguistic Survey of India] defined Western Punjabi as
> being west of a line running north-south from Montgomery and Gujranwala
> districts." (This is directly from Wikipedia, but I was the original
> contributor for this in the article. If you wish, I can probably track down
> the exact page number for this, but I don't have it at hand at the moment.)
> This is well within present day Pakistan, and Lahore alone has nearly seven
> million people.
>
> Anyway, that is my take on things.  I would be interested to see what NU
> professors/researchers think on this because I've never been to Pakistan so
> I could be way off base here.
>
> Regards,
> Sukhjinder Sidhu




--
Mark ------=_Part_69371_24562262.1173918219410-- From rick@unicode.org Tue Mar 20 09:19:40 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 20 Mar 2007 09:25:06 -0600 (CST) Received: from izanami (c-67-188-204-169.hsd1.ca.comcast.net [67.188.204.169]) by unicode.org (8.13.4/8.12.11) with SMTP id l2KFJSJn032330; Tue, 20 Mar 2007 09:19:28 -0600 Message-Id: <200703201519.l2KFJSJn032330@unicode.org> To: unicode@unicode.org Subject: Call for Participation: 31st Internationalization & Unicode Conference Date: Tue, 20 Mar 2007 07:19:09 -0800 From: Rick McGowan mime-version: 1.0 (Apple Message framework v95.2) content-type: text/plain; charset=utf-8 received: by Apple.Mailer (2.95.2) X-archive-position: 35 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users FOR IMMEDIATE RELEASE Contact: Stephanie Covert Object Management Group +1-843-737 0637 info@unicodeconference.org Call for Participation: 31st Internationalization & Unicode Conference San Jose, Calif., USA; October 15-17, 2007 Mountain View, CA, USA – March 20, 2007 – The Unicode® Consortium today announced a call for participation in The Thirty-first Internationalization & Unicode® Conference (IUC), taking place in San Jose, Calif., USA; October 15-17, 2007, sponsored by Gold Sponsor Adobe Systems (www.adobe.com). The call for participation runs until Monday, April 23. The annual conference is produced by The Object Management Group™ (OMG™). Details about the conference and the call for participation are available at http://www.unicodeconference.org/iuc31call. The Internationalization & Unicode Conference is the premier annual technical conference focusing on multilingual, global software and Web internationalization. 
Each IUC conference features a variety of tutorials and conference sessions that cover current topics related to Web and software internationalization, globalization, and Unicode. IUC 31 will include sessions with a special focus on compliance, conformance testing and related topics. Organizations, including libraries and universities are specifically invited to submit case studies on their globalization efforts. The conference Program Committee is also seeking technical and business-focused presentations on case studies, experience reports, evaluations or research papers on topics relevant to (but not limited to):

* New and upcoming technologies
* Implementation of Unicode
* Unicode conformance and international standards compliance issues
* Common Locale Data Repository (CLDR)
* Internationalization or enabling of applications or Web sites
* Working with multilingual text and data
* Global development best practices
* Security and phishing
* Business cases and technical issues for globalized software
* Publishing and broadcasting for a global audience
* Encoding and Internationalization challenges for governments
* Unicode in the library and in university curricula

Tutorial Sessions are an important part of the conference. The Program Committee is seeking proposals on topics of interest to general software users, to project and program managers, and to technical attendees who need to build basic knowledge of Unicode and software internationalization. Tutorial topics can also include (but aren't limited to):

* Program and project management of internationalization
* Best practices in localization process and technology
* Users: making the most of international features in common applications
* Unicode and internationalization in programming languages

Tutorial presenters receive complimentary registration, an honorarium and two nights lodging. Session presenters receive a fifty percent conference discount and two nights lodging.
See the web site for full details and restrictions. Proposals for panel sessions in any of the above areas are also welcomed. Those interested in holding Birds-of-a-Feather meetings should contact Kevin Loughry with their suggestions at loughry@omg.org, +1-781-444 0404.

Interested individuals or organizations are invited to submit a brief (up to 600 word) abstract of their proposed conference presentation by Monday, April 23 using this web form: http://www.unicodeconference.org/abstracts. The Program Committee will select presentations for inclusion in the program and notify authors by Monday, May 7. Final presentation materials will be required from all selected presenters by Monday, October 1. The conference agenda will be available by Monday, May 14 and posted at http://www.unicodeconference.org.

Sponsorships and exhibit space are available; for more information on sponsoring or exhibiting contact Jon Roussel at jroussel@omg.org, +1-781-444-0404 ext. 106. For all other questions email info@unicodeconference.org.

Internationalization and Unicode experts, implementers, clients, teachers and vendors are invited to attend this unique conference. The interactive format makes the Internationalization & Unicode Conference a great place to meet and exchange ideas with leading experts, find out about the needs of potential clients, or get information about new and existing Unicode-enabled products.

###

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Basis Technology, Denic e.G., Google, Government of India - Ministry of Information Technology, Government of Pakistan - National Language Authority, HP, IBM, Justsystems, Microsoft, Monotype Imaging, Oracle, SAP, Sun Microsystems, Sybase, The University of California at Berkeley, Yahoo, and over 100 Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.

About the Event Producer

The Object Management Group™ (OMG™) is the new Event Producer for the Internationalization & Unicode Conferences. The OMG is an open membership, not-for-profit consortium that produces and maintains computer industry specifications for interoperable enterprise applications. Our specifications include MDA®, UML®, CORBA®, MOF™, XMI® and CWM™. OMG's specifications are all available for download by everyone without charge. For more information about OMG, visit us online at http://www.omg.org.

Note to editors: Unicode Standard, Unicode and the Unicode Logo are trademarks of Unicode, Inc. Unicode Consortium is a registered trademark of Unicode, Inc. OMG and Object Management Group are trademarks of Object Management Group. All other trademarks are the property of their respective owners.
From moyogo@gmail.com Wed Mar 21 07:31:05 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 21 Mar 2007 08:24:13 -0600 (CST) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.185]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2LDV4bv026380 for ; Wed, 21 Mar 2007 07:31:04 -0600 Received: by nf-out-0910.google.com with SMTP id x37so925020nfc for ; Wed, 21 Mar 2007 06:30:58 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=fgn5v63hQexmn+7KjNSd/Hg3bF6ypPePmHgJPCwqCOIMraV8h6KAVW4knTwuAiav1Tq9xaoozAUK8sLQe1hTWq9w3u1rKoLIgxkjP8UYuv2OxEAQyIvQaxo8lhsz3uXWwDpMPtciCchoRYnxqSj91UpG69RCeTZEyCA6pIlnuhw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=RFv+ohtgCUKp8xKknioEb2lF1BYzD46w0Y8I7eHilN8Oz9KRQo6vm4IEOUR0aMEFFwwo7v/6tqubKGVpOlkyAsLonqYwkuAnvEYdLsqrI9gmtvQzps0BjjU6FMlnRcx9s1teGoCURJSfu8XlXxWxNNJLTjQZiYXzckRFd4cA1kA= Received: by 10.82.148.7 with SMTP id v7mr1666423bud.1174483857817; Wed, 21 Mar 2007 06:30:57 -0700 (PDT) Received: by 10.82.125.3 with HTTP; Wed, 21 Mar 2007 06:30:57 -0700 (PDT) Message-ID: <8ebc61110703210630m1db5bca5x45fc9a4c8e99fa8@mail.gmail.com> Date: Wed, 21 Mar 2007 14:30:57 +0100 From: "Denis Jacquerye" To: cldr-users@unicode.org Subject: collation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 36 X-Approved-By: root@unicode.org X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: moyogo@gmail.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Is is possible at this point to define collation for a locale? 
Thank you -- Denis Moyogo Jacquerye From mark.edward.davis@gmail.com Wed Mar 21 09:07:02 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 21 Mar 2007 09:07:02 -0600 (CST) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.170]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2LF6ucN013909 for ; Wed, 21 Mar 2007 09:07:01 -0600 Received: by ug-out-1314.google.com with SMTP id o4so375278uge for ; Wed, 21 Mar 2007 08:06:52 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=jPqhsdEb6OVA88hTUEAb5yw4JbHBYItaMPdoIKwlsedjOmQBKg7dxyH4Odfd4HvW562EjJAjvJjPySjRfBq2CNj7QQy1PtS8I4epJncXS5QwFP/TmxXwDf88YYxoySY4h/mq9RfqvGGKx3IhMXTQssyAvmMleabGqpJvRVM0LGI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=VLSil9MNyC6RP53bc2Okx3+HqvLVwzCNu2dWYTS59VudyN0QT73ZQOoM43AQNrXqLnd2WzpzkIKK+f0AmWKZRyjOtaDo9oLa1y4QOc7sQZJOeNo0iTsdMeks7MuFQXJXOdfumAHYwjrxTAMww/hiXz5gqlBBsSftlMB2PHnk0eE= Received: by 10.115.78.1 with SMTP id f1mr189816wal.1174489610233; Wed, 21 Mar 2007 08:06:50 -0700 (PDT) Received: by 10.114.196.2 with HTTP; Wed, 21 Mar 2007 08:06:50 -0700 (PDT) Message-ID: <30b660a20703210806h2f91f6e6w412b15080d91502f@mail.gmail.com> Date: Wed, 21 Mar 2007 08:06:50 -0700 From: "Mark Davis" To: cldr-users@unicode.org Subject: Re: collation In-Reply-To: <8ebc61110703210630m1db5bca5x45fc9a4c8e99fa8@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_208525_18147444.1174489610077" References: <8ebc61110703210630m1db5bca5x45fc9a4c8e99fa8@mail.gmail.com> X-Google-Sender-Auth: cf85effbd64343d9 X-archive-position: 37 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: 
cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users ------=_Part_208525_18147444.1174489610077 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: base64 Content-Disposition: inline WWVzLCBpdCBpcy4gVGhlcmUgYXJlIGFscmVhZHkgcXVpdGUgYSBudW1iZXIgb2YgY29sbGF0aW9u cyBmb3IgbG9jYWxlcywKd2hpY2ggeW91IGNhbiBzZWUgaW4gaHR0cDovL3d3dy51bmljb2RlLm9y Zy9jbGRyL2RhdGEvY29tbW9uL2NvbGxhdGlvbi8sCndpdGggdGVzdCBkYXRhIGluY2x1ZGVkIGlu IGh0dHA6Ly93d3cudW5pY29kZS5vcmcvY2xkci9kYXRhL2NvbW1vbi90ZXN0Ly4KVGhlc2UgYXJl IGluIHRoZSBzb3VyY2UgWE1MLCBkZXNjcmliZWQgaW4KaHR0cDovL3d3dy51bmljb2RlLm9yZy9y ZXBvcnRzL3RyMzUvI0NvbGxhdGlvbl9FbGVtZW50cy4gV2UgZG9uJ3QgeWV0IHNob3cKdGhlIGN1 cnJlbnQgY29sbGF0aW9ucyBpbiB0aGUgc3VydmV5IHRvb2wsIGJ1dCB3ZSBhcmUgd29ya2luZyBv biBzaG93aW5nCmVhY2ggb25lIGF0IGxlYXN0IGFzIGEgY2hhcnQuCgpZb3UgY2FuIHN1Ym1pdCBj b2xsYXRpb24gcnVsZXMgZm9yIGEgbG9jYWxlIHdpdGggdGhlIENMRFIgYnVnIGZvcm0gaW4gZWl0 aGVyCm9mIHR3byBmb3JtYXRzOyB0aGUgSUNVIGZvcm1hdCBvciB0aGUgTERNTCBmb3JtYXQgKHRo ZXJlIGlzIGEKc3RyYWlnaHRmb3J3YXJkIG1hcHBpbmcgYmV0d2VlbiB0aGVtKS4gQW4gZXhhbXBs ZSBvZiBhIGJ1ZyBmaWxlZCByZWNlbnRseSwKZm9yIGV4YW1wbGUsIGlzCmh0dHA6Ly93d3cudW5p Y29kZS5vcmcvY2xkci9idWdzL2xvY2FsZS1idWdzL2RhdGE/ZmluZGlkPTEzMDguIFRoZXJlIGFy ZQpzb21lIHBvaW50ZXJzIG9uIGh0dHA6Ly93d3cudW5pY29kZS5vcmcvY2xkci9maWxpbmdfYnVn X3JlcG9ydHMuaHRtbC4KCldpdGggdGhlIElDVSBydWxlIGZvcm1hdCB5b3UgY2FuIHRlc3QgeW91 ciBjb2xsYXRpb24gdXNpbmcKaHR0cDovL2RlbW8uaWN1LXByb2plY3Qub3JnL2ljdS1iaW4vbG9j ZXhwP189cm9vdCZ4PWNvbC4gRm9yIGV4YW1wbGUsIG9uCnRoYXQgcGFnZSB5b3UgY2FuIHB1dCBp bgomYyA8IGIgPDw8IEIKaW4gdGhlIEN1c3RvbSBSdWxlcyBib3gsIGhpdCBzb3J0LCBhbmQgdGhl IENvbGxhdGVkIGNvbHVtbiB3aWxsIHNob3cgdGhlCnJlc3VsdCAoc29ydGluZyBiIGFuZCBCIGFm dGVyIGMpLgoKU2ltaWxhcmx5LCB0cmFuc2xpdGVyYXRpb25zICh3aGljaCBkbyBoYXZlIGNoYXJ0 cywgb24KaHR0cDovL3d3dy51bmljb2RlLm9yZy9jbGRyL2RhdGEvY2hhcnRzL3RyYW5zZm9ybXMv aW5kZXguaHRtbCksIGNhbiBiZQp0ZXN0ZWQgb24gaHR0cDovL2RlbW8uaWN1LXByb2plY3Qub3Jn 
L2ljdS1iaW4vdHJhbnNsaXQuIEZvciBleGFtcGxlLCBoaXQKRWRpdCBSdWxlcywgYW5kIGluIHRo ZSBuZXcgd2luZG93IHRoYXQgcG9wcyB1cCwgZW50ZXIKdGgg4oaUIM64OwpUSCDihpQgzpg7ClRo IOKGkiDOmDsKYW5kIHNhdmUgYXMgTGF0aW4tR3JlZWsvZGVtby4gQmFjayBpbiB0aGUgbWFpbiB0 cmFuc2Zvcm0gd2luZG93LCBwdXQgVEggaW4KSW5wdXQsIGFuZCBMYXRpbi1HcmVlay9kZW1vIGlu IENvbXBvdW5kIDEgYW5kIGhpdCBUcmFuc2Zvcm0uCgpNYXJrCgoKTWFyawoKT24gMy8yMS8wNywg RGVuaXMgSmFjcXVlcnllIDxtb3lvZ29AZ21haWwuY29tPiB3cm90ZToKPgo+IElzIGlzIHBvc3Np YmxlIGF0IHRoaXMgcG9pbnQgdG8gZGVmaW5lIGNvbGxhdGlvbiBmb3IgYSBsb2NhbGU/Cj4KPiBU aGFuayB5b3UKPiAtLQo+IERlbmlzIE1veW9nbyBKYWNxdWVyeWUKPgo+CgoKLS0gCk1hcmsK ------=_Part_208525_18147444.1174489610077 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: base64 Content-Disposition: inline WWVzLCBpdCBpcy4gVGhlcmUgYXJlIGFscmVhZHkgcXVpdGUgYSBudW1iZXIgb2YgY29sbGF0aW9u cyBmb3IgbG9jYWxlcywgd2hpY2ggeW91IGNhbiBzZWUgaW4gPGEgaHJlZj0iaHR0cDovL3d3dy51 bmljb2RlLm9yZy9jbGRyL2RhdGEvY29tbW9uL2NvbGxhdGlvbi8iPmh0dHA6Ly93d3cudW5pY29k ZS5vcmcvY2xkci9kYXRhL2NvbW1vbi9jb2xsYXRpb24vPC9hPiwgd2l0aCB0ZXN0IGRhdGEgaW5j bHVkZWQgaW4gCjxhIGhyZWY9Imh0dHA6Ly93d3cudW5pY29kZS5vcmcvY2xkci9kYXRhL2NvbW1v bi90ZXN0LyI+aHR0cDovL3d3dy51bmljb2RlLm9yZy9jbGRyL2RhdGEvY29tbW9uL3Rlc3QvPC9h Pi4gVGhlc2UgYXJlIGluIHRoZSBzb3VyY2UgWE1MLCBkZXNjcmliZWQgaW4gPGEgaHJlZj0iaHR0 cDovL3d3dy51bmljb2RlLm9yZy9yZXBvcnRzL3RyMzUvI0NvbGxhdGlvbl9FbGVtZW50cyI+aHR0 cDovL3d3dy51bmljb2RlLm9yZy9yZXBvcnRzL3RyMzUvI0NvbGxhdGlvbl9FbGVtZW50cwo8L2E+ LiBXZSBkb24mIzM5O3QgeWV0IHNob3cgdGhlIGN1cnJlbnQgY29sbGF0aW9ucyBpbiB0aGUgc3Vy dmV5IHRvb2wsIGJ1dCB3ZSBhcmUgd29ya2luZyBvbiBzaG93aW5nIGVhY2ggb25lIGF0IGxlYXN0 IGFzIGEgY2hhcnQuPGJyPjxicj5Zb3UgY2FuIHN1Ym1pdCBjb2xsYXRpb24gcnVsZXMgZm9yIGEg bG9jYWxlIHdpdGggdGhlIENMRFIgYnVnIGZvcm0gaW4gZWl0aGVyIG9mIHR3byBmb3JtYXRzOyB0 aGUgSUNVIGZvcm1hdCBvciB0aGUgTERNTCBmb3JtYXQgKHRoZXJlIGlzIGEgc3RyYWlnaHRmb3J3 YXJkIG1hcHBpbmcgYmV0d2VlbiB0aGVtKS4gQW4gZXhhbXBsZSBvZiBhIGJ1ZyBmaWxlZCByZWNl bnRseSwgZm9yIGV4YW1wbGUsIGlzIAo8YSBocmVmPSJodHRwOi8vd3d3LnVuaWNvZGUub3JnL2Ns 
ZHIvYnVncy9sb2NhbGUtYnVncy9kYXRhP2ZpbmRpZD0xMzA4Ij5odHRwOi8vd3d3LnVuaWNvZGUu b3JnL2NsZHIvYnVncy9sb2NhbGUtYnVncy9kYXRhP2ZpbmRpZD0xMzA4PC9hPi4gVGhlcmUgYXJl IHNvbWUgcG9pbnRlcnMgb24gPGEgaHJlZj0iaHR0cDovL3d3dy51bmljb2RlLm9yZy9jbGRyL2Zp bGluZ19idWdfcmVwb3J0cy5odG1sIj4KaHR0cDovL3d3dy51bmljb2RlLm9yZy9jbGRyL2ZpbGlu Z19idWdfcmVwb3J0cy5odG1sPC9hPi48YnI+PGJyPldpdGggdGhlIElDVSBydWxlIGZvcm1hdCB5 b3UgY2FuIHRlc3QgeW91ciBjb2xsYXRpb24gdXNpbmcgPGEgaHJlZj0iaHR0cDovL2RlbW8uaWN1 LXByb2plY3Qub3JnL2ljdS1iaW4vbG9jZXhwP189cm9vdCZhbXA7eD1jb2wiPmh0dHA6Ly9kZW1v LmljdS1wcm9qZWN0Lm9yZy9pY3UtYmluL2xvY2V4cD9fPXJvb3QmYW1wO3g9Y29sCjwvYT4uIEZv ciBleGFtcGxlLCBvbiB0aGF0IHBhZ2UgeW91IGNhbiBwdXQgaW4gPGJyPjxkaXYgc3R5bGU9Im1h cmdpbi1sZWZ0OiA0MHB4OyI+JmFtcDtjICZsdDsgYiAmbHQ7Jmx0OyZsdDsgQjxicj48L2Rpdj5p biB0aGUgQ3VzdG9tIFJ1bGVzIGJveCwgaGl0IHNvcnQsIGFuZCB0aGUgQ29sbGF0ZWQgY29sdW1u IHdpbGwgc2hvdyB0aGUgcmVzdWx0IChzb3J0aW5nIGIgYW5kIEIgYWZ0ZXIgYykuCjxicj48YnI+ U2ltaWxhcmx5LCB0cmFuc2xpdGVyYXRpb25zICh3aGljaCBkbyBoYXZlIGNoYXJ0cywgb24gPGEg aHJlZj0iaHR0cDovL3d3dy51bmljb2RlLm9yZy9jbGRyL2RhdGEvY2hhcnRzL3RyYW5zZm9ybXMv aW5kZXguaHRtbCI+aHR0cDovL3d3dy51bmljb2RlLm9yZy9jbGRyL2RhdGEvY2hhcnRzL3RyYW5z Zm9ybXMvaW5kZXguaHRtbDwvYT4pLCBjYW4gYmUgdGVzdGVkIG9uIDxhIGhyZWY9Imh0dHA6Ly9k ZW1vLmljdS1wcm9qZWN0Lm9yZy9pY3UtYmluL3RyYW5zbGl0Ij4KaHR0cDovL2RlbW8uaWN1LXBy b2plY3Qub3JnL2ljdS1iaW4vdHJhbnNsaXQ8L2E+LiBGb3IgZXhhbXBsZSwgaGl0IEVkaXQgUnVs ZXMsIGFuZCBpbiB0aGUgbmV3IHdpbmRvdyB0aGF0IHBvcHMgdXAsIGVudGVyPGJyPjxkaXYgc3R5 bGU9Im1hcmdpbi1sZWZ0OiA0MHB4OyI+dGgg4oaUIM64Ozxicj5USCDihpQgzpg7PGJyPlRoIOKG kiDOmDs8YnI+PC9kaXY+YW5kIHNhdmUgYXMgTGF0aW4tR3JlZWsvZGVtby4gQmFjayBpbiB0aGUg bWFpbiB0cmFuc2Zvcm0gd2luZG93LCBwdXQgVEggaW4gSW5wdXQsIGFuZCBMYXRpbi1HcmVlay9k ZW1vIGluIENvbXBvdW5kIDEgYW5kIGhpdCBUcmFuc2Zvcm0uCjxicj48YnI+TWFyazxicj48YnI+ PGJyPk1hcms8YnI+PGJyPjxkaXY+PHNwYW4gY2xhc3M9ImdtYWlsX3F1b3RlIj5PbiAzLzIxLzA3 LCA8YiBjbGFzcz0iZ21haWxfc2VuZGVybmFtZSI+RGVuaXMgSmFjcXVlcnllPC9iPiAmbHQ7PGEg 
aHJlZj0ibWFpbHRvOm1veW9nb0BnbWFpbC5jb20iPm1veW9nb0BnbWFpbC5jb208L2E+Jmd0OyB3 cm90ZTo8L3NwYW4+PGJsb2NrcXVvdGUgY2xhc3M9ImdtYWlsX3F1b3RlIiBzdHlsZT0iYm9yZGVy LWxlZnQ6IDFweCBzb2xpZCByZ2IoMjA0LCAyMDQsIDIwNCk7IG1hcmdpbjogMHB0IDBwdCAwcHQg MC44ZXg7IHBhZGRpbmctbGVmdDogMWV4OyI+CklzIGlzIHBvc3NpYmxlIGF0IHRoaXMgcG9pbnQg dG8gZGVmaW5lIGNvbGxhdGlvbiBmb3IgYSBsb2NhbGU/PGJyPjxicj5UaGFuayB5b3U8YnI+LS08 YnI+RGVuaXMgTW95b2dvIEphY3F1ZXJ5ZTxicj48YnI+PC9ibG9ja3F1b3RlPjwvZGl2Pjxicj48 YnIgY2xlYXI9ImFsbCI+PGJyPi0tIDxicj5NYXJrCg== ------=_Part_208525_18147444.1174489610077-- From mark.edward.davis@gmail.com Wed Mar 21 09:24:28 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 21 Mar 2007 09:24:28 -0600 (CST) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.171]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2LFORRl020803 for ; Wed, 21 Mar 2007 09:24:27 -0600 Received: by ug-out-1314.google.com with SMTP id o4so382569uge for ; Wed, 21 Mar 2007 08:24:21 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:x-google-sender-auth; b=FFBTU20Wi8Xy02UkCLTD+K7ATyri8WjpVy5JKXDT1uk8QQd78niog2bfenhkrPyasCE1S/Da2LPX8XiuTpkmpmEqp1RkoC3fIdmuj9jMZYhe879TeE8IGEVm/DXtF2dE4+RtPKd64W3mt18Mx1f7Hx5yxMCTJhkAfFZnr2avXbU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:x-google-sender-auth; b=Y3IE7MomuK5+rTxtSAAN7p4l1X4qgdNrfaacNo719mDf4g/x47XyRPE8l76od0GxHDsWzwveQAyBlCbFlMQl3VweqV9L1hCnTRHvgaq7TKQB2SQdAsjgZxVQ3dR4dYvCGholhJqUm9s/q68vM4IXaPkjYC92oyWSQhEgwqCAqSY= Received: by 10.115.17.1 with SMTP id u1mr196116wai.1174490659470; Wed, 21 Mar 2007 08:24:19 -0700 (PDT) Received: by 10.114.196.2 with HTTP; Wed, 21 Mar 2007 08:24:19 -0700 (PDT) Message-ID: <30b660a20703210824m7e1050b3ndebee711785f9f60@mail.gmail.com> Date: Wed, 21 Mar 2007 08:24:19 -0700 From: "Mark Davis" 
To: cldr-users@unicode.org Subject: Test Locale MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_209073_17883994.1174490659172" X-Google-Sender-Auth: a804710d4f3ef85e X-archive-position: 38 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users ------=_Part_209073_17883994.1174490659172 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline We've added a test locale, http://unicode.org/cldr/apps/survey?_=und (listed under "Unknown" in http://unicode.org/cldr/apps/survey), which can be used by anyone with an account. The data there will be scrapped, so you can try out adding data without cluttering up any real locale data.

--
Mark ------=_Part_209073_17883994.1174490659172-- From moyogo@gmail.com Thu Mar 22 16:51:14 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 22 Mar 2007 16:51:14 -0600 (CST) Received: from mu-out-0910.google.com (mu-out-0910.google.com [209.85.134.191]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2MMp9I6020859 for ; Thu, 22 Mar 2007 16:51:13 -0600 Received: by mu-out-0910.google.com with SMTP id w9so1267862mue for ; Thu, 22 Mar 2007 15:51:05 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=cRqoyk5a8L4ED6K4RxMFD149CBXRyX2KU6mGFEuDrjSQDucS5gYPYDxxjgkg384GZm9TOd092XMHB+B8/ewwdvH/h/cSF8+9cs+dDa+0EypWLdHn7EYxWcuZM/ktZS0EGfMr+TOac2x0pXuxM4B6/iP5fFn0zaihqxe00zrLwxw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Yu7xw0Y2lGVq/pBHCFgVt8QOy3nWUr9VtsIIYuIl+12rk1ALfoB7ABBBAXarIjwHXHRyoc3it4iH9O7x0KGnsVg6nv4UBq2mQ638uQRSOlLdGvEYWO03MeH57J1jgRK1cSRtc2NdYeNjRPTr0jKrHbB0rxsUvofyCVSGmHL/bV4= Received: by 10.82.136.4 with SMTP id j4mr5116428bud.1174603864940; Thu, 22 Mar 2007 15:51:04 -0700 (PDT) Received: by 10.82.125.3 with HTTP; Thu, 22 Mar 2007 15:51:04 -0700 (PDT) Message-ID: <8ebc61110703221551p55e79f71ye7fc9bf9626db9d1@mail.gmail.com> Date: Thu, 22 Mar 2007 23:51:04 +0100 From: "Denis Jacquerye" To: cldr-users@unicode.org Subject: Re: collation In-Reply-To: <30b660a20703210806h2f91f6e6w412b15080d91502f@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <8ebc61110703210630m1db5bca5x45fc9a4c8e99fa8@mail.gmail.com> <30b660a20703210806h2f91f6e6w412b15080d91502f@mail.gmail.com> X-archive-position: 39 
X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: moyogo@gmail.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users On 3/21/07, Mark Davis wrote: > Yes, it is. There are already quite a number of collations for locales, > which you can see in > http://www.unicode.org/cldr/data/common/collation/, with > test data included in > http://www.unicode.org/cldr/data/common/test/. These are in > the source XML, described in > http://www.unicode.org/reports/tr35/#Collation_Elements . > We don't yet show the current collations in the survey tool, but we are > working on showing each one at least as a chart. > > You can submit collation rules for a locale with the CLDR bug form in either > of two formats; the ICU format or the LDML format (there is a > straightforward mapping between them). An example of a bug filed recently, > for example, is > http://www.unicode.org/cldr/bugs/locale-bugs/data?findid=1308. > There are some pointers on > http://www.unicode.org/cldr/filing_bug_reports.html. Thank you, Mark. I'll get to it. I would also like to know how collation variants can be defined, that is, how they can be approved. The Bantu language I work with, Lingala, can have its digraphs and trigraphs counted as letters of their own or not: some dictionaries sort "no" before "ngo", since 'ng' is a digraph that follows 'n', while others sort by single letters only. Thank you in advance.
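The two orderings described above can be illustrated with a short sketch. It is plain Python, not ICU or LDML collation, and everything in it is an assumption made for illustration: the names `ALPHABET`, `digraph_key` and `letter_key` are hypothetical, and the alphabet is a cut-down list rather than Lingala's real one.

```python
# Hypothetical, simplified alphabet: 'ng' is treated as a letter that sorts
# after 'n'. A real tailoring would list every digraph and trigraph.
ALPHABET = ["a", "b", "e", "g", "k", "l", "m", "n", "ng", "o", "s", "t"]
RANK = {unit: i for i, unit in enumerate(ALPHABET)}
# Try longer units first so that "ng" wins over "n" when both match.
UNITS_BY_LENGTH = sorted(ALPHABET, key=len, reverse=True)


def digraph_key(word):
    """Sort key that counts each digraph as a single letter."""
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        for unit in UNITS_BY_LENGTH:
            if word.startswith(unit, i):
                key.append(RANK[unit])
                i += len(unit)
                break
        else:
            raise ValueError(f"character not in sketch alphabet: {word[i]!r}")
    return key


def letter_key(word):
    """Sort key that only looks at single letters."""
    return [RANK[ch] for ch in word.lower()]


words = ["ngo", "no", "na"]
print(sorted(words, key=digraph_key))  # ['na', 'no', 'ngo']
print(sorted(words, key=letter_key))   # ['na', 'ngo', 'no']
```

In ICU rule syntax the first ordering would presumably be expressed with a contraction along the lines of `&n < ng`; that rule is given here as an untested assumption, and the test page mentioned in the quoted message is the place to check it.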
Denis Moyogo Jacquerye From mark.edward.davis@gmail.com Mon Mar 26 15:21:53 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 26 Mar 2007 15:21:54 -0600 (CST) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.169]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2QLLpIk018743 for ; Mon, 26 Mar 2007 15:21:52 -0600 Received: by ug-out-1314.google.com with SMTP id o4so1858180uge for ; Mon, 26 Mar 2007 14:21:46 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=iKI6E1LhVJV6LC9N98V1tp3he9a7Rdpbb1XTcORw1nb3KOpiJ8vp65xIybw3Z4o74+rnxG2xLx4jQF4l03GH+aD0kYeamBpVPbmV/dY3czPbIYugPoaUrDPSBvITlb+M2/V86+hQ9AAZaPK3ONIeQtju42shEnIFekcKy1aqqns= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=t4oBtmJFkahpoESxMtYdsUsZsbQ2qLzRJZbl4ctmB0AkHybJq9j9/M9UBBEp4woDf0okDMixnVCLNxM3JP4G/29Is9q8x8jqXnsaOCqb/3HWPFeXpS7OtXDcr+YiQutf8fFQ97Sdir2JDSNby67Rl/hggvJ05cSeKkusgopUb9Q= Received: by 10.115.76.1 with SMTP id d1mr2850437wal.1174944104798; Mon, 26 Mar 2007 14:21:44 -0700 (PDT) Received: by 10.114.196.2 with HTTP; Mon, 26 Mar 2007 14:21:44 -0700 (PDT) Message-ID: <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> Date: Mon, 26 Mar 2007 14:21:44 -0700 From: "Mark Davis" To: "Deborah Goldsmith" Subject: Re: Concerns about relative dates Cc: "CLDR list" , cldr-users@unicode.org In-Reply-To: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_312113_32582906.1174944104667" References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> X-Google-Sender-Auth: a7990ffe02ce6a85 X-archive-position: 40 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org 
Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users ------=_Part_312113_32582906.1174944104667 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline We put a note in the zoomed view about this, such as on http://unicode.org/cldr/apps/survey?_=af&forum=af&xpath=85028 If there are other ways you (or others) can think of to notify users, we can discuss them. Also CC'ing the public list in case people there have suggestions. Mark On 3/26/07, Deborah Goldsmith wrote: > > Hi, > > I see that English has "the day before yesterday" and "the day after > tomorrow". Those would not normally be used in a real application, > and I'm worried we'll get similarly unnatural translations in some > languages. The goal for this data is to get translations that are > single terms, not phrases. > > How are we going to deal with the fact that different languages have > different ranges for the "natural" terms in this area? For example, > the natural range for English is -1, 0, +1, but for Japanese it's -2, > -1, 0, +1, +2. I don't think any language has -3 or +3. > > Once all the data is in, how are we going to make sure that phrases > like "the day before yesterday" are not included in the final release? > > Deborah > > >

--
Mark ------=_Part_312113_32582906.1174944104667-- From Philip.Pizzey@sita.aero Mon Mar 26 21:01:46 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Mon, 26 Mar 2007 21:01:46 -0600 (CST) Received: from mx5.sita.aero (mx5.sita.aero [57.250.243.10]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2R31j5f025994 for ; Mon, 26 Mar 2007 21:01:46 -0600 Received: from londms02.corp.sita.aero (londms02.corp.sita.aero [57.5.24.28]) by mx5.sita.aero with ESMTP id l2R31d12014498 for ; Tue, 27 Mar 2007 03:01:40 GMT Subject: Philip Pizzey/London/SITA/WW is out of the office. From: Philip.Pizzey@sita.aero To: cldr-users@unicode.org Message-ID: Date: Tue, 27 Mar 2007 04:02:57 +0100 X-MIMETrack: Serialize by Router on LONDMS02/MAIL/SITA/WW(Release 7.0.2 HF121|November 20, 2006) at 27/03/2007 04:03:09 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-Disposition: inline X-Scanned-By: MIMEDefang 2.57 on 57.250.243.10 X-archive-position: 41 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: Philip.Pizzey@sita.aero Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users I will be out of the office starting 26/03/2007 and will not return until 02/04/2007. I will respond to your message when I return. This document is strictly confidential and intended only for use by the addressee unless otherwise stated. If you are not the intended recipient, please notify the sender immediately and delete it from your system. 
From duerst@it.aoyama.ac.jp Tue Mar 27 04:06:40 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 04:06:40 -0600 (CST) Received: from scmailgw1.scop.aoyama.ac.jp (scmailgw1.scop.aoyama.ac.jp [133.2.251.194]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2RA6cF7003117 for ; Tue, 27 Mar 2007 04:06:39 -0600 Received: from scmse2.scbb.aoyama.ac.jp (scmse2 [133.2.253.17]) by scmailgw1.scop.aoyama.ac.jp (secret/secret) with SMTP id l2RA6WoM003637 for ; Tue, 27 Mar 2007 19:06:32 +0900 (JST) Received: from (133.2.206.133) by scmse2.scbb.aoyama.ac.jp via smtp id 1b9e_d6bbc6fc_dc4a_11db_808f_0014221f2a2d; Tue, 27 Mar 2007 19:06:32 +0900 X-AuthUser: duerst@it.aoyama.ac.jp Received: from Tanzawa.it.aoyama.ac.jp ([133.2.210.1]:56792) by itmail.it.aoyama.ac.jp with [XMail 1.22 ESMTP Server] id for from ; Tue, 27 Mar 2007 19:05:27 +0900 Message-Id: <6.0.0.20.2.20070327184827.077222d0@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Version 6J Date: Tue, 27 Mar 2007 19:05:29 +0900 To: cldr-users@unicode.org, "Deborah Goldsmith" From: Martin Duerst Subject: Re: Concerns about relative dates Cc: "CLDR list" , cldr-users@unicode.org In-Reply-To: <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-2022-JP" Content-Transfer-Encoding: 7bit X-archive-position: 42 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: duerst@it.aoyama.ac.jp Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users >On 3/26/07, Deborah Goldsmith < goldsmit@apple.com> wrote: >>I see that English has "the day before yesterday" and "the day after >>tomorrow". 
Those would not normally be used in a real application, >>and I'm worried we'll get similarly unnatural translations in some >>languages. The goal for this data is to get translations that are >>single terms, not phrases. >> >>How are we going to deal with the fact that different languages have >>different ranges for the "natural" terms in this area? For example, >>the natural range for English is -1, 0, +1, but for Japanese it's -2, >>-1, 0, +1, +2. I don't think any language has -3 or +3. Japanese definitely has +3: しあさって (shiasatte) (+2: あさって/asatte; +1: 明日/あした/asita, also あす/asu in more colloquial speech). I can't recall a term for -3 at the moment, and neither can two of my Japanese students whom I just asked. In German, things are somewhat fluid. +1 is 'Morgen', +2 is 'Übermorgen', +3 is 'Überübermorgen', +4 would be 'Überüberübermorgen', and so on. Definitely all just one word, but that was to be expected in the case of German :-). Same for the other direction, which works even better, because 'vor' is easier to pronounce than 'über': -3 'Vorvorgestern' (definitely in use), -4 'Vorvorvorgestern' (about the limit, definitely not used all too frequently). Where to draw the boundary between natural and unnatural is very much a judgment call; it depends on the expected counting abilities of the people communicating. Even in English, "the day before yesterday", while not a single word, is a standing expression. For -3, I guess hardly anybody would say "the day before the day before yesterday"; they would just say "three days ago". I'm not totally sure what the goal of the collection is for these items, but if somebody wanted to make sure that some text said "the day before yesterday" instead of "two days ago", then we would need to collect these expressions even if they are not single words. Regards, Martin. #-#-# Martin J. Dürst, Assoc. 
Professor, Aoyama Gakuin University #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp From sankarshan.mukhopadhyay@gmail.com Tue Mar 27 06:08:13 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 06:08:13 -0600 (CST) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2RC8CJE008411 for ; Tue, 27 Mar 2007 06:08:12 -0600 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.1/8.13.1) with ESMTP id l2RC8CWr014186 for ; Tue, 27 Mar 2007 08:08:12 -0400 Received: from lacrosse.corp.redhat.com (lacrosse.corp.redhat.com [172.16.52.154]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id l2RC8B23023844 for ; Tue, 27 Mar 2007 08:08:11 -0400 Received: from [10.65.1.81] (dhcp1-81.pnq.redhat.com [10.65.1.81]) by lacrosse.corp.redhat.com (8.12.11.20060308/8.11.6) with ESMTP id l2RC8AOk026004 for ; Tue, 27 Mar 2007 08:08:10 -0400 Message-ID: <46090A11.20304@gmail.com> Date: Tue, 27 Mar 2007 17:42:01 +0530 From: Sankarshan Mukhopadhyay User-Agent: Thunderbird 1.5.0.10 (X11/20070302) MIME-Version: 1.0 To: cldr-users@unicode.org Subject: What is the process for getting in corrections to CLDR for a particular locale ? X-Enigmail-Version: 0.94.1.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-archive-position: 43 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: sankarshan.mukhopadhyay@gmail.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, Lame question but a question all the same. 
I am interested in: http://www.unicode.org/cldr/data/charts/main/bn_IN.html#86 http://www.unicode.org/cldr/data/charts/main/bn_IN.html#87 http://www.unicode.org/cldr/data/charts/main/bn_IN.html#98 http://www.unicode.org/cldr/data/charts/main/bn_IN.html#99 especially :Sankarshan - -- You see things; and you say 'Why?'; But I dream things that never were; and I say 'Why not?' - George Bernard Shaw -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFGCQoRXQZpNTcrCzMRAgl3AJ9mdnmfqwuPMV+4yZ6b8ZkBsZRh5wCfUrdI MNCU/hNTep0BybJ8Y+5kYz4= =on/T -----END PGP SIGNATURE----- From write@omiazad.com Tue Mar 27 12:17:15 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 12:17:15 -0600 (CST) Received: from mail3.hostek.com (mail3.hostek.com [216.198.218.131]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2RIHF9S025044 for ; Tue, 27 Mar 2007 12:17:15 -0600 X-SMSpamC: skipped (authenticated sender) Received: from UnknownHost [202.56.7.130] by mail3.hostek.com with SMTP; Tue, 27 Mar 2007 13:15:54 -0500 Message-ID: <46095F22.8020702@omiazad.com> Date: Wed, 28 Mar 2007 00:14:58 +0600 From: Omi Azad User-Agent: Thunderbird 1.5.0.10 (Windows/20070221) MIME-Version: 1.0 To: cldr-users@unicode.org, core@bengalinux.org Subject: Re: What is the process for getting in corrections to CLDR for a particular locale ? References: <46090A11.20304@gmail.com> In-Reply-To: <46090A11.20304@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 44 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: write@omiazad.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Why ask on the CLDR list? I think this is a spelling issue, right? There should not be a ী on a foreign sound. Well, there are so many issues with this CLDR. 
I think I sent them an update almost 2 years back, which was done with Jamil Bhai's help. But they didn't update this place with those 2 XML files. Later, when they opened this site, they asked me to make the changes, which is quite insane for me, because I don't know most of the terms, like "persian_monthWidth_format_wide", and there was no space for BD. Then they added BD and gave me access, but right after that the submission period closed and I lost interest in fixing it. Any suggestions? Omi Sankarshan Mukhopadhyay wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, > > Lame question but a question all the same. > > I am interested in: > > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#86 > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#87 > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#98 > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#99 > > especially > > :Sankarshan > > From mark.edward.davis@gmail.com Tue Mar 27 12:31:54 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 12:31:54 -0600 (CST) Received: from ik-out-1112.google.com (ik-out-1112.google.com [66.249.90.177]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2RIVrKl029158 for ; Tue, 27 Mar 2007 12:31:54 -0600 Received: by ik-out-1112.google.com with SMTP id c29so2133961ika for ; Tue, 27 Mar 2007 11:31:48 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=NufVbT6/do3eFxl7peYFJRd0RNO9cbmQGKVlOjYfhpBDNd2ULSoFTYtk1EAs/jIGC4Tzn75UAZfoUzGgataPXPK63db/ovX3k5CQ0pELXtECV26kaIJVqbfFY9ijKf4TkNgKB3rZG/1oPS4hx47vSqHKKsJ8AcFq6M5YudyIO44= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:references:x-google-sender-auth; 
b=j1wKFT8hhYsPlbZD8HQdKqLvC6EF7nNKRwhjZiIczqrVl09yTpG2mbnUdTEN5VtZrPOo7YmesPow2mF0o9cMOo4CEz6b5VE9U8b8KXH7Uf+xo4/SFDJ0xFBUzf4tl+sxfGJ4XZv/YCQxK7HbfaCIKQwMfG5KFod2RQ4fLEGfVmg= Received: by 10.115.106.7 with SMTP id i7mr3277920wam.1175020306445; Tue, 27 Mar 2007 11:31:46 -0700 (PDT) Received: by 10.114.196.2 with HTTP; Tue, 27 Mar 2007 11:31:45 -0700 (PDT) Message-ID: <30b660a20703271131j65f52721w2d4fca008ad71275@mail.gmail.com> Date: Tue, 27 Mar 2007 11:31:45 -0700 From: "Mark Davis" To: cldr-users@unicode.org Subject: Re: What is the process for getting in corrections to CLDR for a particular locale ? In-Reply-To: <46090A11.20304@gmail.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_17638_25718327.1175020305769" References: <46090A11.20304@gmail.com> X-Google-Sender-Auth: c1cc26ff76dc8fdc X-archive-position: 45 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users ------=_Part_17638_25718327.1175020305769 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline The pages you cited were platform comparison pages. The best way to see what is in CLDR is to use the Survey tool; that is also the way to submit proposed new data. That mechanism allows us to accept and process the thousands of submissions in a reasonable fashion, since we found that doing it with XML or plaintext submissions was untenable. The Survey tool has been substantially enhanced for this release. In terms of schedule, for each release, we get the tooling ready for the release, then have a data submission period, and then a vetting period for the submitted data, then we do the final release preparation. We are limited currently to accepting data during that data submission phase (which is right now until the end of April). 
- See http://unicode.org/cldr/ for the current release schedule. - For more information, and how to sign up, see http://unicode.org/cldr/survey_tool.html. If you have questions after reading that material, feel free to ask on this list. Mark On 3/27/07, Sankarshan Mukhopadhyay wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, > > Lame question but a question all the same. > > I am interested in: > > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#86 > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#87 > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#98 > http://www.unicode.org/cldr/data/charts/main/bn_IN.html#99 > > especially > > :Sankarshan > > - -- > > You see things; and you say 'Why?'; > But I dream things that never were; > and I say 'Why not?' - George Bernard Shaw > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.5 (GNU/Linux) > Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org > > iD8DBQFGCQoRXQZpNTcrCzMRAgl3AJ9mdnmfqwuPMV+4yZ6b8ZkBsZRh5wCfUrdI > MNCU/hNTep0BybJ8Y+5kYz4= > =on/T > -----END PGP SIGNATURE----- > >

--
Mark ------=_Part_17638_25718327.1175020305769-- From dzo@bisharat.net Tue Mar 27 17:03:19 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 17:03:19 -0600 (CST) Received: from kabissa.org (113166.kabissa.org [72.32.199.201]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2RN3JE6025923 for ; Tue, 27 Mar 2007 17:03:19 -0600 Received: (qmail 22934 invoked from network); 27 Mar 2007 17:04:13 -0500 Received: from host-64-202-138-2.getcosi.com (HELO IBM92AA25595C4) (64.202.138.2) by 72.32.229.137 with SMTP; 27 Mar 2007 17:04:13 -0500 From: "Don Osborn" To: , "'Deborah Goldsmith'" Cc: "'CLDR list'" , References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> <6.0.0.20.2.20070327184827.077222d0@localhost> In-Reply-To: <6.0.0.20.2.20070327184827.077222d0@localhost> Subject: RE: Concerns about relative dates Date: Tue, 27 Mar 2007 18:04:09 -0400 Message-ID: <000a01c770bb$d8ff3e30$8afdba90$@net> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: AcdwV7EnYx3nZn93RsSdfGyN1VU2hwAB7acA Content-Language: en-us X-archive-position: 46 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: dzo@bisharat.net Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users I assume Deborah's question about relative date references is about proposed fields for locale data, and if so Martin's implied question in response is one that deserves explicit attention. Remembering a discussion a few months ago about adding fields for certain kinds of terms (it might have been gender designations such as one might have in a questionnaire - maybe someone remembers), and a response that locales should not become dictionaries, I still wonder about the perceived need to add more terms to locale data. 
And indeed about the dynamic at play: the number of fields is more likely to increase than to stay the same unless the purpose is strictly delimited. Where exactly is the line drawn? How exactly would having "the day before yesterday" or shoe sizes (sorry Tex) defined in locale data assist in localizing software or loading a webpage? At what point should we expect folks needing to know how various languages refer to a particular thing or concept to look it up in something other than locale data? (Multilingual dictionaries?) My original conception of the purpose of the locale data was as kind of a "linguistic boot file," so the software will know how to order dates, what character repertoire is necessary, and a limited number of other basic parameters. "Common, necessary software locale data for all world languages" per the CLDR Overview via http://www.unicode.org/cldr/ . Are other purposes now foreseen? Sorry if this is just clueless and thanks in advance for any enlightenment... Don > -----Original Message----- > From: cldr-users-bounce@unicode.org [mailto:cldr-users- > bounce@unicode.org] On Behalf Of Martin Duerst ... > > I'm not totally sure what the goal of the collection is for > these items, but if somebody wanted to make sure that in some > text, it said "the day before yesterday" instead of "two > days ago", then we would need to collect these expressions > even if they are not single words. > > Regards, Martin. > > > #-#-# Martin J. Du"rst, Assoc. 
Professor, Aoyama Gakuin University > #-#-# http://www.sw.it.aoyama.ac.jp > mailto:duerst@it.aoyama.ac.jp > From addison@yahoo-inc.com Tue Mar 27 17:58:39 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 17:58:39 -0600 (CST) Received: from rsmtp1.corp.yahoo.com (rsmtp1.corp.yahoo.com [207.126.228.149]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2RNwcmD013554; Tue, 27 Mar 2007 17:58:39 -0600 Received: from [10.72.72.62] (snvvpn1-10-72-72-c62.corp.yahoo.com [10.72.72.62]) (authenticated bits=0) by rsmtp1.corp.yahoo.com (8.13.8/8.13.6/y.rout) with ESMTP id l2RNwQwq083694 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 27 Mar 2007 16:58:26 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; s=serpent; d=yahoo-inc.com; c=nofws; q=dns; h=message-id:date:from:user-agent:mime-version:to:cc:subject: references:in-reply-to:content-type:content-transfer-encoding; b=vKwQCnNqTgHkNWA8sVimDLvb2uX7WGomVrOqy+mMMZAxSlXxAqvA/eT2n9gCdwli Message-ID: <4609AFA2.5030105@yahoo-inc.com> Date: Tue, 27 Mar 2007 16:58:26 -0700 From: Addison Phillips User-Agent: Thunderbird 1.5.0.10 (Windows/20070221) MIME-Version: 1.0 To: cldr-users@unicode.org CC: "'Deborah Goldsmith'" , "'CLDR list'" , duerst@it.aoyama.ac.jp Subject: Re: Concerns about relative dates References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> <6.0.0.20.2.20070327184827.077222d0@localhost> <000a01c770bb$d8ff3e30$8afdba90$@net> In-Reply-To: <000a01c770bb$d8ff3e30$8afdba90$@net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 47 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: addison@yahoo-inc.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users > I still wonder about the perceived need to add more terms to > locale data. 
I must admit to being at least a bit dubious about putting this data into CLDR, but at the same time it is a very common problem. Plenty of UI designers want to have friendly time strings ("## minutes ago", etc.). The number of design problems similar to this one is open-ended and certainly this verges on the "common localization dictionary repository" :-). Still... In programming terms, the typical solution to this type of problem is something similar to a Java ChoiceFormat, in which you have an array of limit values with associated strings. What's nice, from the programmer's point of view, is that the number of resource values doesn't have to be determined at design time. It's very bad if you provide a data structure with room for three things and the next language you encounter turns out to need four... The main problem with "choice format" is that linguists really haven't a clue what to do with these things. They're rarely used and translation tools aren't usually set up to deal with them---usually in translation you end up with exactly the same number of resources, while in "choice format" the idea is that you can have more (or fewer)... and that you might need to modify the values used to pick the string (and not just translate the string). This kind of interaction design related to time crops up so frequently, that one wants to create and (re)use locale data for each of the potential periods (seconds, minutes, hours, days, weeks, months, years, fortnights, moons, sols, etc.). Literally the last email I sent was a pointer to my JavaScript implementation of choice format for timestamps, so I am certainly sensitized to the need for it :-). If these resources are generally available, then there is much less need for the choice format design pattern (you use date formatting instead). And less incentive for people like myself to encourage UI designers to make unfriendly strings like "hours ago: 2". 
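The "array of limit values with associated strings" idea can be sketched in a few lines of Python. This is only an illustration of the technique and does not reproduce Java's ChoiceFormat API; the names `LIMITS`, `PATTERNS` and `friendly_elapsed` are hypothetical, and the English strings are made-up sample data, not CLDR content.

```python
from bisect import bisect_right

# Hypothetical English data: each limit is the smallest elapsed-seconds value
# at which the pattern at the same index starts to apply.
LIMITS = [0, 60, 120, 3600, 7200, 86400]
PATTERNS = [
    "just now",
    "1 minute ago",
    "{minutes} minutes ago",
    "1 hour ago",
    "{hours} hours ago",
    "more than a day ago",
]

def friendly_elapsed(seconds, limits=LIMITS, patterns=PATTERNS):
    """Pick the pattern whose limit is the largest one not exceeding `seconds`."""
    if seconds < limits[0]:
        raise ValueError("elapsed time is below the first limit")
    # bisect_right counts the limits <= seconds; the last of them wins.
    index = bisect_right(limits, seconds) - 1
    return patterns[index].format(minutes=seconds // 60, hours=seconds // 3600)

print(friendly_elapsed(150))   # uses the "{minutes} minutes ago" pattern
```

Because the two lists are data, a translator could add or remove entries, or move a limit, without any change to the lookup code, which is the property the paragraph above describes.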
So I guess I'm saying I support this particular case, but also recognize that it is near the "gray zone". [If someone proposes page counts, weights, lengths, byte counts, hat sizes, etc., I'll probably go "ick"] > My original conception of the purpose of the locale data was as kind of a > "linguistic boot file," so the software will know how to order dates, what > character repertoire is necessary, and a limited number of other basic > parameters. One use for locales is to transform "objects" (things such as a time or a number) between their computer representation and a human-oriented representation. We are used to expecting that a "locale" results in some date strings including a sequence of characters like "January" or "Dezember". This, necessarily, is indistinguishable from generating some human language from a digital representation (e.g. number of seconds since January 1, 1980, midnight, UTC). As such, the problem isn't "where to start", but where to draw the line between what is baked into our operating environment (OS, programming languages, etc.) and what we build for ourselves in our applications (programs, web sites, documents, etc.). There are many things, of course, that have computer representations. The question is how common the need for a particular general purpose representation is and when one person or group's design decisions begin to affect the usefulness of the whole structure (by negatively impacting the separate design decisions of some other person or group of users). I doubt there is or can be a bright line defining that decision. FWIW, Addison -- Addison Phillips Globalization Architect -- Yahoo! Inc. Internationalization is an architecture. It is not a feature. Don Osborn wrote: > I assume Deborah's question about relative date references is about proposed > fields for locale data, and if so Martin's implied question in response is > one that deserves explicit attention. 
Remembering a discussion a few months > ago about adding fields for certain kinds of terms (it might have been > gender designations such as one might have in a questionnaire - maybe > someone remembers), and a response that locales should not become > dictionaries, I still wonder about the perceived need to add more terms to > locale data. And indeed about the dynamic at play: the number of fields is > more likely to increase than to stay the same unless the purpose is strictly > delimited. > > Where exactly is the line drawn? How exactly would having "the day before > yesterday" or shoe sizes (sorry Tex) defined in locale data assist in > localizing software or loading a webpage? At what point should we expect > folks needing to know how various languages refer to a particular thing or > concept to look it up in something other than locale data? (Multilingual > dictionaries?) > > My original conception of the purpose of the locale data was as kind of a > "linguistic boot file," so the software will know how to order dates, what > character repertoire is necessary, and a limited number of other basic > parameters. "Common, necessary software locale data for all world languages" > per the CLDR Overview via http://www.unicode.org/cldr/ . Are other purposes > now foreseen? > > Sorry if this is just clueless and thanks in advance for any > enlightenment... > > Don > > >> -----Original Message----- >> From: cldr-users-bounce@unicode.org [mailto:cldr-users- >> bounce@unicode.org] On Behalf Of Martin Duerst > ... >> I'm not totally sure what the goal of the collection is for >> these items, but if somebody wanted to make sure that in some >> text, it said "the day before yesterday" instead of "two >> days ago", then we would need to collect these expressions >> even if they are not single words. >> >> Regards, Martin. >> >> >> #-#-# Martin J. Du"rst, Assoc. 
Professor, Aoyama Gakuin University >> #-#-# http://www.sw.it.aoyama.ac.jp >> mailto:duerst@it.aoyama.ac.jp >> > > > > From asmusf@ix.netcom.com Tue Mar 27 18:12:47 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 18:12:48 -0600 (CST) Received: from elasmtp-banded.atl.sa.earthlink.net (elasmtp-banded.atl.sa.earthlink.net [209.86.89.70]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2S0ClGi019434; Tue, 27 Mar 2007 18:12:47 -0600 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dk20050327; d=ix.netcom.com; b=tYJlYKPqsuo9pAfjmwxwH7aA1if4ByyM3ztrYhn+bAcHN7cH+A4BL370nFaGyQ/2; h=Received:Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding:X-ELNK-Trace:X-Originating-IP; Received: from adsl-67-112-18-34.dsl.snfc21.pacbell.net ([67.112.18.34] helo=[127.0.0.1]) by elasmtp-banded.atl.sa.earthlink.net with asmtp (Exim 4.34) id 1HWLmK-0000VU-LH; Tue, 27 Mar 2007 20:12:40 -0400 Message-ID: <4609B2F3.8090807@ix.netcom.com> Date: Tue, 27 Mar 2007 17:12:35 -0700 From: Asmus Freytag User-Agent: Thunderbird 1.5.0.10 (Windows/20070221) MIME-Version: 1.0 To: cldr-users@unicode.org CC: "'Deborah Goldsmith'" , "'CLDR list'" , duerst@it.aoyama.ac.jp Subject: Re: Concerns about relative dates References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> <6.0.0.20.2.20070327184827.077222d0@localhost> <000a01c770bb$d8ff3e30$8afdba90$@net> In-Reply-To: <000a01c770bb$d8ff3e30$8afdba90$@net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-ELNK-Trace: 464f085de979d7246f36dc87813833b22c120543388a5fd7e85084c95e5420b1c73fd6c68d9a234d350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c X-Originating-IP: 67.112.18.34 X-archive-position: 48 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmusf@ix.netcom.com Precedence: bulk 
Reply-to: cldr-users@unicode.org X-list: cldr-users On 3/27/2007 3:04 PM, Don Osborn wrote: > I assume Deborah's question about relative date references is about proposed > fields for locale data, and if so Martin's implied question in response is > one that deserves explicit attention. Remembering a discussion a few months > ago about adding fields for certain kinds of terms (it might have been > gender designations such as one might have in a questionnaire - maybe > someone remembers), and a response that locales should not become > dictionaries, I still wonder about the perceived need to add more terms to > locale data. And indeed about the dynamic at play: the number of fields is > more likely to increase than to stay the same unless the purpose is strictly > delimited. > > Where exactly is the line drawn? How exactly would having "the day before > yesterday" or shoe sizes (sorry Tex) defined in locale data assist in > localizing software or loading a webpage? At what point should we expect > folks needing to know how various languages refer to a particular thing or > concept to look it up in something other than locale data? (Multilingual > dictionaries?) > > My original conception of the purpose of the locale data was as kind of a > "linguistic boot file," so the software will know how to order dates, what > character repertoire is necessary, and a limited number of other basic > parameters. "Common, necessary software locale data for all world languages" > per the CLDR Overview via http://www.unicode.org/cldr/ . Are other purposes > now foreseen? > I always thought that the implicit limitation of a locale had to do with aspects of the presentation that have to be automatically generated in response to numeric information of certain kinds (including date, time and duration). Extending this from absolute notation to relative notation at least appears to be well-defined. 
It follows the implicit goal of matching the way people insist one must refer to things: my newspaper's weather forecast uses the 'today' label, even though that means that I have to check the tiny date in the header to make sure I'm not reading something that's out of date - however, for live data, that kind of paradigm makes sense. Shoe size is in a different and very specific domain from time, quantity and currency, the three main 'generic' domains that i18n has attempted to cover from the beginning. Drawing a bright line there is useful. Martin's examples point out the need to define which of these expressions are used in ordinary *written* communication. While he's correct that in German you can use agglomeration (similar to the great-great-....-grandfather style of genealogical reference), any use of expressions beyond the single-prefix terms "vorgestern" (day before yesterday) or "übermorgen" (day after tomorrow) is properly limited to spoken communication of the more informal kind. A./ > Sorry if this is just clueless and thanks in advance for any > enlightenment... > > Don > > > >> -----Original Message----- >> From: cldr-users-bounce@unicode.org [mailto:cldr-users- >> bounce@unicode.org] On Behalf Of Martin Duerst >> > ... > >> I'm not totally sure what the goal of the collection is for >> these items, but if somebody wanted to make sure that in some >> text, it said "the day before yesterday" instead of "two >> days ago", then we would need to collect these expressions >> even if they are not single words. >> >> Regards, Martin. >> >> >> #-#-# Martin J. Du"rst, Assoc.
Professor, Aoyama Gakuin University >> #-#-# http://www.sw.it.aoyama.ac.jp >> mailto:duerst@it.aoyama.ac.jp >> >> > > > > > > > From goldsmit@apple.com Tue Mar 27 18:19:03 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 18:19:13 -0600 (CST) Received: from mail-out4.apple.com (mail-out4.apple.com [17.254.13.23]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2S0J3ab021429; Tue, 27 Mar 2007 18:19:03 -0600 Received: from relay7.apple.com (relay7.apple.com [17.128.113.37]) by mail-out4.apple.com (8.13.8/8.13.8) with ESMTP id l2S0IvxI023837; Tue, 27 Mar 2007 17:18:57 -0700 (PDT) Received: from relay7.apple.com (unknown [127.0.0.1]) by relay7.apple.com (Symantec Mail Security) with ESMTP id BEC8D30015; Tue, 27 Mar 2007 17:18:57 -0700 (PDT) X-AuditID: 11807125-ae661bb00000538d-90-4609b47155ee Received: from [17.202.20.91] (nekobasu.apple.com [17.202.20.91]) by relay7.apple.com (Apple SCV relay) with ESMTP id 8295A30083; Tue, 27 Mar 2007 17:18:57 -0700 (PDT) In-Reply-To: <000a01c770bb$d8ff3e30$8afdba90$@net> References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> <6.0.0.20.2.20070327184827.077222d0@localhost> <000a01c770bb$d8ff3e30$8afdba90$@net> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <0CC7200B-63AF-447D-8A97-07922B860AB9@apple.com> Cc: CLDR list Content-Transfer-Encoding: 7bit From: Deborah Goldsmith Subject: Re: Concerns about relative dates Date: Tue, 27 Mar 2007 17:18:57 -0700 To: cldr-users@unicode.org X-Mailer: Apple Mail (2.752.3) X-Brightmail-Tracker: AAAAAA== X-archive-position: 49 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: goldsmit@apple.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users On Mar 27, 2007, at 3:04 PM, Don Osborn wrote: > Where exactly is the line drawn? 
How exactly would having "the day > before > yesterday" or shoe sizes (sorry Tex) defined in locale data assist in > localizing software or loading a webpage? The motivation for this particular feature is to add (optional) support for relative date in date formatting. This is used widely in Mac OS X, and Apple would like to see this supported in ICU, and by extension in CLDR. For example, the dates in my mailbox look like this (a subset):

Today 4:58 PM
Today 4:03 PM
...
Yesterday 2:22 PM
Yesterday 1:38 PM
...
March 25, 2007 11:53 PM
March 22, 2007 3:51 PM

Pretty much every application on Mac OS X that gives a list of items with dates uses this approach. Unix uses a similar technique, albeit in a non-localized way: file modification dates are shown one way for dates within the last 12 months, and another way for older files:

-rwxrwxrwx 1 12 Oct 28 2005 gui_rpc_auth.cfg
-rwxrwxrwx 1 85462 Nov 26 17:52 Reservations - Book Flight - View Itinerary.pdf

When to use the absolute date and when to use a relative date is locale-dependent, as is the set of terms, which is why locale-dependent information on this is needed. Since CLDR already contains information on date formatting, it would be awkward if an implementation (such as ICU) had to rely on CLDR for some locale information and a private database for other locale information within the same date formatting API.
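A minimal sketch of the lookup described above, written in Python under stated assumptions: the table `RELATIVE_DAYS` and the function `format_day` are hypothetical names, the terms are sample data and not taken from CLDR or ICU, and the absolute-date fallback is a placeholder (ISO format) where a real formatter would apply the locale's own date pattern.

```python
from datetime import date

# Hypothetical relative-day terms, keyed by locale and then by day offset
# from today. Which offsets have a term at all differs per locale.
RELATIVE_DAYS = {
    "en": {0: "Today", -1: "Yesterday"},
    "de": {0: "Heute", -1: "Gestern", -2: "Vorgestern"},
}

def format_day(when: date, today: date, locale: str) -> str:
    """Return a relative term if the locale defines one, else an absolute date."""
    offset = (when - today).days
    term = RELATIVE_DAYS.get(locale, {}).get(offset)
    # Placeholder fallback: ISO 8601 stands in for the locale's date pattern.
    return term if term is not None else when.isoformat()

today = date(2007, 3, 27)
for day in (date(2007, 3, 27), date(2007, 3, 26), date(2007, 3, 25)):
    print(format_day(day, today, "en"), "|", format_day(day, today, "de"))
```

The point of keeping the terms in a per-locale table is that both the set of offsets and the wording can vary by locale without touching the formatting code.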
Deborah From sankarshan.mukhopadhyay@gmail.com Tue Mar 27 21:21:15 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Tue, 27 Mar 2007 21:21:15 -0600 (CST) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2S3LFZ2000518 for ; Tue, 27 Mar 2007 21:21:15 -0600 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.1/8.13.1) with ESMTP id l2S3LEYF017841 for ; Tue, 27 Mar 2007 23:21:14 -0400 Received: from lacrosse.corp.redhat.com (lacrosse.corp.redhat.com [172.16.52.154]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id l2S3LESV011720 for ; Tue, 27 Mar 2007 23:21:14 -0400 Received: from [10.11.14.83] (vpn-14-83.rdu.redhat.com [10.11.14.83]) by lacrosse.corp.redhat.com (8.12.11.20060308/8.11.6) with ESMTP id l2S3LCf1002230 for ; Tue, 27 Mar 2007 23:21:13 -0400 Message-ID: <4609E00F.90401@gmail.com> Date: Wed, 28 Mar 2007 08:55:03 +0530 From: Sankarshan Mukhopadhyay User-Agent: Thunderbird 1.5.0.10 (X11/20070302) MIME-Version: 1.0 To: cldr-users@unicode.org Subject: Re: What is the process for getting in corrections to CLDR for a particular locale ? References: <46090A11.20304@gmail.com> <30b660a20703271131j65f52721w2d4fca008ad71275@mail.gmail.com> In-Reply-To: <30b660a20703271131j65f52721w2d4fca008ad71275@mail.gmail.com> X-Enigmail-Version: 0.94.1.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-archive-position: 50 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: sankarshan.mukhopadhyay@gmail.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Mark Davis wrote: > If you have questions after reading that material, feel free to ask on > this list. Excellent. Thanks Mark :Sankarshan - -- You see things; and you say 'Why?'; But I dream things that never were; and I say 'Why not?' 
- George Bernard Shaw -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFGCeAOXQZpNTcrCzMRAkzcAKC/2ndlHDMbIhwy4l3tu7XM99KpNgCeJCMk 8NS89r+fOqdWbSRVDGMnCWY= =kGgQ -----END PGP SIGNATURE----- From eik@iki.fi Wed Mar 28 01:01:28 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 28 Mar 2007 01:01:29 -0600 (CST) Received: from smtp6.pp.htv.fi (smtp6.pp.htv.fi [213.243.153.40]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2S71SRs019457; Wed, 28 Mar 2007 01:01:28 -0600 Received: from Raahattava (cs181004115.pp.htv.fi [82.181.4.115]) by smtp6.pp.htv.fi (Postfix) with ESMTP id 3238E5BC04C; Wed, 28 Mar 2007 10:01:23 +0300 (EEST) From: "Erkki I. Kolehmainen" To: Cc: "'Deborah Goldsmith'" , "'CLDR list'" , Subject: VS: Concerns about relative dates Date: Wed, 28 Mar 2007 10:01:19 +0300 Message-ID: <004b01c77106$e2fa5eb0$0200a8c0@Raahattava> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028 In-Reply-To: <4609B2F3.8090807@ix.netcom.com> Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l2S71SRs019457 X-archive-position: 51 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: eik@iki.fi Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Asmus, Matching of the various ways to define shoe sizes (as an example) would be of tremendous help for consumers who'd like to order these products from another country in the web. If they could be informed of the corresponding (shoe) size in their local sizing system, they'd be encouraged to place an order, since the need for cumbersome returns could mostly be avoided. Regards, Erkki Erkki I. 
Kolehmainen Tilkankatu 12 A 3, FI-00300 Helsinki, Finland Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943 Asmus Freytag wrote on 2007-03-27: ... Shoe size is in a different and very specific domain from time, quantity and currency, the main three 'generic' domains that i18n has attempted to cover from the beginning. Drawing a bright line there is useful. Martin's examples point out the need to define which of these expressions are used in ordinary *written* communication. While he's correct that in German you can use agglomeration (similar to the great-great-....-grandfather style of genealogical reference) any use of expressions beyond the single prefix terms "vorgestern" (day before yesterday) or "übermorgen" (day after tomorrow) are properly limited to spoken communication of the more informal kind. A./ From verdy_p@wanadoo.fr Wed Mar 28 02:42:35 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 28 Mar 2007 02:42:36 -0600 (CST) Received: from smtp20.orange.fr (smtp20.orange.fr [80.12.242.26]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2S8gZi1001538; Wed, 28 Mar 2007 02:42:35 -0600 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2026.orange.fr (SMTP Server) with ESMTP id E0F4C1C0009D; Wed, 28 Mar 2007 10:42:29 +0200 (CEST) Received: from HARNON (APoitiers-156-1-126-208.w90-5.abo.wanadoo.fr [90.5.141.208]) by mwinf2026.orange.fr (SMTP Server) with ESMTP id 8BA8B1C00087; Wed, 28 Mar 2007 10:42:29 +0200 (CEST) X-ME-UUID: 20070328084229572.8BA8B1C00087@mwinf2026.orange.fr From: "Philippe Verdy" To: Cc: "'Deborah Goldsmith'" , "'CLDR list'" , References: <4609B2F3.8090807@ix.netcom.com> <004b01c77106$e2fa5eb0$0200a8c0@Raahattava> Subject: RE: Concerns about relative dates Date: Wed, 28 Mar 2007 10:42:24 +0200 Organization: Ordinateur Personnel Message-ID: <00af01c77115$0279ff30$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Mailer: Microsoft Office Outlook 11 
In-Reply-To: <004b01c77106$e2fa5eb0$0200a8c0@Raahattava> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028 thread-index: AcdxB9qgQmUrJLRdQguRaGYQOLKUXwACjcqg Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l2S8gZi1001538 X-archive-position: 52 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Shoe and clothing sizes are perfect examples of where defining any standard leads to failure, simply because they evolve differently across countries. That's why these industries regularly organize studies to analyze the measurements of the population and get better fits for their markets. But other factors also strongly influence this market:
* Fashion, where people will want to buy smaller sizes or larger ones (more in the clothing industry than for shoes).
* The material of the products: because some materials are quite adaptive, people won't see that their own body size has also changed over time. A new material may not fit well even though it is used to build products of exactly the same physical size.
* Wearing habits: people don't wear these items the same way.
* Physical differences: you can't give a single measure that will fit everyone; measures are just weighted averages for a single metric that is "representative" of some groups, but the structure of groups evolves over both time and space. The scales are just a best-fit representation of a measurement campaign carried out in some population at one point in time.
I've never tried to buy shoes over the Internet, because I really need to try them on, and I think that many people are like me. The size is just an approximation that fits almost no one exactly.
There aren't even agreed scales across vendors and brands, and if such an agreed standard exists, it is not defined within the current definitions of locales supported by CLDR. The actual definition is completely orthogonal to languages. So really, each shoe or clothing maker produces its own scale of sizes and tries to adapt it to its expected market, with more or less success, but buyers still need to try the items on. There are some products, however, that customers can't try on (underwear, socks); the best that can be done is to make products that fit large ranges of sizes, and everyone will buy those products, which never fit exactly, but will have to know for themselves which size will be OK for them. So the industries have no other choice than to make their own studies and publish their own comparative metrics to help their customers. > -----Original Message----- > From: cldr-users-bounce@unicode.org [mailto:cldr-users-bounce@unicode.org] > On Behalf Of Erkki I. Kolehmainen > Sent: Wednesday, March 28, 2007 09:01 > To: cldr-users@unicode.org > Cc: 'Deborah Goldsmith'; 'CLDR list'; duerst@it.aoyama.ac.jp > Subject: VS: Concerns about relative dates > > Asmus, > > Matching of the various ways to define shoe sizes (as an example) would be > of tremendous help for consumers who'd like to order these products from > another country in the web. If they could be informed of the corresponding > (shoe) size in their local sizing system, they'd be encouraged to place an > order, since the need for cumbersome returns could mostly be avoided.
From mark.edward.davis@gmail.com Wed Mar 28 10:15:45 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 28 Mar 2007 10:15:45 -0600 (CST) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.168]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2SGFiSJ025913 for ; Wed, 28 Mar 2007 10:15:45 -0600 Received: by ug-out-1314.google.com with SMTP id o4so272174uge for ; Wed, 28 Mar 2007 09:15:43 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:x-google-sender-auth; b=ZaYqpQA1tFug2kyxbJpIlnVSFWFr3CB0pcKqqznETBmVv3YsaAtMhZqTGSzJH/9AXYbZdRYW5merR/tvC74As5vSj9D1DmoxO0rgQ7QbClG0xdRr5mnIpGcbioFCqJrH6OrZ00bNDdtigekc3NPYcw+L20H/9atk89ywiziK+6k= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:x-google-sender-auth; b=QR1YTKhXfX0Xs7gKK2kXmvP23OUPxqYPm0TWvPL+Vh8425EqdovrEgFSBmnWmsjbqUjtlH0tAieHV2pUgfuAnW46mDqUcxiOyIcsH9RNHzWtaVH2T94gIJQGxtCRgwS/lsWfyXafwup6NqZF2yFyMCzmyKzuBb7JbcQL5YWfrPo= Received: by 10.114.169.2 with SMTP id r2mr3801974wae.1175098542097; Wed, 28 Mar 2007 09:15:42 -0700 (PDT) Received: by 10.114.196.2 with HTTP; Wed, 28 Mar 2007 09:15:42 -0700 (PDT) Message-ID: <30b660a20703280915v3fa4be8cm4bbba58c02a84b8a@mail.gmail.com> Date: Wed, 28 Mar 2007 09:15:42 -0700 From: "Mark Davis" To: "CLDR list" , cldr-users@unicode.org Subject: Canonicalizing CLDR display/input MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_39788_11993707.1175098542055" X-Google-Sender-Auth: 06d24b075e7cda01 X-archive-position: 53 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: mark.davis@icu-project.org Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users ------=_Part_39788_11993707.1175098542055 Content-Type: text/plain; 
charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline I've been thinking a bit about some of the way that our verification is working, and thought I'd pass some thoughts around. Right now, the display value that we show in the Survey Tool is typically the exact value that comes from the XML, and the input value taken from the user's proposal is exactly what is typed in. However, there are circumstances where I think we might want to do a bit of processing on each of those.

1. For the display value, we could improve the display for BIDI if we modified it slightly for complicated cases, like some of the patterns. For example, for the exemplar characters it would be easier to read if the "[" and "]" characters were surrounded by LRMs.

2. For the input value, we already check that the input form is canonicalized: that there are no leading/trailing spaces, that the number formats are in canonical form (e.g. #,##0.00 and not 0,000,000.000), and that the exemplar characters are in order. It would be more user-friendly if we went ahead and canonicalized the user's input in these very clear-cut cases. That would save the user from having to re-enter text just to get those things right.
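Both points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Survey Tool's code: the function names are hypothetical, the exemplar list is assumed to be a plain space-separated list inside brackets (no ranges or escapes), and "in order" is approximated by code point order where a real tool would presumably use the locale's collation. Number-pattern canonicalization is left out.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def canonicalize_exemplars(raw: str) -> str:
    """Trim surrounding whitespace and sort the items of a bracketed list."""
    text = raw.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a bracketed list such as '[a b c]'")
    # Assumption: items are separated by whitespace; code point order
    # stands in for the locale's collation order.
    items = sorted(text[1:-1].split())
    return "[" + " ".join(items) + "]"

def display_exemplars(canonical: str) -> str:
    """Surround each bracket with LRMs so it stays put in right-to-left text."""
    inner = canonical[1:-1]
    return f"{LRM}[{LRM}{inner}{LRM}]{LRM}"

stored = canonicalize_exemplars("  [c a b] ")
print(stored)                      # canonical form to store
print(repr(display_exemplars(stored)))  # display form with explicit LRMs
```

Keeping the two steps separate means the stored value never contains the display-only marks: canonicalization happens once on input, and the LRMs are added only when rendering.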

Does this seem to make sense?

--
Mark ------=_Part_39788_11993707.1175098542055-- From asmusf@ix.netcom.com Wed Mar 28 18:09:06 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Wed, 28 Mar 2007 18:09:07 -0600 (CST) Received: from elasmtp-spurfowl.atl.sa.earthlink.net (elasmtp-spurfowl.atl.sa.earthlink.net [209.86.89.66]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2T096Lo027684; Wed, 28 Mar 2007 18:09:06 -0600 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dk20050327; d=ix.netcom.com; b=JXu5eS4Pu2oxVshcKBwdJWB4ZX3lo3BXGPibYqEBqym0W34FfY4NqzjhkeCkjl3u; h=Received:Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding:X-ELNK-Trace:X-Originating-IP; Received: from [131.107.204.126] (helo=[127.0.0.1]) by elasmtp-spurfowl.atl.sa.earthlink.net with asmtp (Exim 4.34) id 1HWiCJ-0007Sy-CN; Wed, 28 Mar 2007 20:08:59 -0400 Message-ID: <460B0396.7050104@ix.netcom.com> Date: Wed, 28 Mar 2007 17:08:54 -0700 From: Asmus Freytag User-Agent: Thunderbird 1.5.0.10 (Windows/20070221) MIME-Version: 1.0 To: cldr-users@unicode.org CC: "'Deborah Goldsmith'" , "'CLDR list'" , duerst@it.aoyama.ac.jp Subject: Re: VS: Concerns about relative dates References: <004b01c77106$e2fa5eb0$0200a8c0@Raahattava> In-Reply-To: <004b01c77106$e2fa5eb0$0200a8c0@Raahattava> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-ELNK-Trace: 464f085de979d7246f36dc87813833b22c120543388a5fd798f1712237e3b0dda39638dcd996ff87350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c X-Originating-IP: 131.107.204.126 X-archive-position: 54 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmusf@ix.netcom.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users On 3/28/2007 12:01 AM, Erkki I. 
Kolehmainen wrote: > Asmus, > > Matching of the various ways to define shoe sizes (as an example) would be > of tremendous help for consumers who'd like to order these products from > another country in the web. If they could be informed of the corresponding > (shoe) size in their local sizing system, they'd be encouraged to place an > order, since the need for cumbersome returns could mostly be avoided. > > I don't disagree - but I wrote: > Shoe size is in a different and very specific domain from time, quantity > and currency, the main three 'generic' domains that i18n has attempted > to cover from the beginning. > Conversions between units, which is what shoe sizes are, is different from formatting quantities. It's more similar to converting between mm and nautical miles. A./ From eik@iki.fi Thu Mar 29 00:06:33 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Thu, 29 Mar 2007 00:06:34 -0600 (CST) Received: from smtp5.pp.htv.fi (smtp5.pp.htv.fi [213.243.153.39]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2T66WBL020408; Thu, 29 Mar 2007 00:06:32 -0600 Received: from Raahattava (cs181004115.pp.htv.fi [82.181.4.115]) by smtp5.pp.htv.fi (Postfix) with ESMTP id 7C92B5BC1EB; Thu, 29 Mar 2007 09:06:31 +0300 (EEST) From: "Erkki I. 
Kolehmainen" To: Cc: "'Deborah Goldsmith'" , "'CLDR list'" , Subject: VS: VS: Concerns about relative dates Date: Thu, 29 Mar 2007 09:06:28 +0300 Message-ID: <004001c771c8$63b32070$0200a8c0@Raahattava> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028 Importance: Normal In-Reply-To: <460B0396.7050104@ix.netcom.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l2T66WBL020408 X-archive-position: 55 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: eik@iki.fi Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users The major difference here is that mm and nautical miles are universally known, whereas shoe sizes, for example, are region-dependent. Regards, Erkki Erkki I. Kolehmainen Tilkankatu 12 A 3, FI-00300 Helsinki, Finland Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943 -----Original Message----- From: cldr-users-bounce@unicode.org [mailto:cldr-users-bounce@unicode.org] On Behalf Of Asmus Freytag Sent: March 29, 2007 3:09 To: cldr-users@unicode.org Cc: 'Deborah Goldsmith'; 'CLDR list'; duerst@it.aoyama.ac.jp Subject: Re: VS: Concerns about relative dates On 3/28/2007 12:01 AM, Erkki I. Kolehmainen wrote: > Asmus, > > Matching of the various ways to define shoe sizes (as an example) > would be of tremendous help for consumers who'd like to order these > products from another country in the web. If they could be informed of > the corresponding > (shoe) size in their local sizing system, they'd be encouraged to place an > order, since the need for cumbersome returns could mostly be avoided.
> > I don't disagree - but I wrote: > Shoe size is in a different and very specific domain from time, > quantity > and currency, the main three 'generic' domains that i18n has attempted > to cover from the beginning. > Conversions between units, which is what shoe sizes are, is different from formatting quantities. It's more similar to converting between mm and nautical miles. A./ From asmusf@ix.netcom.com Fri Mar 30 02:53:19 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 30 Mar 2007 02:53:20 -0600 (CST) Received: from elasmtp-scoter.atl.sa.earthlink.net (elasmtp-scoter.atl.sa.earthlink.net [209.86.89.67]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2U8rJvm022137; Fri, 30 Mar 2007 02:53:19 -0600 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dk20050327; d=ix.netcom.com; b=ng4q/u6xDttqNSeIGHI48TfOCRB1ycafvVOV/4uzzLBnQTq2yw3ODNp/uanUdbjs; h=Received:Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding:X-ELNK-Trace:X-Originating-IP; Received: from dialup-4.242.21.127.dial1.seattle1.level3.net ([4.242.21.127] helo=[127.0.0.1]) by elasmtp-scoter.atl.sa.earthlink.net with asmtp (Exim 4.34) id 1HXCr9-0007zF-Q6; Fri, 30 Mar 2007 04:53:12 -0400 Message-ID: <460CCFF1.2030801@ix.netcom.com> Date: Fri, 30 Mar 2007 01:53:05 -0700 From: Asmus Freytag User-Agent: Thunderbird 1.5.0.10 (Windows/20070221) MIME-Version: 1.0 To: cldr-users@unicode.org CC: "'Deborah Goldsmith'" , "'CLDR list'" , duerst@it.aoyama.ac.jp Subject: Re: VS: VS: Concerns about relative dates References: <004001c771c8$63b32070$0200a8c0@Raahattava> In-Reply-To: <004001c771c8$63b32070$0200a8c0@Raahattava> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-ELNK-Trace: 464f085de979d7246f36dc87813833b22c120543388a5fd78a6f076beed9338411b0f93129c199fe350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c X-Originating-IP: 4.242.21.127 X-archive-position: 56 X-ecartis-version: Ecartis v1.0.0 
Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: asmusf@ix.netcom.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users On 3/28/2007 11:06 PM, Erkki I. Kolehmainen wrote: > The major difference here is that mm and nautical miles are universally > known, whereas e.g. the shoe sizes are region dependent. > The entire system of US measures is inherently 'regional', despite the fact that information about them is widely disseminated and some of them, such as "barrel" for oil shipments, remain internationally used. Then there are regional differences in usage: for example, wind speeds are quoted in the Beaufort scale in Europe, whereas either knots or mph is more common in the US (in both cases, scientific publications might use yet a different scale). The point is that both knots and Beaufort are universal in their definition. In the US, a percentage of footwear imports is not sized in US sizes, by the way. The same may or may not be true for other countries (let alone the case where items are sized using multiple systems). Furthermore, the increments are different in each system, so that it does make a difference whether something was manufactured to a particular system - encouraging blind translation of such data is potentially a disservice (compared to the widely used dual representation of original size with approximate equivalent in another system). What I'm trying to drive at is that for these things it's not simply the case that there are fixed local conventions that you can use without additional domain expertise. We've implicitly agreed to treat the domain expertise for time, numeric value and currency amounts as universal - doing the same for clothing sizes brings in a whole lot of different challenges: for example, how do you translate S, M, L, LT, XL, XXL into numeric sizing systems? I posit from my own experience that this cannot be done universally.
If you want to think about this systematically, you would try to determine whether there is a benefit of having specialized domain knowledge (with local variations) maintained in a universal repository, such as CLDR, where the focus is on local variation, or whether you are better off having this documented in a domain-specific way. If you do get agreement that the former is preferable, you would need not only buy-in, but active support from the players, users and experts in that domain. Perhaps this is a common enough issue that CLDR process should have an explicit method to establish such domain-specific sub-repositories, with proper threshold requirements for their creation and proper credentials for their maintenance. Anyway, that would be the systematic way to go about it. A./
From eik@iki.fi Fri Mar 30 03:50:10 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 30 Mar 2007 03:50:10 -0600 (CST) Received: from smtp4.pp.htv.fi (smtp4.pp.htv.fi [213.243.153.38]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2U9o5ir028394; Fri, 30 Mar 2007 03:50:09 -0600 Received: from Raahattava (cs181004115.pp.htv.fi [82.181.4.115]) by smtp4.pp.htv.fi (Postfix) with ESMTP id CDE125BC1E1; Fri, 30 Mar 2007 12:50:04 +0300 (EEST) From: "Erkki I. Kolehmainen" To: Cc: "'Deborah Goldsmith'" , "'CLDR list'" , Subject: VS: VS: VS: Concerns about relative dates Date: Fri, 30 Mar 2007 12:50:01 +0300 Message-ID: <003401c772b0$c8f553c0$0200a8c0@Raahattava> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028 Importance: Normal In-Reply-To: <460CCFF1.2030801@ix.netcom.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l2U9o5ir028394 X-archive-position: 57 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: eik@iki.fi Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Asmus, I think we agree also here on a lot of things, since e.g. S as a size is not the same in all countries whereas e.g. an inch has nowadays the same length all over the world (whether in active use or not). The issue of different sizing systems is important in e.g. the open European internal market where the consumers have the right to return and get a refund for the purchases from within the area.
I'm not advocating a blind translation, but I believe that there is a benefit of having specialized domain knowledge (with local variations) maintained in or attached to a universal repository such as CLDR. Best regards, Erkki Erkki I. Kolehmainen Tilkankatu 12 A 3, FI-00300 Helsinki, Finland Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943
From dzo@bisharat.net Fri Mar 30 07:41:13 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 30 Mar 2007 07:41:13 -0600 (CST) Received: from kabissa.org (113166.kabissa.org [72.32.199.201]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2UDf9tL022684 for ; Fri, 30 Mar 2007 07:41:13 -0600 Received: (qmail 16113 invoked from network); 30 Mar 2007 07:44:04 -0500 Received: from h-67-100-157-224.mclnva23.covad.net (HELO IBM92AA25595C4) (67.100.157.224) by 72.32.229.137 with SMTP; 30 Mar 2007 07:44:04 -0500 From: "Don Osborn" To: Cc: "'CLDR list'" References: <692CAFC9-99D8-450A-9595-964BAAD208FB@apple.com> <30b660a20703261421l444624fen3f9d46f1d95309a8@mail.gmail.com> <6.0.0.20.2.20070327184827.077222d0@localhost> <000a01c770bb$d8ff3e30$8afdba90$@net> <4609AFA2.5030105@yahoo-inc.com> In-Reply-To: <4609AFA2.5030105@yahoo-inc.com> Subject: RE: Concerns about relative dates Date: Fri, 30 Mar 2007 08:44:03 -0400 Message-ID: <016501c772c9$18e8ee60$4abacb20$@net> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: Acdwy+La8bnTqk9iRWCgeqTPRNnPrgB92ffw Content-Language: en-us
Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l2UDf9tL022684 X-archive-position: 58 X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: dzo@bisharat.net Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users Thanks Addison, this helps me to understand some of the issues. I somehow thought that CLDR was tied more closely to localization of software and software's ability to interpret basic data appropriately for a locale. Problem for me is that I don't have programming experience to see how these are used in practice. Nevertheless, when the number of terms begins to make CLDR look like "common localization dictionary repository" (nice way of putting it), then it begins to look to a non-programmer like there might be two different sets of needs combined under one project. Has any thought been given to separating out basic locale data on one hand from more extensive localizers' dictionaries on the other? I'd be tempted to suggest a "Wiktionary" approach to the latter that permits localizers to constantly add and improve on definitions without debates over the appropriateness of categories, while at the same time being a "fundamentalist" on the former (that is, the locale data maintained in CLDR, with limited periods for input etc.). One reason that I am thinking about this is that I'm looking at the need to put together locale data for basic localization in many languages, and the fact that adding a lot of specialized references to be translated adds to the burden without facilitating the main immediate goals for these languages (localized software, accommodation of localized web content). Even something like names for all the countries and many world languages poses difficulties (there was a remark not long ago on another list about how to translate Bosnia Herzegovina into Swahili). 
You end up either forcing translations that likely will have to be revised (again in the case of Swahili there was an issue about different renditions of Côte d'Ivoire), or leaving them blank (putting off completing a locale in CLDR until the next round of revisions, or the one after that). Having looked at the locale data fields for OpenOffice, it seems these are less numerous, since the purpose is very specifically for that software. So again I wonder in the case of CLDR about focusing on basic locale data while providing another mechanism (perhaps a Wiktionary approach) for what one might term "extended" locale data (shoe sizes, nautical terms, day-before-the-day-before-yesterday, etc.). Don > -----Original Message----- > From: cldr-users-bounce@unicode.org [mailto:cldr-users- > bounce@unicode.org] On Behalf Of Addison Phillips > Sent: Tuesday, March 27, 2007 7:58 PM > To: cldr-users@unicode.org > Cc: 'Deborah Goldsmith'; 'CLDR list'; duerst@it.aoyama.ac.jp > Subject: Re: Concerns about relative dates > > > I still wonder about the perceived need to add more terms to > > locale data. > > I must admit to being at least a bit dubious about putting this data > into CLDR, but at the same time it is a very common problem. Plenty of > UI designers want to have friendly time strings ("## minutes ago", > etc.). The number of design problems similar to this one is open-ended > and certainly this verges on the "common localization dictionary > repository" :-). Still... > > In programming terms, the typical solution to this type of problem is > something similar to a Java ChoiceFormat, in which you have an array of > limit values with associated strings. > > What's nice, from the programmer's point of view, is that the number of > resource values doesn't have to be determined at design time. It's very > bad if you provide a data structure with room for three things and the > next language you encounter turns out to need four...
> > The main problem with "choice format" is that linguists really haven't > a > clue what to do with these things. They're rarely used and translation > tools aren't usually set up to deal with them---usually in translation > you end up with exactly the same number of resources, while in "choice > format" the idea is that you can have more (or fewer)... and that you > might need to modify the values used to pick the string (and not just > translate the string). > > This kind of interaction design related to time crops up so frequently, > that one wants to create and (re)use locale data for each of the > potential periods (seconds, minutes, hours, days, weeks, months, years, > fortnights, moons, sols, etc.). Literally the last email I sent was a > pointer to my JavaScript implementation of choice format for > timestamps, > so I am certainly sensitized to the need for it :-). > > If these resources are generally available, then there is much less > need > for the choice format design pattern (you use date formatting instead). > And less incentive for people like myself to encourage UI designers to > make unfriendly strings like "hours ago: 2". > > So I guess I'm saying I support this particular case, but also > recognize > that it is near the "gray zone". [If someone proposes page counts, > weights, lengths, byte counts, hat sizes, etc., I'll probably go "ick"] > > > My original conception of the purpose of the locale data was as kind > of a > > "linguistic boot file," so the software will know how to order > dates, > what > > character repertoire is necessary, and a limited number of other > basic > > parameters. > > One use for locales is to transform "objects" (things such as a time or > a number) between their computer representation and a human-oriented > representation. We are used to expecting that a "locale" results in > some > date strings including a sequence of characters like "January" or > "Dezember". 
This, necessarily, is indistinguishable from generating > some > human language from a digital representation (e.g. number of seconds > since January 1, 1980, midnight, UTC). > > As such, the problem isn't "where to start", but where to draw the line > between what is baked into our operating environment (OS, programming > languages, etc.) and what we build for ourselves in our applications > (programs, web sites, documents, etc.). There are many things, of > course, that have computer representations. The question is how common > the need for a particular general purpose representation is and when > one > person or group's design decisions begin to affect the usefulness of > the > whole structure (by negatively impacting the separate design decisions > of some other person or group of users). I doubt there is or can be a > bright line defining that decision. > > FWIW, > > Addison > > -- > Addison Phillips > Globalization Architect -- Yahoo! Inc. > > Internationalization is an architecture. > It is not a feature. > > > Don Osborn wrote: > > I assume Deborah's question about relative date references is about > proposed > > fields for locale data, and if so Martin's implied question in > response is > > one that deserves explicit attention. Remembering a discussion a few > months > > ago about adding fields for certain kinds of terms (it might have > been > > gender designations such as one might have in a questionnaire - maybe > > someone remembers), and a response that locales should not become > > dictionaries, I still wonder about the perceived need to add more > terms to > > locale data. And indeed about the dynamic at play: the number of > fields is > > more likely to increase than to stay the same unless the purpose is > strictly > > delimited. > > > > Where exactly is the line drawn? How exactly would having "the day > before > > yesterday" or shoe sizes (sorry Tex) defined in locale data assist in > > localizing software or loading a webpage? 
At what point should we > expect > > folks needing to know how various languages refer to a particular > thing or > > concept to look it up in something other than locale data? > (Multilingual > > dictionaries?) > > > > My original conception of the purpose of the locale data was as kind > of a > > "linguistic boot file," so the software will know how to order dates, > what > > character repertoire is necessary, and a limited number of other > basic > > parameters. "Common, necessary software locale data for all world > languages" > > per the CLDR Overview via http://www.unicode.org/cldr/ . Are other > purposes > > now foreseen? > > > > Sorry if this is just clueless and thanks in advance for any > > enlightenment... > > > > Don > > > > > >> -----Original Message----- > >> From: cldr-users-bounce@unicode.org [mailto:cldr-users- > >> bounce@unicode.org] On Behalf Of Martin Duerst > > ... > >> I'm not totally sure what the goal of the collection is for > >> these items, but if somebody wanted to make sure that in some > >> text, it said "the day before yesterday" instead of "two > >> days ago", then we would need to collect these expressions > >> even if they are not single words. > >> > >> Regards, Martin. > >> > >> > >> #-#-# Martin J. Du"rst, Assoc. 
Professor, Aoyama Gakuin University > >> #-#-# http://www.sw.it.aoyama.ac.jp > >> mailto:duerst@it.aoyama.ac.jp > >> > > > > > > > > > From Sau-boon.Lim@sybase.com Fri Mar 30 03:19:55 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 30 Mar 2007 08:24:09 -0600 (CST) Received: from inergen.sybase.com (inergen.sybase.com [192.138.151.43]) by unicode.org (8.13.4/8.12.11) with ESMTP id l2U9JsIp007720 for ; Fri, 30 Mar 2007 03:19:54 -0600 Received: from smtp2.sybase.com (sybgate2 [10.22.97.85]) by inergen.sybase.com with ESMTP id l2U9Jm128929 for ; Fri, 30 Mar 2007 01:19:48 -0800 (PST) Received: from gwwest.sybase.com (localhost [127.0.0.1]) by smtp2.sybase.com with ESMTP id l2U9JmU21488 for ; Fri, 30 Mar 2007 01:19:48 -0800 (PST) To: cldr-users@unicode.org Subject: CLDR2ICU MIME-Version: 1.0 Sensitivity: X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: From: Sau-boon.Lim@sybase.com Date: Fri, 30 Mar 2007 17:19:18 +0800 X-MIMETrack: Serialize by Router on gwwest/SYBASE(Release 6.5.5|November 30, 2005) at 03/30/2007 02:19:48 AM, Serialize complete at 03/30/2007 02:19:48 AM Content-Type: multipart/alternative; boundary="=_alternative 0033322C482572AE_=" X-archive-position: 59 X-Approved-By: v-magdad@microsoft.com X-ecartis-version: Ecartis v1.0.0 Sender: cldr-users-bounce@unicode.org Errors-to: cldr-users-bounce@unicode.org X-original-sender: Sau-boon.Lim@sybase.com Precedence: bulk Reply-to: cldr-users@unicode.org X-list: cldr-users This is a multipart message in MIME format. --=_alternative 0033322C482572AE_= Content-Type: text/plain; charset="US-ASCII" Hi, I was told that there's this CLDR2ICU utility which reads the CLDR collation tailoring definition and converts it to some ICU native form. Is this utility and source available somewhere? Regards, Sau-Boon --=_alternative 0033322C482572AE_= Content-Type: text/html; charset="US-ASCII"
--=_alternative 0033322C482572AE_=-- From mark.edward.davis@gmail.com Fri Mar 30 08:30:22 2007 Received: with ECARTIS (v1.0.0; list cldr-users); Fri, 30 Mar 2007 08:30:22 -0600 (CST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.225]) by unicode.org (8.