Re: [cldr-dev] Re: Questions on Chinese collation, stroke

From: Stephan Stiller <sstiller_at_stanford.edu>
Date: Fri, 22 Jun 2012 22:43:23 -0400

Dear Matt,

I think those tasks would take quite a bit of work, because (1) the
three orders you are mentioning are all mathematically underspecified
and (2) they're partial orders even when considering only what you'd
normally consider the respective target domains (certain subsets of CJKV).
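
To make point (2) concrete, here is a minimal Python sketch; it is not
CLDR's actual implementation. The table `stroke_counts` is a hypothetical
sample that uses only counts mentioned in this thread (姚 with 9 strokes,
范 with 9 in its traditional form) plus 一 with one stroke. The function
names and the code-point tie-breaker are assumptions chosen purely for
illustration. It shows that a total stroke count by itself only groups
characters, so some additional rule is needed to get a total order.

```python
from itertools import groupby

# Hypothetical sample data, using stroke counts mentioned in the thread.
stroke_counts = {
    "\u4e00": 1,  # 一
    "\u59da": 9,  # 姚
    "\u8303": 9,  # 范 (9 in traditional form; 8 in simplified, per the thread)
}

def stroke_groups(counts):
    """Group characters by stroke count.

    Characters inside one group are not ordered relative to each other
    by the stroke count alone, which is what makes this a partial order.
    """
    by_count = sorted(counts, key=lambda ch: counts[ch])
    return [(n, set(chars))
            for n, chars in groupby(by_count, key=lambda ch: counts[ch])]

def total_order(counts):
    """Break ties by code point (an arbitrary assumption, not a standard rule)."""
    return sorted(counts, key=lambda ch: (counts[ch], ord(ch)))

groups = stroke_groups(stroke_counts)
ordered = total_order(stroke_counts)
print(groups)
print(ordered)
```

Which tie-breaker is appropriate (radical, first-stroke type, code point,
and so on) is exactly the kind of detail that would have to be specified.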

I'm sure many or most people reading this know this, but the question is
which committee would get rid of the underspecification (also, according
to what principles?), fine-tune the respective target domains, and such.
(Perhaps the IICore people have done parts of the footwork already?)

Stephan

On 6/22/2012 5:05 PM, Matt Ma wrote:
> Entered ticket #4949 for Simplified Chinese, stroke order.
>
> Thanks,
> Matt
>
> On Fri, Jun 22, 2012 at 12:55 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>> There are no current plans to do that. If you want to present a case for
>> adding additional collation sequences to CLDR, please start the process by
>> filing a bug at http://unicode.org/cldr/trac/newticket
>>
>> ________________________________
>> Mark
>>
>> — Il meglio è l’inimico del bene —
>>
>>
>>
>> On Fri, Jun 22, 2012 at 11:05 AM, Matt Ma <matt.ma.umail_at_gmail.com> wrote:
>>> Thanks all for the clarification. Are there any plans to provide the
>>> following collations in CLDR?
>>>
>>> 1. Simplified Chinese, stroke order, based on 现代汉语通用字笔顺规范 (PRC-China
>>> modern Chinese commonly used characters standard stroke orders,
>>> mentioned in http://en.wikipedia.org/wiki/Stroke_order).
>>>
>>> 2. Simplified Chinese, radical order
>>>
>>> 3. Traditional Chinese, radical order
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Sat, Jun 9, 2012 at 1:02 AM, Katsuhiko Momoi <katmomoi_at_gmail.com>
>>> wrote:
>>>> Unihan-6.2.0d1/Unihan_DictionaryLikeData.txt is lacking the Traditional
>>>> Chinese stroke count. Currently it only lists:
>>>>
>>>> U+8303 kTotalStrokes 8
>>>>
>>>> I filed a ticket for a review:
>>>>
>>>> http://unicode.org/cldr/trac/ticket/4898
>>>>
>>>> (I understand that we are supposed to list the Traditional stroke count
>>>> after the Simplified one delimited by a {sp}.)
>>>>
>>>> As a general observation, I glanced through a number of kTotalStrokes
>>>> entries for strokes 8 and 9. I did not find a single entry that listed 2
>>>> stroke counts. This seems odd as there should be other stroke count
>>>> differences between Simplified and Traditional Chinese. I suspect that
>>>> this is an area needing more than one correction -- it would be better
>>>> to do a systematic review.
>>>>
>>>> - Kat
>>>>
>>>> On Fri, Jun 8, 2012 at 3:44 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>>>>> It can supply the data for both, if they differ. That's done with two
>>>>> fields.
>>>>>
>>>>> However, in this case there is only one value; if that's incorrect for
>>>>> this character someone should file feedback.
>>>>>
>>>>> ________________________________
>>>>> Mark
>>>>>
>>>>> — Il meglio è l’inimico del bene —
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 8, 2012 at 2:41 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
>>>>> wrote:
>>>>>> I checked TR38: according to its description of kTotalStrokes, the
>>>>>> property provides stroke count data for both simplified Chinese and
>>>>>> traditional Chinese, so I have no further concern.
>>>>>>
>>>>>> Thanks!
>>>>>> Claire.
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 8, 2012 at 2:33 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
>>>>>> wrote:
>>>>>>> Hi Mark
>>>>>>>
>>>>>>>> There you find the line:
>>>>>>>> U+8303 kTotalStrokes 8
>>>>>>> In Traditional Chinese, U+8303 has 9 strokes as Matt mentioned in the
>>>>>>> email.
>>>>>>>
>>>>>>> The radical "++" is counted as 4 strokes. I think several radicals
>>>>>>> have the same issue: different stroke counts between simplified
>>>>>>> Chinese and traditional Chinese.
>>>>>>>
>>>>>>> Claire.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ <mark_at_macchiato.com>
>>>>>>> wrote:
>>>>>>>> On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <matt.ma.umail_at_gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have two questions regarding the collation sequence defined in
>>>>>>>>> zh.xml, CLDR 21.0
>>>>>>>>>
>>>>>>>>> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for
>>>>>>>>> <collation type="stroke">? As a reference, U+59DA (姚) is counted
>>>>>>>>> as 9 strokes but sorted before U+8303 (范).
>>>>>>>>
>>>>>>>> CLDR now gets the stroke collation data from the kTotalStrokes
>>>>>>>> property. The values for that are in the file
>>>>>>>> Unihan/Unihan_DictionaryLikeData.txt in the Unicode Character
>>>>>>>> Database.
>>>>>>>>
>>>>>>>> There you find the line:
>>>>>>>>
>>>>>>>> U+8303 kTotalStrokes 8
>>>>>>>>
>>>>>>>> If that is in error, or if there is any other error in the
>>>>>>>> kTotalStrokes data, then please report the correct value according to
>>>>>>>> http://www.unicode.org/review/pri230/ so that it can be fixed.
>>>>>>>>
>>>>>>>> As a related matter, CLDR now gets the pinyin collation data from
>>>>>>>> the kMandarin property. The values for that are in the file
>>>>>>>> Unihan/Unihan_Readings.txt in the Unicode Character Database. So if
>>>>>>>> any of those are in error, they should also be reported as per
>>>>>>>> http://www.unicode.org/review/pri230/ .
>>>>>>>>
>>>>>>>> The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/.
>>>>>>>> Currently it is in
>>>>>>>> ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip
>>>>>>>> but as the beta proceeds, the d1 might change to d2, d3...
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2. Does the collation type, stroke, apply to both Simplified and
>>>>>>>>> Traditional Chinese, as I do not see anything defined in
>>>>>>>>> zh_Hant.xml under "stroke"?
>>>>>>>>
>>>>>>>> Let me look at that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Matt
>>>>>>>>>
>>>>>>>>>
>>>>
>>>>
>>>> --
>>>> Katsuhiko Momoi <katmomoi_at_gmail.com>
>>>>
>>>>
>>
>
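
As a footnote to the kTotalStrokes discussion quoted above, here is a rough
Python sketch of reading such lines. It assumes whitespace-separated fields
of the form `U+8303 kTotalStrokes 8`, with an optional second count after a
space, simplified first and traditional second, as Kat describes. The
function name `parse_total_strokes` and the sample line carrying two values
are made up for illustration. Falling back to the single value for both
scripts when only one is listed is an assumption, not something the thread
confirms.

```python
def parse_total_strokes(line):
    """Parse one 'U+XXXX kTotalStrokes N [M]' line.

    Returns (character, simplified_count, traditional_count), or None
    for comment lines, blank lines, and lines for other properties.
    """
    fields = line.split()
    if len(fields) < 3 or fields[1] != "kTotalStrokes":
        return None
    if not fields[0].startswith("U+"):
        return None
    char = chr(int(fields[0][2:], 16))
    counts = [int(value) for value in fields[2:]]
    simplified = counts[0]
    # Assumption: reuse the single value when no second count is given.
    traditional = counts[1] if len(counts) > 1 else counts[0]
    return char, simplified, traditional

# The line quoted in the thread (one value only).
print(parse_total_strokes("U+8303\tkTotalStrokes\t8"))
# A hypothetical line with both counts, for illustration only.
print(parse_total_strokes("U+8303\tkTotalStrokes\t8 9"))
```

A check like this makes it easy to count how many entries actually carry
two values, which is the kind of systematic review suggested above.
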
Received on Fri Jun 22 2012 - 21:50:36 CDT
