Re: [cldr-dev] Re: Questions on Chinese collation, stroke

From: Matt Ma <matt.ma.umail_at_gmail.com>
Date: Fri, 22 Jun 2012 11:05:37 -0700

Thanks, all, for the clarification. Are there any plans to provide the
following collations in CLDR?

 1. Simplified Chinese, stroke order, based on 现代汉语通用字笔顺规范 (the PRC
standard for the stroke order of commonly used modern Chinese characters,
mentioned in http://en.wikipedia.org/wiki/Stroke_order).

 2. Simplified Chinese, radical order

 3. Traditional Chinese, radical order

Thanks,
Matt
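
P.S. For anyone who wants to check the kTotalStrokes data discussed in
the quoted messages, here is a minimal Python sketch. It assumes the
tab-separated "U+XXXX<TAB>field<TAB>value" line layout of the Unihan data
files, and it assumes (per Kat's reading of the field description) that a
second, space-delimited value is the Traditional count following the
Simplified one. The function names and the two-value sample line are
hypothetical and only for illustration.

```python
def parse_total_strokes(lines):
    """Map code point -> (simplified, traditional) stroke counts.

    Assumes Unihan-style lines: "U+XXXX<TAB>kTotalStrokes<TAB>value(s)".
    If only one value is present, it is used for both counts.
    """
    strokes = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kTotalStrokes":
            continue  # skip other properties
        code_point = int(fields[0][2:], 16)  # drop the "U+" prefix
        counts = [int(value) for value in fields[2].split()]
        simplified = counts[0]
        traditional = counts[1] if len(counts) > 1 else counts[0]
        strokes[code_point] = (simplified, traditional)
    return strokes


def differing_counts(strokes):
    """Return the code points whose two stroke counts differ, sorted."""
    return sorted(cp for cp, (simp, trad) in strokes.items() if simp != trad)


# The line as quoted from Unihan-6.2.0d1 in this thread.
current = ["U+8303\tkTotalStrokes\t8"]

# Hypothetical line, showing what a two-value entry might look like.
hypothetical = ["U+8303\tkTotalStrokes\t8 9"]

print(parse_total_strokes(current))
print(["U+%04X" % cp for cp in differing_counts(parse_total_strokes(hypothetical))])
```

Running differing_counts over the whole file would be one way to do the
systematic review Kat suggests: if it returns nothing for stroke counts 8
and 9, no entry in that range carries a separate Traditional value.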

On Sat, Jun 9, 2012 at 1:02 AM, Katsuhiko Momoi <katmomoi_at_gmail.com> wrote:
> Unihan-6.2.0d1/Unihan_DictionaryLikeData.txt is lacking the Traditional
> Chinese stroke count. Currently it only lists:
>
> U+8303 kTotalStrokes 8
>
> I filed a ticket for a review:
>
> http://unicode.org/cldr/trac/ticket/4898
>
> (I understand that we are supposed to list the Traditional stroke count
> after the Simplified one, delimited by a {sp}.)
>
> As a general observation, I glanced through a number of kTotalStrokes
> entries for strokes 8 and 9. I did not find a single entry that listed 2
> stroke counts. This seems odd as there should be other stroke count
> differences between Simplified and Traditional Chinese. I suspect that this
> is an area needing more than one correction -- it would be better to do a
> systematic review.
>
> - Kat
>
> On Fri, Jun 8, 2012 at 3:44 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>>
>> It can supply the data for both, if they differ. That's done with two
>> fields.
>>
>> However, in this case there is only one value; if that's incorrect for
>> this character someone should file feedback.
>>
>> ________________________________
>> Mark
>>
>> — Il meglio è l’inimico del bene —
>>
>>
>>
>> On Fri, Jun 8, 2012 at 2:41 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
>> wrote:
>>>
>>> I checked tr38: according to the description of kTotalStrokes, it provides
>>> stroke count data for both simplified Chinese and traditional Chinese.
>>> In that case, I have no concerns.
>>>
>>> Thanks!
>>> Claire.
>>>
>>>
>>> On Fri, Jun 8, 2012 at 2:33 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
>>> wrote:
>>>>
>>>> Hi Mark
>>>>
>>>> > There you find the line:
>>>>
>>>> > U+8303 kTotalStrokes 8
>>>>
>>>> In Traditional Chinese, U+8303 has 9 strokes as Matt mentioned in the
>>>> email.
>>>>
>>>> The radical "++" is counted as 4 strokes. I think several radicals have
>>>> the same issue: their stroke counts differ between simplified Chinese
>>>> and traditional Chinese.
>>>>
>>>> Claire.
>>>>
>>>>
>>>> On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>>>>>
>>>>> On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <matt.ma.umail_at_gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have two questions regarding the collation sequence defined in
>>>>>> zh.xml, CLDR 21.0
>>>>>>
>>>>>> 1. Why is U+8303 (范)  counted as 9 strokes instead of 8 for <collation
>>>>>> type="stroke">? As a reference, U+59DA (姚) is counted as 9 strokes but
>>>>>> sorted before U+8303 (范).
>>>>>
>>>>>
>>>>> CLDR now gets the stroke collation data from the kTotalStrokes property.
>>>>> The values for that are in the file Unihan/Unihan_DictionaryLikeData.txt in
>>>>> the Unicode Character Database.
>>>>>
>>>>> There you find the line:
>>>>>
>>>>> U+8303 kTotalStrokes 8
>>>>>
>>>>> If that is in error, or if there is any other error in
>>>>> the kTotalStrokes data, then please report the correct value according to
>>>>> http://www.unicode.org/review/pri230/ so that it can be fixed.
>>>>>
>>>>> As a related matter, CLDR now gets the pinyin collation data from
>>>>> the kMandarin property. The values for that are in the
>>>>> file Unihan/Unihan_Readings.txt in the Unicode Character Database. So if any
>>>>> of those are in error, they should also be reported as
>>>>> per http://www.unicode.org/review/pri230/.
>>>>>
>>>>> The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/. Currently
>>>>> it is in ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip
>>>>> but as the beta proceeds, the d1 might change to d2, d3...
>>>>>
>>>>>>
>>>>>>
>>>>>> 2. Does the collation type, stroke, apply to both Simplified and
>>>>>> Traditional Chinese, as I do not see anything defined in zh_Hant.xml
>>>>>> under "stroke"?
>>>>>
>>>>>
>>>>> Let me look at that.
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
> --
> Katsuhiko Momoi <katmomoi_at_gmail.com>
>
>
Received on Fri Jun 22 2012 - 13:12:52 CDT
