Re: [cldr-dev] Re: Questions on Chinese collation, stroke

From: Matt Ma <matt.ma.umail_at_gmail.com>
Date: Fri, 22 Jun 2012 14:05:34 -0700

Entered ticket #4949 for Simplified Chinese, stroke order.

Thanks,
Matt

On Fri, Jun 22, 2012 at 12:55 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
> There are no current plans to do that. If you want to present a case for
> adding additional collation sequences to CLDR, please start the process by
> filing a bug at http://unicode.org/cldr/trac/newticket
>
> ________________________________
> Mark
>
> — Il meglio è l’inimico del bene —
>
>
>
> On Fri, Jun 22, 2012 at 11:05 AM, Matt Ma <matt.ma.umail_at_gmail.com> wrote:
>>
>> Thanks all for the clarification. Are there any plans to provide the
>> following collations in CLDR?
>>
>>  1. Simplified Chinese, stroke order, based on 现代汉语通用字笔顺规范 (the PRC
>> standard of stroke orders for commonly used characters in modern Chinese,
>> mentioned in http://en.wikipedia.org/wiki/Stroke_order).
>>
>>  2. Simplified Chinese, radical order
>>
>>  3. Traditional Chinese, radical order
>>
>> Thanks,
>> Matt
>>
>> On Sat, Jun 9, 2012 at 1:02 AM, Katsuhiko Momoi <katmomoi_at_gmail.com>
>> wrote:
>> > Unihan-6.2.0d1/Unihan_DictionaryLikeData.txt is lacking the Traditional
>> > Chinese stroke count. Currently it only lists:
>> >
>> > U+8303 kTotalStrokes 8
>> >
>> > I filed a ticket for a review:
>> >
>> > http://unicode.org/cldr/trac/ticket/4898
>> >
>> > (I understand that we are supposed to list the Traditional stroke count
>> > after the Simplified one, delimited by a space.)
>> >
>> > As a general observation, I glanced through a number of kTotalStrokes
>> > entries for strokes 8 and 9. I did not find a single entry that listed 2
>> > stroke counts. This seems odd, as there should be other stroke count
>> > differences between Simplified and Traditional Chinese. I suspect that
>> > this is an area needing more than one correction -- it would be better
>> > to do a systematic review.
>> >
>> > - Kat
>> >
>> > On Fri, Jun 8, 2012 at 3:44 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>> >>
>> >> It can supply the data for both, if they differ. That's done with two
>> >> fields.
>> >>
>> >> However, in this case there is only one value; if that's incorrect for
>> >> this character someone should file feedback.
>> >>
>> >> ________________________________
>> >> Mark
>> >>
>> >> — Il meglio è l’inimico del bene —
>> >>
>> >>
>> >>
>> >> On Fri, Jun 8, 2012 at 2:41 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
>> >> wrote:
>> >>>
>> >>> I checked TR38: according to the description of kTotalStrokes, it
>> >>> provides stroke count data for both Simplified and Traditional Chinese.
>> >>> So I have no concern.
>> >>>
>> >>> Thanks!
>> >>> Claire.
>> >>>
>> >>>
>> >>> On Fri, Jun 8, 2012 at 2:33 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi Mark
>> >>>>
>> >>>> > There you find the line:
>> >>>>
>> >>>> > U+8303 kTotalStrokes 8
>> >>>>
>> >>>> In Traditional Chinese, U+8303 has 9 strokes as Matt mentioned in the
>> >>>> email.
>> >>>>
>> >>>> The radical "++" (艹) is counted as 4 strokes. I think there are
>> >>>> several radicals that have the same issue: different stroke counts
>> >>>> between Simplified and Traditional Chinese.
>> >>>>
>> >>>> Claire.
>> >>>>
>> >>>>
>> >>>> On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ <mark_at_macchiato.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <matt.ma.umail_at_gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I have two questions regarding the collation sequence defined in
>> >>>>>> zh.xml, CLDR 21.0:
>> >>>>>>
>> >>>>>> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for
>> >>>>>> <collation type="stroke">? As a reference, U+59DA (姚) is counted
>> >>>>>> as 9 strokes but sorted before U+8303 (范).
>> >>>>>
>> >>>>>
>> >>>>> CLDR now gets the stroke collation data from the kTotalStrokes
>> >>>>> property. The values for that are in the file
>> >>>>> Unihan/Unihan_DictionaryLikeData.txt in the Unicode Character
>> >>>>> Database.
>> >>>>>
>> >>>>> There you find the line:
>> >>>>>
>> >>>>> U+8303 kTotalStrokes 8
>> >>>>>
>> >>>>> If that is in error, or if there is any other error in the
>> >>>>> kTotalStrokes data, then please report the correct value according
>> >>>>> to http://www.unicode.org/review/pri230/ so that it can be fixed.
>> >>>>>
>> >>>>> As a related matter, CLDR now gets the pinyin collation data from the
>> >>>>> kMandarin property. The values for that are in the file
>> >>>>> Unihan/Unihan_Readings.txt in the Unicode Character Database. So if
>> >>>>> any of those are in error, they should also be reported as per
>> >>>>> http://www.unicode.org/review/pri230/.
>> >>>>>
>> >>>>> The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/. The
>> >>>>> Unihan data is currently in
>> >>>>> ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip, but as the
>> >>>>> beta proceeds, the d1 might change to d2, d3, and so on.
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> 2. Does the collation type "stroke" apply to both Simplified and
>> >>>>>> Traditional Chinese, as I do not see anything defined in
>> >>>>>> zh_Hant.xml under "stroke"?
>> >>>>>
>> >>>>>
>> >>>>> Let me look at that.
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Matt
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Katsuhiko Momoi <katmomoi_at_gmail.com>
>> >
>> >
>
>
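The kTotalStrokes convention discussed in this thread (one value for both scripts, or two space-separated values with Simplified first and Traditional second) can be sketched in Python as below. This is a minimal illustration, not CLDR's or ICU's code: the names `parse_total_strokes` and `stroke_sort` are hypothetical, the two-value U+8303 line is a made-up stand-in for the correction requested in ticket #4898 rather than real Unihan data, and breaking ties by code point is an assumption.

```python
# Illustrative excerpt in the Unihan_DictionaryLikeData.txt layout
# (code point, tab, property, tab, value). The two-value U+8303 line is
# HYPOTHETICAL: it mirrors the correction requested in ticket #4898.
SAMPLE = [
    "# comment lines like this one are skipped",
    "U+4E00\tkTotalStrokes\t1",
    "U+59DA\tkTotalStrokes\t9",
    "U+8303\tkTotalStrokes\t8 9",
]


def parse_total_strokes(lines):
    """Return {code point: (zh-Hans count, zh-Hant count)} from kTotalStrokes lines.

    Per the convention described in the thread: a single value applies to
    both scripts; with two space-separated values the first is Simplified
    and the second is Traditional.
    """
    strokes = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cp, prop, value = line.split("\t", 2)
        if prop != "kTotalStrokes":
            continue  # ignore other Unihan properties in the same file
        counts = [int(v) for v in value.split()]
        hans = counts[0]
        hant = counts[1] if len(counts) > 1 else counts[0]
        strokes[int(cp[2:], 16)] = (hans, hant)
    return strokes


def stroke_sort(chars, strokes, traditional=False):
    """Order characters by total stroke count for the chosen script.

    Ties are broken by code point, an ASSUMPTION for illustration only;
    this is not CLDR's actual secondary ordering.
    """
    index = 1 if traditional else 0
    return sorted(chars, key=lambda ch: (strokes[ord(ch)][index], ord(ch)))


if __name__ == "__main__":
    table = parse_total_strokes(SAMPLE)
    print("".join(stroke_sort("范姚一", table)))                    # 一范姚
    print("".join(stroke_sort("范姚一", table, traditional=True)))  # 一姚范
```

With the hypothetical two-value entry, 范 sits in the 8-stroke group for Simplified and moves to the 9-stroke group for Traditional, where the assumed code-point tie-breaker happens to place it after 姚.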
Received on Fri Jun 22 2012 - 16:10:16 CDT
