Re: [cldr-dev] Re: Questions on Chinese collation, stroke from Mark Davis ☕ on 2012-06-22 (Unicode Mail List Archive)

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Fri, 22 Jun 2012 12:55:27 -0700

There are no current plans to do that. If you want to present a case for
adding additional collation sequences to CLDR, please start the process by
filing a bug at http://unicode.org/cldr/trac/newticket

------------------------------
Mark <https://plus.google.com/114199149796022210033>
— Il meglio è l’inimico del bene —

On Fri, Jun 22, 2012 at 11:05 AM, Matt Ma <matt.ma.umail_at_gmail.com> wrote:

> Thanks all for the clarification. Are there any plans to provide the
> following collations in CLDR?
>
> 1. Simplified Chinese, stroke order, based on 现代汉语通用字笔顺规范 (PRC-China
> modern Chinese commonly used characters standard stroke orders,
> mentioned in http://en.wikipedia.org/wiki/Stroke_order).
>
> 2. Simplified Chinese, radical order
>
> 3. Traditional Chinese, radical order
>
> Thanks,
> Matt
>
> On Sat, Jun 9, 2012 at 1:02 AM, Katsuhiko Momoi <katmomoi_at_gmail.com>
> wrote:
> > Unihan-6.2.0d1/Unihan_DictionaryLikeData.txt is lacking the Traditional
> > Chinese stroke count. Currently it only lists:
> >
> > U+8303 kTotalStrokes 8
> >
> > I filed a ticket for a review:
> >
> > http://unicode.org/cldr/trac/ticket/4898
> >
> > (I understand that we are supposed to list the Traditional stroke count
> > after the Simplified one, delimited by a {sp}.)
> >
> > As a general observation, I glanced through a number of kTotalStrokes
> > entries for strokes 8 and 9. I did not find a single entry that listed 2
> > stroke counts. This seems odd as there should be other stroke count
> > differences between Simplified and Traditional Chinese. I suspect that
> > this is an area needing more than one correction -- it would be better
> > to do a systematic review.
> >
> > - Kat
> >
> > On Fri, Jun 8, 2012 at 3:44 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
> >>
> >> It can supply the data for both, if they differ. That's done with two
> >> fields.
> >>
> >> However, in this case there is only one value; if that's incorrect for
> >> this character someone should file feedback.
> >>
> >> ________________________________
> >> Mark
> >>
> >> — Il meglio è l’inimico del bene —
> >>
> >>
> >>
> >> On Fri, Jun 8, 2012 at 2:41 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
> >> wrote:
> >>>
> >>> I checked tr38: according to the description of kTotalStrokes, it
> >>> provides stroke count data for both simplified Chinese and traditional
> >>> Chinese. In that case, I have no concerns.
> >>>
> >>> Thanks!
> >>> Claire.
> >>>
> >>>
> >>> On Fri, Jun 8, 2012 at 2:33 PM, Claire Ho (賀靜蘭) <claireho_at_google.com>
> >>> wrote:
> >>>>
> >>>> Hi Mark
> >>>>
> >>>> > There you find the line:
> >>>>
> >>>> > U+8303 kTotalStrokes 8
> >>>>
> >>>> In Traditional Chinese, U+8303 has 9 strokes as Matt mentioned in the
> >>>> email.
> >>>>
> >>>> The radical "++" is counted as 4 strokes. I think several radicals
> >>>> have the same issue: different stroke counts between simplified
> >>>> Chinese and traditional Chinese.
> >>>>
> >>>> Claire.
> >>>>
> >>>>
> >>>> On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ <mark_at_macchiato.com>
> >>>> wrote:
> >>>>>
> >>>>> On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <matt.ma.umail_at_gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have two questions regarding the collation sequence defined in
> >>>>>> zh.xml, CLDR 21.0
> >>>>>>
> >>>>>> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for
> >>>>>> <collation type="stroke">? As a reference, U+59DA (姚) is counted
> >>>>>> as 9 strokes but sorted before U+8303 (范).
> >>>>>
> >>>>>
> >>>>> CLDR now gets the stroke collation data from the kTotalStrokes
> >>>>> property. The values for that are in the file
> >>>>> Unihan/Unihan_DictionaryLikeData.txt in the Unicode Character
> >>>>> Database.
> >>>>>
> >>>>> There you find the line:
> >>>>>
> >>>>> U+8303 kTotalStrokes 8
> >>>>>
> >>>>> If that is in error, or if there is any other error in
> >>>>> the kTotalStrokes data, then please report the correct value
> >>>>> according to http://www.unicode.org/review/pri230/ so that it can
> >>>>> be fixed.
> >>>>>
> >>>>> As a related matter, CLDR now gets the pinyin collation data from
> >>>>> the kMandarin property. The values for that are in the
> >>>>> file Unihan/Unihan_Readings.txt in the Unicode Character Database.
> >>>>> So if any of those are in error, they should also be reported as
> >>>>> per http://www.unicode.org/review/pri230/ .
> >>>>>
> >>>>> The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/ . It is
> >>>>> currently in ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip
> >>>>> but as the beta proceeds, the d1 might change to d2, d3...
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2. Does the collation type, stroke, apply to both Simplified and
> >>>>>> Traditional Chinese, as I do not see anything defined in zh_Hant.xml
> >>>>>> under "stroke"?
> >>>>>
> >>>>>
> >>>>> Let me look at that.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Matt
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Katsuhiko Momoi <katmomoi_at_gmail.com>
> >
> >
>
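
For readers following the kTotalStrokes discussion quoted above, here is a
minimal Python sketch of how such lines could be read and used as a stroke
sort key. It is an illustration only, not how CLDR builds its collation
rules. Assumptions: the lines are tab-separated as in the Unihan data files;
when a field holds two space-separated values, the first is the Simplified
count and the second the Traditional count (as described in the thread); ties
are broken by code point, which is a simplification. The function names and
the two-valued sample line for U+8303 are hypothetical; the beta file
discussed above lists only the value 8.

```python
def parse_total_strokes(lines):
    """Return {code point: (simplified_count, traditional_count)}."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kTotalStrokes":
            continue  # not a kTotalStrokes entry
        counts = [int(v) for v in fields[2].split()]
        simplified = counts[0]
        # Assumption: a single value applies to both writing systems.
        traditional = counts[1] if len(counts) > 1 else counts[0]
        table[int(fields[0][2:], 16)] = (simplified, traditional)
    return table


def stroke_key(table, ch, traditional=False):
    """Sort key: stroke count first, then code point (simplified tie-break)."""
    simplified, trad = table[ord(ch)]
    return (trad if traditional else simplified, ord(ch))


# Lines in the form quoted in the thread (single value for U+8303).
as_listed = parse_total_strokes([
    "U+59DA\tkTotalStrokes\t9",
    "U+8303\tkTotalStrokes\t8",
])

# Hypothetical two-valued entry, for illustration only.
two_valued = parse_total_strokes([
    "U+59DA\tkTotalStrokes\t9",
    "U+8303\tkTotalStrokes\t8 9",
])

print(sorted("姚范", key=lambda c: stroke_key(as_listed, c)))
print(sorted("姚范", key=lambda c: stroke_key(two_valued, c, traditional=True)))
```

With the single value 8, U+8303 sorts before U+59DA; with a Traditional
count of 9, the two characters tie on stroke count and U+59DA sorts first
under the code point tie-break.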
Received on Fri Jun 22 2012 - 14:59:21 CDT

This archive was generated by hypermail 2.2.0 : Fri Jun 22 2012 - 14:59:22 CDT