Re: Questions on Chinese collation, stroke

From: Mark Davis ☕ <>
Date: Thu, 7 Jun 2012 17:54:01 -0700

On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <> wrote:

> Hi,
> I have two questions regarding the collation sequence defined in
> zh.xml, CLDR 21.0
> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for <collation
> type="stroke">? As a reference, U+59DA (姚) is counted as 9 strokes but
> sorted before U+8303 (范).

CLDR now gets the stroke collation data from the kTotalStrokes property. The
values for that are in the file Unihan/Unihan_DictionaryLikeData.txt in the
Unicode Character Database.

There you find the line:

U+8303 kTotalStrokes 8

If that is in error, or if there is any other error in the kTotalStrokes
data, then please report the correct value according to so that it can be fixed.

As a related matter, CLDR now gets the pinyin collation data from
the kMandarin property. The values for that are in the
file Unihan/Unihan_Readings.txt in the Unicode Character Database. So if
any of those are in error, they should also be reported as per .
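
To check a value yourself, the lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Unihan data files use tab-separated lines of the form code point, property, value, with "#" comment lines; the helper names parse_unihan and stroke_key are made up for this example; the sample holds just the two characters from this thread (8 for U+8303 as quoted above, 9 for U+59DA as stated in the question); and breaking ties by code point is a simplification, not necessarily what CLDR does.

```python
def parse_unihan(text, prop):
    """Return {character: value} for one property from Unihan-style text.

    Assumes tab-separated lines: "U+XXXX<TAB>property<TAB>value".
    Blank lines and lines starting with "#" are skipped.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != prop:
            continue
        code_point = int(fields[0][2:], 16)  # "U+8303" -> 0x8303
        values[chr(code_point)] = fields[2]
    return values


def stroke_key(strokes):
    """Build a sort key: stroke count first, then code point (a simplification)."""
    def key(ch):
        # Take the first value in case the field lists more than one count.
        return (int(strokes[ch].split()[0]), ord(ch))
    return key


# Illustrative sample only; real data comes from the Unihan files.
SAMPLE = (
    "# sample lines for this thread\n"
    "U+59DA\tkTotalStrokes\t9\n"
    "U+8303\tkTotalStrokes\t8\n"
)

strokes = parse_unihan(SAMPLE, "kTotalStrokes")
ordered = sorted(strokes, key=stroke_key(strokes))
print(ordered)  # ['范', '姚']
```

The same parse_unihan call with "kMandarin" would pull the readings from text in the Unihan_Readings.txt format, since only the property name differs.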

The beta data is in Currently in
but as the beta proceeds, the d1 might change to d2,d3...

> 2. Does the collation type, stroke, apply to both Simplified and
> Traditional Chinese, as I do not see anything defined in zh_Hant.xml
> under "stroke"?

Let me look at that.

> Thanks,
> Matt
Received on Thu Jun 07 2012 - 19:58:15 CDT