L2/06-309
September 24, 2006

Please update my previous report on the numeric value for U+F9B2 (via the web form) with the following more extensive report, or make the following into a separate UTC document.

/kent k

===============================================

The following two characters are not given numeric values in DerivedNumericValues.txt:

6F06; 漆; 7 CJK UNIFIED IDEOGRAPH-6F06 jp-obsolete-financial (according to http://en.wikipedia.org/wiki/Japanese_numerals#Formal_numbers)
9621; 阡; 1000 CJK UNIFIED IDEOGRAPH-9621 jp-obsolete-financial (according to http://en.wikipedia.org/wiki/Japanese_numerals#Formal_numbers)

Wikipedia gives numeric values (as above) to these two CJK characters. I have no independent confirmation, though. Many other CJK characters that are already given numeric values are obsolete in the same way, so that in itself should be no hindrance.

Wikipedia also gives very large (over one trillion) numeric values to certain CJK characters. At least some of these may be worth giving numeric values in DerivedNumericValues.txt, especially since some of them are also used as translations for SI prefixes. See http://en.wikipedia.org/wiki/Chinese_numerals#SI_prefixes.

(Aside: I don't know where the CJK numeric values in DerivedNumericValues.txt are *derived* from.)

------------------------------------------

The following eight characters are each canonically equivalent to a CJK character that is given a numeric value in DerivedNumericValues.txt, but these canonical equivalents are given no numeric value in either UnicodeData.txt or DerivedNumericValues.txt. I think canonically equivalent strings should carry the same numeric values, not just informally but also formally in the Unicode database, so the following eight characters should be given appropriate numeric values in UnicodeData.txt and in DerivedNumericValues.txt.

F96B;CJK COMPATIBILITY IDEOGRAPH-F96B;Lo;0;L;53C3;;;;N;;;;; 3
F973;CJK COMPATIBILITY IDEOGRAPH-F973;Lo;0;L;62FE;;;;N;;;;; 10
F978;CJK COMPATIBILITY IDEOGRAPH-F978;Lo;0;L;5169;;;;N;;;;; 2
F9B2;CJK COMPATIBILITY IDEOGRAPH-F9B2;Lo;0;L;96F6;;;;N;;;;; 0
F9D1;CJK COMPATIBILITY IDEOGRAPH-F9D1;Lo;0;L;516D;;;;N;;;;; 6
F9D3;CJK COMPATIBILITY IDEOGRAPH-F9D3;Lo;0;L;9678;;;;N;;;;; 6
F9FD;CJK COMPATIBILITY IDEOGRAPH-F9FD;Lo;0;L;4EC0;;;;N;;;;; 10
2F890;CJK COMPATIBILITY IDEOGRAPH-2F890;Lo;0;L;5EFE;;;;N;;;;; 9

-------------------------------------

The CJK telegraph symbols should also be given numeric values in UnicodeData.txt as well as in DerivedNumericValues.txt:

* for months (32C0-32CB, which have compat decomps that begin with digits and end with 6708): values 1-12
* for days (33E0-33FE, which have compat decomps that begin with digits and end with 65E5): values 1-31
* for hours (3358-3370, which have compat decomps that begin with digits and end with 70B9): values 0-24

Note that parenthesized/circled digits/numbers are given numeric values; in principle, these telegraph symbols should not be handled differently in that regard.

=================================================
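
The two consistency arguments above can be checked mechanically. The following is a minimal Python sketch, not an official derivation. It assumes the standard library's unicodedata module and relies only on the decomposition mappings. The helper names (canonical_singleton, telegraph_value, proposed_telegraph_values) are hypothetical and chosen for illustration. The first helper finds the character that a compatibility ideograph is canonically equivalent to. The second derives the proposed value of a telegraph symbol from the digits in its compat decomposition.

```python
import unicodedata

# Ranges and suffix ideographs as listed in the report:
# (first code point, last code point, final code point of the decomposition).
TELEGRAPH_RANGES = {
    "month": (0x32C0, 0x32CB, 0x6708),
    "day": (0x33E0, 0x33FE, 0x65E5),
    "hour": (0x3358, 0x3370, 0x70B9),
}


def canonical_singleton(cp):
    """Return the single code point that cp canonically decomposes to, or None.

    Only untagged, one-target decompositions count; <compat> and other
    tagged mappings are ignored.
    """
    fields = unicodedata.decomposition(chr(cp)).split()
    if len(fields) == 1 and not fields[0].startswith("<"):
        return int(fields[0], 16)
    return None


def telegraph_value(cp, suffix):
    """Derive a number from a <compat> mapping of ASCII digits plus suffix."""
    fields = unicodedata.decomposition(chr(cp)).split()
    if len(fields) < 3 or fields[0] != "<compat>":
        return None
    *digits, last = (int(f, 16) for f in fields[1:])
    if last != suffix or not all(0x30 <= d <= 0x39 for d in digits):
        return None
    return int("".join(chr(d) for d in digits))


def proposed_telegraph_values():
    """Map every telegraph symbol code point to its proposed numeric value."""
    values = {}
    for first, last, suffix in TELEGRAPH_RANGES.values():
        for cp in range(first, last + 1):
            values[cp] = telegraph_value(cp, suffix)
    return values
```

To see whether a given Unicode version already carries a value, one could compare these results with unicodedata.numeric(chr(cp), None), for cp itself and for canonical_singleton(cp). The outcome of that comparison depends on the Unicode version bundled with the Python build, so it is not asserted here.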