L2/06-309R

Source: Kent Karlsson
Date: November 7, 2006

Please update my previous report (L2/06-309, "Bug in DerivedNumericValues.txt")
on the numeric value for F9B2 with the following more extensive report.

======================================================================

The following two characters are not given numeric values in DerivedNumericValues.txt:

6F06; 漆;    7	CJK UNIFIED IDEOGRAPH-6F06	jp-obsolete-financial
9621; 阡; 1000	CJK UNIFIED IDEOGRAPH-9621	jp-obsolete-financial
				(both according to http://en.wikipedia.org/wiki/Japanese_numerals#Formal_numbers)

Wikipedia gives numeric values to these two CJK characters. I have no independent confirmation though.



Wikipedia also gives very large numeric values (over one trillion) to
certain CJK characters. At least some of these may be worth giving
numeric values in DerivedNumericValues.txt (which is derived from
Unihan.txt, so any omission originates there), especially since some
of them are also used as translations of SI prefixes.


The following eight characters are each canonically equivalent to a
CJK character that is given a numeric value in DerivedNumericValues.txt,
but these canonical equivalents are given no numeric value in either
UnicodeData.txt or DerivedNumericValues.txt. I think canonically
equivalent strings should carry the same numeric values, not just
informally but also formally in the Unicode database, so the following
eight characters should be given appropriate numeric values in
UnicodeData.txt and consequently in DerivedNumericValues.txt.

F96B;CJK COMPATIBILITY IDEOGRAPH-F96B;Lo;0;L;53C3;;;;N;;;;;    3
F973;CJK COMPATIBILITY IDEOGRAPH-F973;Lo;0;L;62FE;;;;N;;;;;   10
F978;CJK COMPATIBILITY IDEOGRAPH-F978;Lo;0;L;5169;;;;N;;;;;    2
F9B2;CJK COMPATIBILITY IDEOGRAPH-F9B2;Lo;0;L;96F6;;;;N;;;;;    0
F9D1;CJK COMPATIBILITY IDEOGRAPH-F9D1;Lo;0;L;516D;;;;N;;;;;    6
F9D3;CJK COMPATIBILITY IDEOGRAPH-F9D3;Lo;0;L;9678;;;;N;;;;;    6
F9FD;CJK COMPATIBILITY IDEOGRAPH-F9FD;Lo;0;L;4EC0;;;;N;;;;;   10
2F890;CJK COMPATIBILITY IDEOGRAPH-2F890;Lo;0;L;5EFE;;;;N;;;;;  9
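
The check behind the list above can be sketched roughly as follows. This is
only an illustration, assuming Python's standard unicodedata module; the
helper names (canonical_target, database_numeric, find_gaps) are hypothetical
and not part of any Unicode tooling. The result depends on which version of
the Unicode database the interpreter bundles, so a newer database may report
no gaps at all.

```python
import unicodedata


def canonical_target(ch):
    """Return the single character that ch canonically decomposes to, else None."""
    fields = unicodedata.decomposition(ch).split()
    # Canonical mappings carry no <tag>; compatibility mappings start with one.
    if len(fields) == 1 and not fields[0].startswith("<"):
        return chr(int(fields[0], 16))
    return None


def database_numeric(ch):
    """Numeric value from the interpreter's bundled Unicode data, or None."""
    return unicodedata.numeric(ch, None)


def find_gaps(chars, numeric=database_numeric):
    """List (char, target, value) where the target has a value but char has none."""
    gaps = []
    for ch in chars:
        target = canonical_target(ch)
        if target is None:
            continue
        value = numeric(target)
        if value is not None and numeric(ch) is None:
            gaps.append((ch, target, value))
    return gaps


# The eight code points listed in this report.
LISTED = [chr(cp) for cp in (0xF96B, 0xF973, 0xF978, 0xF9B2,
                             0xF9D1, 0xF9D3, 0xF9FD, 0x2F890)]

if __name__ == "__main__":
    for ch, target, value in find_gaps(LISTED):
        print(f"U+{ord(ch):04X} lacks a value; U+{ord(target):04X} has {value}")
```

The numeric lookup is passed in as a parameter so that the same check could be
run against values parsed from a particular DerivedNumericValues.txt rather
than the interpreter's own data.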




The CJK telegraph symbols should also be given numeric values in
UnicodeData.txt as well as DerivedNumericValues.txt, just as e.g.
the parenthesized digits/numbers are:

* for months (32C0-32CB, which have compatibility decompositions
  beginning with digits and ending with 6708): values 1-12

* for days (33E0-33FE, which have compatibility decompositions
  beginning with digits and ending with 65E5): values 1-31

* for hours (3358-3370, which have compatibility decompositions
  beginning with digits and ending with 70B9): values 0-24
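
As a rough illustration of how the values proposed above follow from the
existing decompositions, the sketch below reads the leading digits of each
compatibility decomposition. It assumes Python's standard unicodedata module;
the function name telegraph_value and the SUFFIXES table are hypothetical,
introduced only for this example.

```python
import unicodedata

# Final ideograph of each decomposition, per the three ranges listed above.
SUFFIXES = {"\u6708": "month", "\u65E5": "day", "\u70B9": "hour"}


def telegraph_value(ch):
    """Return (unit, value) read from ch's compatibility decomposition, or None."""
    fields = unicodedata.decomposition(ch).split()
    # Expect "<compat>" followed by at least one digit and one ideograph.
    if len(fields) < 3 or fields[0] != "<compat>":
        return None
    mapped = "".join(chr(int(f, 16)) for f in fields[1:])
    digits, suffix = mapped[:-1], mapped[-1]
    # Only ASCII digits followed by one of the three unit ideographs qualify.
    if suffix not in SUFFIXES or not all(c in "0123456789" for c in digits):
        return None
    return SUFFIXES[suffix], int(digits)
```

For example, telegraph_value("\u32CB") yields ("month", 12) and
telegraph_value("\u3358") yields ("hour", 0), matching the value ranges
given above.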




===================================================================