L2/12-280

   Comments on L2/12-268

   Kent Karlsson
    2012-07-26


In L2/12-268 Richard Wordingham writes (parts in quotes, my comments outside of quotes):

"... When the numeric values were [recently, not in the UCD yet] corrected for
U+1240F CUNEIFORM NUMERIC SIGN FOUR U to U+12414 CUNEIFORM NUMERIC SIGN NINE U (from
4 to 9 to 40 to 90), this stopped them being collated as secondary variants of positional
decimal digits."

This is all well and good, as part of the correction.


"Only U+1240F and U+12410 can be considered sexagesimal digits. This unplanned change to
collation was reversed by modifying the sifter program itself.

Ken Whistler has formally proposed that this modification to sifter be removed by reversing
the correction to UnicodeData.txt.  Would not a better approach be for UnicodeData.txt to be
correct and keep the incorrect values, possibly with a tag, in the file used by sifter?"

Or, even better, let the correction have the effect it should have on the DUCET. As Richard W. points out, there are more
of these corrections to come, and no other characters with numeric values outside 0 to 9 get collated as if they were
decimal digits. Indeed, I think the DUCET itself should restrict this special digit handling to Nd digits (Unicode
"digit"s), not apply it to digits more generally.

And indeed, much bigger changes to the DUCET are planned...

(Ideally, all non-spelled-out, non-alphabetic numerals should be sorted in numerical order, and indeed ICU has an
option to partially do so. In the short run, though, I don't see that covering numerals other than decimal ones formed
from Nd digits, perhaps together with the decimal and group separators of the current locale(?). And things like
section "14.4", which is not a single number, stir things up for many locales...)
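To make the "Nd digits only" restriction concrete, here is a minimal Python sketch of a sort key in which runs of
General_Category=Nd characters compare by numeric value and everything else compares as plain text. It is not ICU's
numeric collation and not a DUCET change; the helper name numeric_sort_key is hypothetical, it ignores locale decimal
and group separators, and text runs are compared by code point rather than by collation weights.

```python
import unicodedata

# Part kinds in the key: at the same position, digit runs sort before text runs.
DIGITS, TEXT = 0, 1

def numeric_sort_key(s: str) -> list:
    """Hypothetical helper: key s so that runs of Nd digits compare numerically.

    Only General_Category=Nd characters count as digits; No/Nl numerals such as
    superscripts or Cuneiform numeric signs are treated as ordinary text.
    Locale-specific decimal and group separators are not handled.
    """
    key = []
    text, digits = "", ""
    for ch in s:
        if unicodedata.category(ch) == "Nd":
            if text:
                key.append((TEXT, text, 0))
                text = ""
            # unicodedata.decimal maps any Nd character to its 0-9 value.
            digits += str(unicodedata.decimal(ch))
        else:
            if digits:
                key.append((DIGITS, "", int(digits)))
                digits = ""
            text += ch
    # At most one of the two buffers is non-empty at this point.
    if text:
        key.append((TEXT, text, 0))
    if digits:
        key.append((DIGITS, "", int(digits)))
    return key

names = ["file10", "file9", "file\u0662"]  # U+0662 ARABIC-INDIC DIGIT TWO
print(sorted(names, key=numeric_sort_key))
```

With this key, "14.4" becomes three parts (14, ".", 4), so it is compared piece by piece rather than as one number,
while superscript digits and Cuneiform numeric signs remain ordinary text.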


"1) Of the cuneiform numbers and punctuation, I can confirm that only the members of the
DISH, ASH and ASH TENU series truly have the values in the range 1 to 9. The DISH series are
sexagesimal digits, not mere numbers, so I believe they should have numeric type "digit"."

They cannot be Nd, because they do not form a 0-9 sequence encoded contiguously (which is a requirement for Nd).
They can, and should, be No, though. But that goes for all of the Cuneiform number characters that are currently
Nl.
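The contiguity requirement can be stated as a simple check. The sketch below (hypothetical helper
satisfies_nd_layout; it models only this one requirement, not the other Nd conditions) looks for ten consecutive code
points valued 0 through 9, and takes the THREE DISH to NINE DISH values (U+12408..U+1240E) from Python's unicodedata.

```python
import unicodedata

def satisfies_nd_layout(values: dict) -> bool:
    """Hypothetical check for the Nd layout requirement only: is there a run of
    ten consecutive code points whose numeric values are 0, 1, ..., 9 in order?

    `values` maps code point -> numeric value.
    """
    for cp, v in values.items():
        if v == 0 and all(values.get(cp + i) == i for i in range(10)):
            return True
    return False

# ASCII digits U+0030..U+0039: the canonical contiguous 0-9 run.
ascii_digits = {cp: unicodedata.decimal(chr(cp)) for cp in range(0x30, 0x3A)}

# CUNEIFORM NUMERIC SIGN THREE DISH .. NINE DISH (U+12408..U+1240E).
# ONE DISH and TWO DISH are unified with letters elsewhere, and there is no zero.
dish = {cp: int(unicodedata.numeric(chr(cp))) for cp in range(0x12408, 0x1240F)}

print(satisfies_nd_layout(ascii_digits), satisfies_nd_layout(dish))
```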


"The other series have other values, being multiples of 10, 60, 600, 3,600, 36,000 or 216,000.  
This remark also applies to the following six numbers proposed in L2/12-207 (a.k.a. ISO/IEC
JTC1/SC2/WG2/N4277)."

I agree, on both counts. And I am working on a proposal to fix those too. In the meantime,
here is my draft list (except for those in N4277):

𒐕 12415;CUNEIFORM NUMERIC SIGN ONE GESH2;No;0;L;;;;60;N;;;;; // instead of <1, 0> in base 60
𒐖 12416;CUNEIFORM NUMERIC SIGN TWO GESH2;No;0;L;;;;120;N;;;;; // instead of <2, 0> in base 60
𒐗 12417;CUNEIFORM NUMERIC SIGN THREE GESH2;No;0;L;;;;180;N;;;;;
𒐘 12418;CUNEIFORM NUMERIC SIGN FOUR GESH2;No;0;L;;;;240;N;;;;;
𒐙 12419;CUNEIFORM NUMERIC SIGN FIVE GESH2;No;0;L;;;;300;N;;;;;
𒐚 1241A;CUNEIFORM NUMERIC SIGN SIX GESH2;No;0;L;;;;360;N;;;;;
𒐛 1241B;CUNEIFORM NUMERIC SIGN SEVEN GESH2;No;0;L;;;;420;N;;;;;
𒐜 1241C;CUNEIFORM NUMERIC SIGN EIGHT GESH2;No;0;L;;;;480;N;;;;;
𒐝 1241D;CUNEIFORM NUMERIC SIGN NINE GESH2;No;0;L;;;;540;N;;;;;

𒐞 1241E;CUNEIFORM NUMERIC SIGN ONE GESHU;No;0;L;;;;600;N;;;;; // instead of <10, 0> in base 60
𒐟 1241F;CUNEIFORM NUMERIC SIGN TWO GESHU;No;0;L;;;;1200;N;;;;; // instead of <20, 0> in base 60
𒐠 12420;CUNEIFORM NUMERIC SIGN THREE GESHU;No;0;L;;;;1800;N;;;;;
𒐡 12421;CUNEIFORM NUMERIC SIGN FOUR GESHU;No;0;L;;;;2400;N;;;;;
𒐢 12422;CUNEIFORM NUMERIC SIGN FIVE GESHU;No;0;L;;;;3000;N;;;;;

  xxxxx;CUNEIFORM NUMERIC SIGN ONE SHAR2;No;0;L;;;;3600;N;;;;; // 1*60*60, i.e. <1, 0, 0> in base 60, NEW, TO BE PROPOSED, disunify from 1212D;CUNEIFORM SIGN HI
𒐣 12423;CUNEIFORM NUMERIC SIGN TWO SHAR2;No;0;L;;;;7200;N;;;;; // 2*60*60, i.e. <2, 0, 0> in base 60
𒐤 12424;CUNEIFORM NUMERIC SIGN THREE SHAR2;No;0;L;;;;10800;N;;;;;
𒐥 12425;CUNEIFORM NUMERIC SIGN THREE SHAR2 VARIANT FORM;No;0;L;;;;10800;N;;;;;
𒐦 12426;CUNEIFORM NUMERIC SIGN FOUR SHAR2;No;0;L;;;;14400;N;;;;;
𒐧 12427;CUNEIFORM NUMERIC SIGN FIVE SHAR2;No;0;L;;;;18000;N;;;;;
𒐨 12428;CUNEIFORM NUMERIC SIGN SIX SHAR2;No;0;L;;;;21600;N;;;;;
𒐩 12429;CUNEIFORM NUMERIC SIGN SEVEN SHAR2;No;0;L;;;;25200;N;;;;;
𒐪 1242A;CUNEIFORM NUMERIC SIGN EIGHT SHAR2;No;0;L;;;;28800;N;;;;;
𒐫 1242B;CUNEIFORM NUMERIC SIGN NINE SHAR2;No;0;L;;;;32400;N;;;;;

𒐬 1242C;CUNEIFORM NUMERIC SIGN ONE SHARU;No;0;L;;;;36000;N;;;;; // i.e. <10, 0, 0> in base 60
𒐭 1242D;CUNEIFORM NUMERIC SIGN TWO SHARU;No;0;L;;;;72000;N;;;;;
𒐮 1242E;CUNEIFORM NUMERIC SIGN THREE SHARU;No;0;L;;;;108000;N;;;;;
𒐯 1242F;CUNEIFORM NUMERIC SIGN THREE SHARU VARIANT FORM;No;0;L;;;;108000;N;;;;;
𒐰 12430;CUNEIFORM NUMERIC SIGN FOUR SHARU;No;0;L;;;;144000;N;;;;;
𒐱 12431;CUNEIFORM NUMERIC SIGN FIVE SHARU;No;0;L;;;;180000;N;;;;;
--
𒐲 12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;No;0;L;;;;216000;N;;;;; // <1, 0, 0, 0> in base 60
𒐳 12433;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;No;0;L;;;;432000;N;;;;; // <2, 0, 0, 0> in base 60

That leaves a few Sumero-Akkadian Cuneiform number characters that I haven't figured out
the values for.
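The base-60 annotations in the list above can be checked mechanically. A minimal sketch, where the helper to_base60
is hypothetical and the sample values are copied from the draft entries:

```python
def to_base60(n: int) -> list:
    """Return the sexagesimal digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n:
        n, d = divmod(n, 60)
        digits.append(d)
    return digits[::-1]

# Numeric values taken from the draft list (name -> value).
draft = {
    "ONE GESH2": 60,
    "NINE GESH2": 540,
    "TWO GESHU": 1200,
    "ONE SHAR2": 3600,
    "ONE SHARU": 36000,
    "SHAR2 TIMES GAL PLUS DISH": 216000,
    "SHAR2 TIMES GAL PLUS MIN": 432000,
}
for name, value in draft.items():
    print(f"{name:28} {value:>7} -> {to_base60(value)}")
```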


Another problem is that several Sumero-Akkadian Cuneiform digits are unified with Sumero-Akkadian Cuneiform letters,
and the letters have properties different from those the digits should have. I'm working on a proposal to disunify the
Cuneiform digits from the Cuneiform letters. My current draft list is as follows:

𒀹 xxxxx;CUNEIFORM NUMERIC SIGN ONE ASH TENU;No;0;L;;;;1;N;;;;; // disunify from 12039;CUNEIFORM SIGN ASH ZIDA TENU

𒀸 xxxxx;CUNEIFORM NUMERIC SIGN ONE ASH;No;0;L;;;;1;N;;;;; // disunify from 12038;CUNEIFORM SIGN ASH
𒋰 xxxxx;CUNEIFORM NUMERIC SIGN TWO ASH VARIANT FORM;No;0;L;;;;2;N;;;;; // disunify from 122F0;CUNEIFORM SIGN TAB
  yyyyy;CUNEIFORM NUMERIC SIGN FOUR ASH VARIANT FORM A;No;0;L;;;;4;N;;;;; // like 1243C but horizontal
  yyyyy;CUNEIFORM NUMERIC SIGN FOUR ASH VARIANT FORM B;No;0;L;;;;4;N;;;;; // like 144BE but horizontal
𒄿 xxxxx;CUNEIFORM NUMERIC SIGN FIVE ASH VARIANT FORM;No;0;L;;;;5;N;;;;; // disunify from 1213F;CUNEIFORM SIGN I; 3,2
  yyyyy;CUNEIFORM NUMERIC SIGN FIVE ASH VARIANT FORM A;No;0;L;;;;5;N;;;;; // 2,2,1
  yyyyy;CUNEIFORM NUMERIC SIGN SEVEN VARIANT FORM;No;0;L;;;;7;N;;;;; // like 12442 but horizontal

𒁹 xxxxx;CUNEIFORM NUMERIC SIGN ONE DISH;No;0;L;;;;1;N;;;;; // disunify from 12079;CUNEIFORM SIGN DISH
𒈫 xxxxx;CUNEIFORM NUMERIC SIGN TWO DISH;No;0;L;;;;2;N;;;;; // disunify from 1222B;CUNEIFORM SIGN MIN
  yyyyy;CUNEIFORM NUMERIC SIGN FIVE DISH VARIANT FORM A;No;0;L;;;;5;N;;;;; // 2s+3s
  yyyyy;CUNEIFORM NUMERIC SIGN FIVE DISH VARIANT FORM B;No;0;L;;;;5;N;;;;; // like 12403 but vertical (2s+2s+1)
  yyyyy;CUNEIFORM NUMERIC SIGN SIX DISH VARIANT FORM A;No;0;L;;;;6;N;;;;; // like 12404 but vertical

𒌋 xxxxx;CUNEIFORM NUMERIC SIGN ONE U;No;0;L;;;;10;N;;;;; // disunify from 1230B;CUNEIFORM SIGN U
  xxxxx;CUNEIFORM NUMERIC SIGN TWO U;No;0;L;;;;20;N;;;;; // do not unify with U U (proposed in L2/12-207: 12399;CUNEIFORM SIGN U U;Lo;0;L;;;;;N;;;;;)
𒑱 yyyyy;CUNEIFORM NUMERIC SIGN TWO U VARIANT FORM;No;0;L;;;;20;N;;;;; // disunify from 12471;CUNEIFORM PUNCTUATION SIGN VERTICAL COLON
𒌍 xxxxx;CUNEIFORM NUMERIC SIGN THREE U;No;0;L;;;;30;N;;;;; // disunify from 1230D;CUNEIFORM SIGN U U U
  yyyyy;CUNEIFORM NUMERIC SIGN THREE U VARIANT FORM;No;0;L;;;;30;N;;;;; //

  xxxxx;CUNEIFORM NUMERIC SIGN ONE SHAR2;No;0;L;;;;3600;N;;;;; // 1*60*60, i.e. <1, 0, 0> in base 60, disunify from 1212D;CUNEIFORM SIGN HI

That makes 10 disunifications, plus 9 additions that complete digit design styles (styles that are already disunified
within the set of Cuneiform digits, but incomplete).
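Since these are draft UnicodeData.txt lines, a quick field-count check catches layout slips such as a doubled
semicolon after the name. This sketch assumes the draft conventions used here (an optional sample glyph before the
code point field, a trailing "//" comment) and expects 15 fields with General_Category "No"; the function names are
hypothetical.

```python
FIELD_COUNT = 15  # UnicodeData.txt lines have 15 semicolon-separated fields

def parse_draft_entry(line: str) -> list:
    """Hypothetical parser for the draft lines in this note: drop the trailing
    "// ..." comment and any sample glyph before the code point field, then
    split the rest on semicolons."""
    entry = line.split("//", 1)[0].strip()
    code, rest = entry.split(";", 1)
    code = code.split()[-1]  # "𒐕 12415" -> "12415"; "yyyyy" stays "yyyyy"
    return [code] + rest.split(";")

def problems(line: str) -> list:
    """Report layout problems; General_Category "No" is what this draft intends."""
    fields = parse_draft_entry(line)
    if len(fields) != FIELD_COUNT:
        return [f"expected {FIELD_COUNT} fields, got {len(fields)}"]
    if fields[2] != "No":
        return [f"General_Category is {fields[2]!r}, expected 'No'"]
    return []

print(problems("𒐕 12415;CUNEIFORM NUMERIC SIGN ONE GESH2;No;0;L;;;;60;N;;;;;"))
print(problems("  yyyyy;CUNEIFORM NUMERIC SIGN FIVE DISH VARIANT FORM A;;No;0;L;;;;5;N;;;;;"))
```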

I think
𒑉 12449;CUNEIFORM NUMERIC SIGN NINE VARIANT FORM ILIMMU A;No;0;L;;;;9;N;;;;;
should be
𒑉 12449;CUNEIFORM NUMERIC SIGN NINE VARIANT FORM ILIMMU A;No;0;L;;;;3;N;;;;;
i.e. 3 rather than 9 despite the name.

I also think there should be a zero digit (used only as a placeholder digit; the concept of zero had not been invented
at the time), since a zero digit was used instead of a mere space (or of outright ambiguity).


"2) U+1D369 COUNTING ROD TENS DIGIT ONE to U+1D371 COUNTING ROD TENS DIGIT NINE are digits
in a decimal place value system, so they should have numeric type "digit" and values 1 to 9."

They cannot be Nd, since the zero digit is encoded elsewhere, among other complications.

By the way, see also CLDR ticket 4473, http://unicode.org/cldr/trac/ticket/4473, which gives RBNF
rule sets for handling counting rod numerals, as well as some other Chinese numeral systems
not already covered by CLDR RBNF rule sets.
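As a rough illustration of counting rods being positional but not Nd, here is a sketch that writes a non-negative
integer with counting rod digits, alternating between the unit forms (U+1D360..U+1D368) for even powers of ten and the
tens forms (U+1D369..U+1D371) for odd powers. It is not the RBNF rule set from the CLDR ticket; using U+3007
IDEOGRAPHIC NUMBER ZERO for zero is an assumption, and to_counting_rods is a hypothetical name.

```python
import unicodedata

# Look the characters up by name, so a wrong code point fails loudly.
UNIT_ONE = ord(unicodedata.lookup("COUNTING ROD UNIT DIGIT ONE"))  # U+1D360
TENS_ONE = ord(unicodedata.lookup("COUNTING ROD TENS DIGIT ONE"))  # U+1D369
ZERO = unicodedata.lookup("IDEOGRAPHIC NUMBER ZERO")  # U+3007, assumed zero digit

def to_counting_rods(n: int) -> str:
    """Hypothetical formatter: write a non-negative integer in counting rod digits.

    Positions 0 (units), 2 (hundreds), ... use the unit forms; positions
    1 (tens), 3 (thousands), ... use the tens forms. Zero uses ZERO.
    """
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    out = []
    for position, ch in enumerate(reversed(str(n))):
        d = int(ch)
        if d == 0:
            out.append(ZERO)
        else:
            first = UNIT_ONE if position % 2 == 0 else TENS_ONE
            out.append(chr(first + d - 1))  # ONE..NINE are contiguous
    return "".join(reversed(out))

print(to_counting_rods(1056))
```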


"3) The alternating between digit sets is also seen in the Telugu fraction digits, U+0C78 to U+0C7E.
It is not clear to me why these have numeric type "numeric" rather than "digit"."

They are not encoded as a sequential 0-9 run, and therefore cannot be Nd (a.k.a. "digit", which is a
confusing alias).
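For reference, Python's unicodedata module (which reflects the UCD version bundled with the interpreter) reports these
characters as General_Category No with a numeric value but no digit or decimal value. The helper numeric_properties is
hypothetical:

```python
import unicodedata

def numeric_properties(cp: int) -> tuple:
    """Return (General_Category, numeric, digit, decimal) for one code point,
    using None where the UCD gives no value."""
    ch = chr(cp)
    return (
        unicodedata.category(ch),
        unicodedata.numeric(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.decimal(ch, None),
    )

# U+0C78..U+0C7E TELUGU FRACTION DIGIT ... (two alternating sets of base-4 digits)
for cp in range(0x0C78, 0x0C7F):
    print(f"U+{cp:04X}", numeric_properties(cp), unicodedata.name(chr(cp)))
```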


"4) U+3021 HANGZHOU NUMERAL ONE to U+3029 HANGZHOU NUMERAL NINE are digits in a decimal
place value system, so they should have numeric type "digit"."

As with the counting rods, and for the same reasons, these cannot be Nd (digit). Again, see CLDR ticket 4473.
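Hangzhou numerals can likewise be read as a decimal place-value number once a zero from another block is admitted. A
minimal sketch, assuming U+3007 IDEOGRAPHIC NUMBER ZERO serves as the zero digit and ignoring any other forms that may
appear in practice; hangzhou_value is hypothetical:

```python
import unicodedata

def hangzhou_value(s: str) -> int:
    """Hypothetical helper: read Hangzhou numerals as a decimal place-value
    number, most significant digit first.

    Accepts U+3021..U+3029 (values 1-9) and U+3007 IDEOGRAPHIC NUMBER ZERO
    (assumed zero digit); any other character is rejected.
    """
    value = 0
    for ch in s:
        cp = ord(ch)
        if not (cp == 0x3007 or 0x3021 <= cp <= 0x3029):
            raise ValueError(f"not a Hangzhou digit: U+{cp:04X}")
        # The UCD gives these characters numeric values, not decimal ones.
        value = value * 10 + int(unicodedata.numeric(ch))
    return value

print(hangzhou_value("\u3021\u3007\u3025"))
```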

-------------------