Re: Writing Babylonian Numbers in Unicode

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 1 May 2012 22:34:52 +0100

On Mon, 30 Apr 2012 16:42:51 -0700
Ken Whistler <kenw_at_sybase.com> wrote:

> On 4/30/2012 3:33 PM, Richard Wordingham wrote:
> >> One is not compelled to construct U+3039 (〹) 'twenty' from two
> >> U+3038 (〸) 'ten', so a CUNEIFORM TWO U may well be missing.
> > It looks as though it is.
>
> No, it isn't.
>
> > It was present in Proposal N2664
> > (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2664), as CUNEIFORM NUMERIC
> > SIGN NISH, but is missing from the next revision, Proposal N2698
> > (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2698). Between these two,
> > the sign for '30' changed from CUNEIFORM NUMERIC SIGN USHU2 to
> > CUNEIFORM SIGN U U U. It could be an accidental omission of *SIGN
> > TWO U/SIGN MAN; the Unicode Cuneiform list does not appear to have
> > been archived, so I can't work out why it should have been
> > deliberately removed.
>
> The document you are looking for is "Rationale for changes to N2664R",
> by Steve Tinney. L2/04-080.

Thanks - I don't know how I missed that one when I looked through the
registers. Unfortunately, it doesn't explicitly state that CUNEIFORM
NUMERIC SIGN NISH had been removed. I wonder how CUNEIFORM SIGN MIN
escaped the same fate!

Assuming that NUMERIC SIGN NISH was removed as being U + U, I am still
little the wiser as to which equivalent encoding one should use. The
sequence <SIGN U, SIGN U> does not kern tightly, and <SIGN U, SIGN U,
SIGN U> looks nothing like <SIGN U U U>: in the former there are clear
gaps between the impressions, whereas in the latter the impressions
touch. I've checked some drawings, and they tend to show a clear gap
between an U impression at the end of one symbol and an U impression at
the start of the next symbol.
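The visual difference is not bridged by any equivalence in the standard: neither sign has a decomposition mapping, so no normalization form maps <SIGN U, SIGN U, SIGN U> onto <SIGN U U U>. A minimal Python sketch with the standard `unicodedata` module, assuming the interpreter's Unicode database includes the Cuneiform block (Unicode 5.0 or later):

```python
import unicodedata

# Look the signs up by name instead of hard-coding code points.
SIGN_U = unicodedata.lookup("CUNEIFORM SIGN U")
SIGN_U_U_U = unicodedata.lookup("CUNEIFORM SIGN U U U")

three_u = SIGN_U * 3  # the sequence <SIGN U, SIGN U, SIGN U>

# An empty string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(SIGN_U)), repr(unicodedata.decomposition(SIGN_U_U_U)))

# Every normalization form leaves the three-sign sequence unchanged,
# so it never becomes the single sign <SIGN U U U>.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, three_u)
    print(form, normalized == three_u, normalized == SIGN_U_U_U)
```

Whatever sequence is chosen, it will therefore stay distinct from <SIGN U U U> as far as string comparison under normalization is concerned.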

The note L2/04-080 recommends the sequence <SIGN U, CGJ, SIGN U>, but it
is not clear how that should help. Referring to
http://www.unicode.org/faq/char_combmark.html, I read the following
questions (Q) and answers (A), to which I must regretfully add some
remarks (R).

1. Q: Does U+034F COMBINING GRAPHEME JOINER affect display of combining
character sequences?

A: No. <snip> It does not impact cursive joining or ligation (contrast
U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER).

R: It does, however, affect how combining characters combine: TUS 6.1
Section 7.9 gives its effect between a ligature tie and a single
diacritic, and Section 16.2 gives an example of its effects on
Hebrew marks and accents. (The next answer but one actually gives
the Hebrew example!)

R: I'm confused by the display of <a, CGJ, umlaut>. If a Fraktur font,
in its normal setting, displays it differently from <a, umlaut>, is it
in breach of the Unicode standard?

2. Q: Does U+034F COMBINING GRAPHEME JOINER join graphemes?

A: No. <snip> It has no impact on line breaking, except that as for
other combining marks, it should be kept with its base when breaking a
line.

R: According to http://www.unicode.org/Public/UNIDATA/LineBreak.txt,
CGJ has line-break class GL. Class GL prohibits line breaks immediately
before or after the character, unless the preceding character is a
space or ZWSP.
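On the remarks under question 1, the mechanism can be checked directly: CGJ has canonical combining class 0, so normalization neither reorders combining marks across it nor composes a base with a mark on its far side. A minimal Python sketch using the standard `unicodedata` module; the Latin marks are a hypothetical stand-in for the Hebrew example, and nothing here says how a font must render either sequence:

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER (canonical combining class 0)
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, class 220
UMLAUT = "\u0308"     # COMBINING DIAERESIS

# 1. Reordering: canonical ordering sorts adjacent marks by combining class,
#    so the dot below (220) moves in front of the acute (230)...
reordered = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)
# ...but CGJ is a barrier, and the marks keep their original order.
blocked = unicodedata.normalize("NFD", "a" + ACUTE + CGJ + DOT_BELOW)

# 2. Composition: NFC turns <a, umlaut> into U+00E4, but not <a, CGJ, umlaut>.
composed = unicodedata.normalize("NFC", "a" + UMLAUT)
kept_apart = unicodedata.normalize("NFC", "a" + CGJ + UMLAUT)

print(unicodedata.combining(CGJ))
print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in blocked])
print(composed == "\u00e4", kept_apart == "a" + CGJ + UMLAUT)
```

Because the NFD forms of <a, umlaut> and <a, CGJ, umlaut> differ, the two are not canonically equivalent; whether a font may render them differently on that basis is the question raised under question 1.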

I'd be inclined to go for <SIGN U, ZWJ, SIGN U>, but for one little
nagging worry.

In the final proposal for Cuneiform, it was proposed that the Cuneiform
symbols have line-breaking class ID. This is quite reasonable for text
that is written without inter-word gaps, but some text does have
inter-word gaps, and this text benefits from the current line-breaking
class of AL (ordinary alphabetical and symbol characters). AL v. ID is
tailorable, and, curiously, ZWJ has no effect on line breaking.

I have a natural preference for WORD JOINER over CGJ (and WJ can be
easier to enter in some word-processors). Also, because CGJ between
characters of canonical combining class 0 frequently marks a syllable
boundary, I fear it might become a magnet for emergency line-breaking
when normal line-breaking fails. So, please, which of the following
sequences are suitable, and which is best:

<SIGN U, CGJ, ZWJ, SIGN U>
<SIGN U, WJ, ZWJ, SIGN U>
<SIGN U, ZWJ, WJ, SIGN U>
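To make the line-breaking side of the question concrete, here is a toy Python model of a handful of UAX #14 rules (LB9, LB11, LB12, LB12a, LB28, LB31) applied to the sequences above. It is a sketch under stated assumptions, not a conformant line breaker: it takes CGJ as GL (per LineBreak.txt, as quoted above), treats ZWJ as a combining mark (class CM), which matches the observation that ZWJ has no effect on line breaking, and lets the Cuneiform sign be either AL or ID. The names `FORMAT_CLASS`, `pair_allows_break`, `break_opportunities` and `CANDIDATES` are hypothetical.

```python
from __future__ import annotations

# Toy model of a few UAX #14 rules; NOT a conformant implementation.
# Assumed classes: CGJ = GL, ZWJ = CM (combining mark), WJ = WJ.
FORMAT_CLASS = {"CGJ": "GL", "ZWJ": "CM", "WJ": "WJ"}


def pair_allows_break(before: str, after: str) -> bool:
    """Decide a break between two (effective) line-break classes."""
    if before == "WJ" or after == "WJ":   # LB11: no break before or after WJ
        return False
    if before == "GL":                    # LB12: no break after GL
        return False
    if after == "GL":                     # LB12a, simplified: its exceptions
        return False                      # (e.g. after a space) never occur here
    if before == "AL" and after == "AL":  # LB28: no break between AL and AL
        return False
    return True                           # LB31: otherwise break (so ID / ID breaks)


def break_opportunities(seq: list[str], cuneiform_cls: str) -> list[bool]:
    """Return one flag per boundary between adjacent characters of seq."""
    classes = [cuneiform_cls if ch == "SIGN U" else FORMAT_CLASS[ch] for ch in seq]
    flags = []
    prev = classes[0]
    for cls in classes[1:]:
        if cls == "CM":
            # LB9: never break before a combining mark; the combined unit keeps
            # the class of the preceding character, so prev is left unchanged.
            flags.append(False)
            continue
        flags.append(pair_allows_break(prev, cls))
        prev = cls
    return flags


CANDIDATES = {
    "<SIGN U, ZWJ, SIGN U>": ["SIGN U", "ZWJ", "SIGN U"],
    "<SIGN U, CGJ, ZWJ, SIGN U>": ["SIGN U", "CGJ", "ZWJ", "SIGN U"],
    "<SIGN U, WJ, ZWJ, SIGN U>": ["SIGN U", "WJ", "ZWJ", "SIGN U"],
    "<SIGN U, ZWJ, WJ, SIGN U>": ["SIGN U", "ZWJ", "WJ", "SIGN U"],
}

if __name__ == "__main__":
    for cuneiform_cls in ("AL", "ID"):
        for label, seq in CANDIDATES.items():
            can_break = any(break_opportunities(seq, cuneiform_cls))
            print(f"{cuneiform_cls}: {label} -> break allowed: {can_break}")
```

In this model, with the signs as ID only <SIGN U, ZWJ, SIGN U> admits a break (the nagging worry), while all three candidate sequences suppress it under either class. The model says nothing about rendering, or about how an emergency line breaker might treat CGJ.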

Richard.
Received on Tue May 01 2012 - 16:36:41 CDT
