Re: N4106 from Kent Karlsson on 2011-11-06 (Unicode Mail List Archive)

From: Kent Karlsson <kent.karlsson14_at_telia.com>
Date: Sun, 06 Nov 2011 23:46:47 +0200

On 2011-11-05 04:23, "António Martins-Tuválkin" <tuvalkin_at_gmail.com> wrote:

> I'm going through N4106 ( http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4106.pdf ),
...

I see the following characters being proposed for encoding:

1ABB COMBINING PARENTHESES ABOVE
1ABC COMBINING DOUBLE PARENTHESES ABOVE
1ABD COMBINING PARENTHESES BELOW
1ABE COMBINING PARENTHESES OVERLAY

Well, COMBINING DOUBLE PARENTHESES ABOVE seems to be the same as
<COMBINING PARENTHESES ABOVE, COMBINING PARENTHESES ABOVE>. And
COMBINING PARENTHESES OVERLAY seems to be just a tiny parenthesis before
and a tiny parenthesis after; no need for a combining mark, especially
one with a splitting behaviour.

Otherwise, I think COMBINING ((DOUBLE)) PARENTHESES ABOVE/BELOW are an
entirely new kind of character in Unicode (if accepted as proposed).
They are supposed to split (ok, we have split vowels in some Indic
scripts, more on that below), but these split around *another combining
mark*. So despite being given (as proposed) vanilla above/below mark
properties, they do not "stack" the way such characters normally do, but
are supposed to invoke an entirely new behaviour.
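
For reference, one reading of "vanilla above/below mark properties" is
an ordinary nonspacing mark with canonical combining class 230 (above)
or 220 (below). The sketch below, using Python's standard unicodedata
module, only shows those classes for two existing marks and how
canonical reordering treats them; the choice of U+0301 and U+0323 as
examples is mine, not something taken from N4106.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, an ordinary "above" mark
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, another "above" mark
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, an ordinary "below" mark


def ccc(ch):
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


def nfd(text):
    """Return the canonical decomposition, with marks canonically reordered."""
    return unicodedata.normalize("NFD", text)


# Marks of different classes are reordered (below sorts before above) ...
mixed = nfd("a" + ACUTE + DOT_BELOW)
# ... while marks of the same class keep their relative order.
same = nfd("a" + ACUTE + DIAERESIS)

print(ccc(ACUTE), ccc(DOT_BELOW))
print([f"{ord(c):04X}" for c in mixed])
print([f"{ord(c):04X}" for c in same])
```

Since a mark that wraps around a neighbouring mark depends on which
mark is its neighbour, the order of marks within the combining sequence
matters for the proposed characters in a way it normally does not.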

Split vowels are not new, but they split around base characters (or,
more generally, around combining sequences), not around (a) combining
character(s) only. Indeed, one can split these vowels into two
characters (sometimes by canonical decomposition, when done right;
sometimes by cheating a bit and splitting into another character and the
supposedly split vowel character but not interpreted as the second part
of the decomposition; in principle one may need to cheat even more and
use PUA characters in order to do this at the character level, but then
that is really bad).
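
The "done right" case can be checked directly. This is a small sketch
using Python's standard unicodedata module; the three vowel signs are
examples I picked of split vowels that have canonical decompositions,
and the helper name canonical_parts is just illustrative.

```python
import unicodedata

# Two-part vowel signs that have canonical decompositions.
SPLIT_VOWELS = ["\u09CB", "\u0BCA", "\u0D4A"]


def canonical_parts(ch):
    """Return the code points of the canonical decomposition (NFD) of ch."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


for ch in SPLIT_VOWELS:
    # Each sign decomposes into a left part and a right part, both of
    # which are ordinary encoded characters (no PUA needed).
    print(f"{ord(ch):04X}", unicodedata.name(ch), "->", canonical_parts(ch))
```

In each of these cases both halves of the split exist as characters in
their own right, which is exactly what the proposed parentheses marks
lack.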

That supposedly stacking combining marks *sometimes* (more a font
dependence than a character dependence) don't stack but instead are laid
out linearly is not new. But to *require* non-stacking behaviour for
certain characters is new.

So we have a combination of:

1. Splitting. (Normally only used for some Indic scripts).

2. Indeed splitting with no other characters to use for the
decomposition, thus requiring the use of PUA characters, to stay
compliant, for representing the result of the split at the character
level. (This is entirely new, as far as I can tell.)

3. The split is entirely *within* the sequence of combining characters
(except for COMBINING PARENTHESES OVERLAY, which behaves as split vowels
normally do, but still with issue 2), not around the combining sequence
including the base. (This is entirely new.)

4. Requiring (if at all supported) to use linear layout of combining
characters instead of stacking. (This is entirely new.)

This makes these proposed characters unique in their display behaviour,
IMO.

This could be alleviated by encoding COMBINING BEGIN/END PARENTHESIS
ABOVE/BELOW. That way the issues with splitting, as listed above, can be
avoided. There is still the issue of requiring (when at all supported)
linear layout instead of stacking. But at least that is a lesser
concern.
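
To make the difference concrete, here is a toy sketch of linear layout
under both models. Everything in it is hypothetical: the token strings
are placeholders rather than code points, and the rule that the single
mark encloses the mark immediately before it is an assumption of mine
for illustration, not something stated in N4106.

```python
# Hypothetical placeholder tokens; none of these are real code points.
PAREN_ABOVE = "<PARENS-ABOVE>"       # single splitting mark, as proposed
BEGIN_ABOVE = "<BEGIN-PAREN-ABOVE>"  # paired marks, as suggested here
END_ABOVE = "<END-PAREN-ABOVE>"


def layout_split(marks):
    """Linear glyph order for the single-character model.

    Assumption: the parentheses enclose the mark immediately before
    them, so the renderer must split one character into two glyphs.
    """
    out = []
    group_start = None  # index where the most recent mark's glyphs begin
    for m in marks:
        if m == PAREN_ABOVE and group_start is not None:
            out.insert(group_start, "(")  # opening half goes before the mark
            out.append(")")               # closing half goes after it
        else:
            group_start = len(out)
            out.append(m)
    return out


def layout_paired(marks):
    """Linear glyph order for the begin/end model: no splitting needed."""
    glyph = {BEGIN_ABOVE: "(", END_ABOVE: ")"}
    return [glyph.get(m, m) for m in marks]


print(layout_split(["acute", PAREN_ABOVE]))
print(layout_split(["acute", PAREN_ABOVE, PAREN_ABOVE]))
print(layout_paired([BEGIN_ABOVE, "acute", END_ABOVE]))
```

In the paired model the character sequence already is the glyph
sequence, so nothing has to be invented at the character level to
describe the result; the linear (non-stacking) layout requirement
remains in both models.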

In summary, I'd propose replacing the four problematic proposed
characters above with:

COMBINING BEGIN PARENTHESIS ABOVE (or LEFT)
COMBINING END PARENTHESIS ABOVE (or RIGHT)

COMBINING BEGIN PARENTHESIS BELOW (or LEFT)
COMBINING END PARENTHESIS BELOW (or RIGHT)

BASELINE SMALL BEGIN PARENTHESIS (or LEFT)
BASELINE SMALL END PARENTHESIS (or RIGHT)
(or MODIFIER LETTER instead of BASELINE; the latter two are not combining)

/Kent K
Received on Sun Nov 06 2011 - 16:55:08 CST
