Re: N4106 from Szelp A. Szabolcs on 2011-11-07 (Unicode Mail List Archive)

From: Szelp A. Szabolcs <a.sz.szelp_at_gmail.com>
Date: Mon, 7 Nov 2011 09:18:57 +0100

I welcome that the combining parentheses are finally being encoded as such,
and not as precomposed diacritics, especially as I had evidence for
8-10 parenthesized diacritics (from linguistic material, of course;
where else!) in addition to those presented in the earlier
Teuthonista proposals.

I was wondering whether COMBINING DOUBLE PARENTHESES *BELOW* shouldn't
be added for symmetry. I am fully aware that "analogy" is usually
not taken into account (e.g. the combining small letters above are
still an open set), but I also seem to remember Teuthonista material
showing double parentheses below in a chart. Is its absence an
editorial oversight, or is my memory playing tricks on me?

Concerning the explanatory paragraph dealing with the fused carons, I
have some reservations.
In fact, the current behaviour, in which carons change to apostrophes
unconditionally (i.e. without requiring language codes, as are needed
in the case of Russian vs. Serbian Cyrillic italics), already poses
some problems for encoding {d, t, l, L}+caron
with an actual (detached) caron to display. Such a representation is
not unheard of. While this can be solved at the font level in
high-quality typography, the main question concerning the fused caron is
indeed whether the fused carons are distinct from the detached carons
themselves.
In fact, most "attached" combining marks are treated as
_letter_modifications_ in Unicode, as far as I can see, be it b with right
hook or the IPA symbols. This could warrant encoding what is
called d/k/t with attached caron as {D, K, T} WITH SPLIT TOP STEM or
something similar, as single code points. Of course, if the mentioned
additional "fused carons" in Landsmålsalfabetet give us an order of
magnitude more letters, a combining mark solution would become more
desirable.
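
To illustrate the encodability problem with {d, t, l, L}+caron, here is a
minimal Python sketch using only the standard unicodedata module. The helper
name caron_decomposition is hypothetical and chosen for this illustration. It
shows that the precomposed letters are canonically equivalent to base letter
plus U+030C COMBINING CARON, so the character level offers no way to ask for
the detached caron rather than the apostrophe-like glyph; that choice is left
to the font.

```python
import unicodedata

# Precomposed Latin letters with caron for d, t, l, L (the set named above).
PRECOMPOSED = ["\u010F", "\u0165", "\u013E", "\u013D"]


def caron_decomposition(ch):
    """Return (base letter, combining mark) from the canonical decomposition.

    Hypothetical helper for illustration; assumes ch decomposes into
    exactly two code points under NFD.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1]


for ch in PRECOMPOSED:
    base, mark = caron_decomposition(ch)
    # Every letter in the set decomposes to its base plus the same mark.
    print(f"U+{ord(ch):04X} = {base} + U+{ord(mark):04X} {unicodedata.name(mark)}")

# The sequence <d, U+030C> recomposes to the same precomposed letter, so the
# two spellings cannot carry a detached-vs-apostrophe distinction.
print(unicodedata.normalize("NFC", "d\u030C") == "\u010F")
```

Because both spellings normalize to the same thing, any distinction between a
fused and a detached caron would need either separate characters or a
mechanism outside plain text.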

Szabolcs

--
Szelp, André Szabolcs
+43 (650) 79 22 400

On Sun, Nov 6, 2011 at 22:46, Kent Karlsson <kent.karlsson14_at_telia.com> wrote:
>
> Den 2011-11-05 04:23, skrev "António Martins-Tuválkin" <tuvalkin_at_gmail.com>:
>
>> I'm going through N4106 ( http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4106.pdf ),
> ...
>
> I see the following characters being proposed for encoding:
>
> 1ABB COMBINING PARENTHESES ABOVE
> 1ABC COMBINING DOUBLE PARENTHESES ABOVE
> 1ABD COMBINING PARENTHESES BELOW
> 1ABE COMBINING PARENTHESES OVERLAY
>
> Well, COMBINING DOUBLE PARENTHESES ABOVE seems to be the same as
> <COMBINING PARENTHESES ABOVE, COMBINING PARENTHESES ABOVE>. And COMBINING
> PARENTHESES OVERLAY seems to be just a tiny parenthesis before and a tiny
> parenthesis after; no need for a combining mark, especially one with
> a splitting behaviour.
>
> Otherwise, I think COMBINING ((DOUBLE)) PARENTHESES ABOVE/BELOW are an
> entirely new brand of characters in Unicode (if accepted as proposed). They
> are supposed to split (ok, we have split vowels in some Indic scripts, more
> on that below), but these split around *another combining mark*. So despite
> being given (as proposed) vanilla above/below mark properties, they do not
> "stack" the way such characters normally do, but are supposed to invoke an
> entirely new behaviour.
>
> Split vowels are not new, but they split around base characters (or more
> generally, around combining sequences), not around (a) combining
> character(s) only. Indeed, one can split these vowels into two characters
> (sometimes by canonical decomposition, when done right; sometimes by
> cheating a bit and splitting into another character and the supposedly
> split vowel character but not interpreted as the second part of the
> decomposition; in principle one may need to cheat even more and use PUA
> characters in order to do this at the character level, but then that is
> really bad).
>
> That supposedly stacking combining marks *sometimes* (more a font dependence
> than a character dependence) don't stack but instead are laid out linearly
> is not new. But to *require* non-stacking behaviour for certain characters
> is new.
>
> So we have a combination of:
>
> 1. Splitting. (Normally only used for some Indic scripts.)
>
> 2. Indeed splitting with no other characters to use for the decomposition,
>    thus requiring the use of PUA characters, to stay compliant, for
>    representing the result of the split at the character level.
>    (This is entirely new, as far as I can tell.)
>
> 3. The split is entirely *within* the sequence of combining characters
>    (except for COMBINING PARENTHESES OVERLAY, which behaves as split vowels
>    normally do, but still with issue 2), not around the combining sequence
>    including the base. (This is entirely new.)
>
> 4. Requiring (if at all supported) the use of linear layout of combining
>    characters instead of stacking. (This is entirely new.)
>
> This makes these proposed characters entirely unique in their display
> behaviour, IMO.
>
> This could be alleviated by encoding COMBINING BEGIN/END PARENTHESIS
> ABOVE/BELOW. That way the issues with splitting, as listed above, can be
> avoided. There is still the issue of requiring (when at all supported)
> linear layout instead of stacking. But at least that is a lesser concern.
>
> In summary, I'd propose replacing the four problematic proposed characters
> above with:
>
> COMBINING BEGIN PARENTHESES ABOVE    (or LEFT)
> COMBINING END PARENTHESES ABOVE        (or RIGHT)
>
> COMBINING BEGIN PARENTHESES BELOW    (or LEFT)
> COMBINING END PARENTHESES BELOW        (or RIGHT)
>
> BASELINE SMALL BEGIN PARENTHESES    (or LEFT)
> BASELINE SMALL END PARENTHESES        (or RIGHT)
> (or MODIFIER LETTER instead of BASELINE; the latter two are not combining)
>
>     /Kent K
>
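
The two existing mechanisms the quoted message compares the proposal
against, canonical decomposition of split vowels and the ordering of
ordinary above/below marks, can be illustrated with a short Python sketch
using the standard unicodedata module. The helper names canonical_parts and
combining_classes are hypothetical, and the example characters (U+0BCA TAMIL
VOWEL SIGN O, U+0301, U+0308, U+0323) are my own choice for illustration;
none of them come from N4106.

```python
import unicodedata


def canonical_parts(ch):
    """Code points (as hex strings) of the full canonical decomposition."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


def combining_classes(text):
    """Canonical combining class of each character in text (0 for bases)."""
    return [unicodedata.combining(c) for c in text]


# A split vowel "done right": U+0BCA TAMIL VOWEL SIGN O has a canonical
# decomposition into a left part and a right part around the base.
print(canonical_parts("\u0BCA"))  # ['0BC6', '0BBE']

# Ordinary above (class 230) and below (class 220) marks do not interact:
# canonical ordering moves the below mark in front of the above mark.
mixed = "a\u0301\u0323"  # a + acute above + dot below
print(combining_classes(mixed))  # [0, 230, 220]
print(combining_classes(unicodedata.normalize("NFD", mixed)))  # [0, 220, 230]

# Two marks of the same class keep their relative order, which is what
# lets a renderer stack them (or lay them out linearly) in typed order.
same = "a\u0301\u0308"  # a + acute above + diaeresis above
print(unicodedata.normalize("NFD", same) == same)  # True
```

Nothing in these properties lets one mark declare that it wraps around its
neighbour, which is why a mark that splits around another combining mark
would be new behaviour rather than a consequence of existing data.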

Received on Mon Nov 07 2011 - 02:21:59 CST
