Re: N4106 from Kent Karlsson on 2011-11-07 (Unicode Mail List Archive)

From: Kent Karlsson <kent.karlsson14_at_telia.com>
Date: Mon, 07 Nov 2011 15:31:13 +0200

On 2011-11-07 10:34, "vanisaac_at_boil.afraid.org"
<vanisaac_at_boil.afraid.org> wrote:

>> So despite being given (as proposed) vanilla above/below mark properties,
>> they do not "stack" the way such characters normally do, but are supposed
>> to invoke an entirely new behaviour.
>
> I agree, except that if we give them any ccc other than 220/230, then canonical
> reordering will separate them from the modifier letters that they are attached

Nit: modifier letters (as that term is used in Unicode) are not combining
marks; here you mean combining marks.

> to. I think this is one of those cases where a definition needs to expand in
> order to accommodate architecture. We do already have some non-stacking
> behaviour defined for these characters in order to accommodate polytonic
> Greek,
> so we do have some experience with disparate appearances of consecutive marks.

Yes, but the fact that they have special behaviour needs to be made explicit.
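
The reordering concern quoted above can be checked with Python's standard
unicodedata module. This is only a sketch: the existing marks U+0323, U+0330
and U+0334 are stand-ins chosen for their combining classes, not characters
from N4106, and the helper name is made up for illustration.

```python
import unicodedata

def ccc_sequence(text):
    """Return (code point, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]

# Stand-in marks (assumption: chosen only for their ccc values):
#   U+0323 COMBINING DOT BELOW      ccc=220
#   U+0330 COMBINING TILDE BELOW    ccc=220
#   U+0334 COMBINING TILDE OVERLAY  ccc=1
same_class = "a\u0323\u0330"    # two marks that share ccc=220
mixed_class = "a\u0323\u0334"   # ccc=220 followed by ccc=1

nfd_same = unicodedata.normalize("NFD", same_class)
nfd_mixed = unicodedata.normalize("NFD", mixed_class)

print(ccc_sequence(nfd_same))
print(ccc_sequence(nfd_mixed))
print("same ccc keeps its order:    ", nfd_same == same_class)
print("different ccc gets reordered:", nfd_mixed != mixed_class)
```

Marks that share a combining class keep their typed order under
normalization, while a mark with a lower class is moved ahead of the mark it
was typed after.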

>> That supposedly stacking combining marks *sometimes* (more a font dependence
>> than a character dependence) don't stack but instead are laid out linearly
>> is not new. But to *require* non-stacking behaviour for certain characters
>> is new.
>
> Then think of it as the "non-spacing" version of stacking behaviour.

That would not be sufficient. See below.

>> So we have a combination of:
>>
>> 1. Splitting. (Normally only used for some Indic scripts).
>>
>> 2. Indeed splitting with no other characters to use for the decomposition,
>> thus requiring the use of
>> PUA characters, to stay compliant, for representing the result of the
>> split at the character level.
>> (This is entirely new, as far as I can tell.)
>
> I cannot imagine in any way how this requires PUA characters.

Splitting is usually done at the character level... I know, some say that
this should always be done at the glyph level (somehow), but IIUC that is
not so in practice. And I think it is preferable to do it at the character
level, so that it is not just hand-waved away (oh, the font should do this...),
leaving it up to each and every font designer to do this odd-ball extra
(which thus won't be done most of the time, even if the font framework may
support it). Laying out linearly instead of stacking is quite enough of an
odd-ball extra.
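
A character-level split of the kind described above might look roughly like
the following. This is a sketch under loud assumptions: all three code points
are Private Use Area placeholders invented for illustration (none come from
N4106), and the function is hypothetical, not how any existing layout engine
works.

```python
import unicodedata

# Hypothetical placeholders, all in the Private Use Area:
PARENS_BELOW = "\uE000"  # stand-in for a single "parentheses below" mark
OPEN_BELOW = "\uE001"    # left half produced by the split
CLOSE_BELOW = "\uE002"   # right half produced by the split

def split_parentheses(text):
    """Wrap the combining mark just before PARENS_BELOW in two half marks."""
    out = []
    for ch in text:
        preceded_by_mark = bool(out) and unicodedata.combining(out[-1]) != 0
        if ch == PARENS_BELOW and preceded_by_mark:
            enclosed = out.pop()
            # The split happens inside the combining sequence, not around
            # the base letter.
            out.extend([OPEN_BELOW, enclosed, CLOSE_BELOW])
        else:
            out.append(ch)
    return "".join(out)

# "a" + COMBINING DOT BELOW + the placeholder parentheses mark
result = split_parentheses("a\u0323" + PARENS_BELOW)
print([f"U+{ord(ch):04X}" for ch in result])
```

Because no encoded characters exist for the two halves, the result of the
split can only be represented with private-use code points, which is the
point made in item 2.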

>> 3. The split is entirely *within* the sequence of combining characters
>> (except for COMBINING
>> PARENTHESES OVERLAY, which behaves as split vowels normally do, but still
>> with issue 2), not
>> around the combining sequence including the base. (This is entirely new.)
>>
>> 4. Requiring (if at all supported) to use linear layout of combining
>> characters instead of stacking.
>> (This is entirely new.)
>
> If I were designing a font, I would simply make the in/out mark attachment
> point near the top/middle of the parentheses, so that it drops down around the
> "base" mark, and then attaches any subsequent marks as if the parentheses
> weren't there. I think you're making this too complicated.

But glyphs for combining marks may be of different widths; for example, a
(glyph for a) dot below is much narrower than a (proposed) wiggly line
below. Or consider LENIS MARK and DOUBLE LENIS MARK (both for Teuthonista,
and both apparently used together with parentheses). The usual, and general,
way of handling that is to actually split the
character-that-goes-on-both-sides of something that may have different
widths in different instances. Of course you also need width info for
combining marks. I would still consider splitting to be a needless
complication here, and would rather encode begin/end pairs of combining
parentheses than what is in N4106.

>
>> This makes these proposed characters entirely unique in their display
>> behaviour, IMO.
>
> I do, however, agree totally with this assessment, I just believe it is more
> manageable than you paint it.
>
> [snip]
>> /Kent K
>
> I do, myself, have a couple of concerns with regard to several proposed
> characters in N4106 as well. Namely, I believe that U+1DF2, U+1DF3, and U+1DF4
> should require significant justification as to why they should not be encoded
> as U+0363 + U+0308, U+0366 + U+0308, and U+0367 + U+0308.

There is the issue of whether the diaeresis applies to the base letter (plus
something) or if it applies to the combining mark just under the diaeresis.
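
How normalization treats the two possible orders can be sketched with
Python's unicodedata module. The base letter "e" is an arbitrary choice for
illustration, and the sketch says nothing about where a font actually places
the diaeresis; it only shows what happens at the character level.

```python
import unicodedata

SUPERSCRIPT_A = "\u0363"  # COMBINING LATIN SMALL LETTER A, ccc=230
DIAERESIS = "\u0308"      # COMBINING DIAERESIS, ccc=230

# Both marks share ccc=230, so canonical reordering never swaps them.
letter_then_diaeresis = "e" + SUPERSCRIPT_A + DIAERESIS
diaeresis_then_letter = "e" + DIAERESIS + SUPERSCRIPT_A

nfc_letter_first = unicodedata.normalize("NFC", letter_then_diaeresis)
nfc_diaeresis_first = unicodedata.normalize("NFC", diaeresis_then_letter)

def code_points(text):
    """List the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(nfc_letter_first))
print(code_points(nfc_diaeresis_first))
```

With the diaeresis typed first, NFC composes it with the base letter; with
the superscript letter typed first, the diaeresis is blocked from composing
and the sequence is left as typed. The two orders therefore stay distinct
under normalization, but nothing in the character properties says that the
second one should render as a diaeresis on the superscript letter.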

/Kent K

> I have similar
> concerns about U+A799, U+AB30, U+AB33, U+AB38, U+AB3E, U+AB3F, etc.
>
> Van A
>
>
Received on Mon Nov 07 2011 - 08:39:20 CST
