Re: Engmagate? from Philippe Verdy on 2013-12-12 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 12 Dec 2013 17:46:59 +0100

However, you should have noted that this link only explains why the charts
cannot represent all possible shapes of a character. It exposes some cases
(here we are in a situation exactly similar to the variant shapes of the
italic Cyrillic letter pe, whose preferred form differs greatly between
Russian and Serbian).

This small section in the standard is not enough. Notably it mixes several
very distinct cases:
- contextual shapes in Arabic, which are covered by a separate normative
specification in TUS for joining types.
- contextual shapes adopted within specific *sequences* (e.g. in Indic
scripts, or alternative shapes that *should* be adopted when a base letter
is followed by some combining diacritics).

The second set of variants would merit further standardization, notably for
Indic scripts. For now we can only find the relevant data outside TUS, for
example in the OpenType specifications. This is not enough in my opinion,
because OpenType is not the only possible way to implement the
standard, and the way OpenType defines this is also not normative and
technically too far from the need: we need a clear and standardized
specification to know which *sequences* are expected (in various languages,
or more generally in some scripts independently of the language), notably
for sequences involving joiner controls. These issues are covered too
superficially in the TUS chapters describing some scripts.
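To make the point about joiner controls concrete, here is a small Python sketch. It uses the Devanagari KA + VIRAMA + SSA example; the labels in the dictionary are my own shorthand for the behaviour described in the Devanagari chapter of TUS, not normative names, and how a given font actually renders each sequence is up to the implementation.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

def describe(seq):
    """List each code point of a sequence with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in seq]

ka, virama, ssa = "\u0915", "\u094d", "\u0937"

# Labels are informal shorthand (an assumption of this sketch).
sequences = {
    "conjunct (default)": ka + virama + ssa,
    "explicit virama": ka + virama + ZWNJ + ssa,
    "half form": ka + virama + ZWJ + ssa,
}

for label, seq in sequences.items():
    print(label)
    for line in describe(seq):
        print("   ", line)
```

The three sequences differ only by one invisible control, yet each asks for a different shape; this is the kind of expectation that cannot be read off a chart of isolated code points.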

TUS has only standardized a few sequences by assigning them normative names,
but without giving them even informative representative glyphs. And nothing
has been done to exhibit the representative glyphs expected in some
languages (e.g. the Serbian Cyrillic italic small letter pe).

We should think about extending the standard, starting with one or several
technical reports that would later become part of the standard or be
integrated into the relevant chapters for each script, together with an
add-on after the charts, which show only the representative glyphs for
isolated characters (just the "most common" shape expected across all
languages). Such specifications should set out the best practices that are
expected, many of which should become normative (even if there will still
be room for variation, within the limits where differences between the
standardized sets of representative glyphs are preserved).

However, this requires that these add-on charts carry more identification
than just a single code point: they should include other selectors, such as
language, style options like italic, sequences of code points, and possibly
even a reference to some regular expression or similar (collating
element?) to infer the correct set of acceptable shapes.
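As a rough illustration of what such a chart entry could be keyed on, here is a minimal Python sketch. Everything in it is hypothetical: the `GlyphSelector` name, its fields, the fallback order in `lookup`, and the shape descriptions are assumptions made for the example, not anything defined by TUS or OpenType.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class GlyphSelector:
    """Hypothetical key for an add-on chart entry."""
    code_points: Tuple[int, ...]      # a single code point or a sequence
    language: Optional[str] = None    # language tag, e.g. "sr"; None = any
    italic: bool = False              # style option

# Placeholder shape descriptions standing in for representative glyphs.
chart = {
    GlyphSelector((0x043F,)): "upright pe",
    GlyphSelector((0x043F,), italic=True): "italic pe, n-like form",
    GlyphSelector((0x043F,), language="sr", italic=True):
        "italic pe, u-with-overbar-like form",
}

def lookup(chart, code_points, language=None, italic=False):
    """Try the most specific selector first, then fall back (assumed order)."""
    candidates = [
        GlyphSelector(code_points, language, italic),
        GlyphSelector(code_points, None, italic),
        GlyphSelector(code_points, language, False),
        GlyphSelector(code_points, None, False),
    ]
    for key in candidates:
        if key in chart:
            return chart[key]
    return None

print(lookup(chart, (0x043F,), language="sr", italic=True))
print(lookup(chart, (0x043F,), language="ru", italic=True))
```

A real specification would also need selectors for context (the regular-expression or collating-element idea), which this sketch leaves out.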

2013/12/12 Leo Broukhis <leob_at_mailcom.com>

> Hasn't http://www.unicode.org/standard/where/#Variant_Shapes explained it
> once and for all?
>
> Leo
>
>
> On Thu, Dec 12, 2013 at 4:42 AM, <dzo_at_bisharat.net> wrote:
>
>> FWIW, a blog post prompted by discussions in the wake of a DejaVu font
>> use of N-form over n-form capital ŋ ("eng" or "engma"):
>>
>> "The 'eng' times for unified capital ŋ?"
>> http://niamey.blogspot.com/2013/12/the-eng-times-for-unified-capital.html
>>
>> It's not a new issue, but was leaving the two main forms of capital eng
>> as variants of one character the best course of action? In any event, it's
>> probably more complex to disunify now (if that were to be decided) than it
>> would have been, say, 10-12 years ago.
>>
>> Don Osborn
>>
>> Sent via BlackBerry by AT&T
>>
>>
>>
>
Received on Thu Dec 12 2013 - 10:49:13 CST
