[hebrew] Re: variation selectors for combining characters (was: Hebrew composition model, with cantillation marks)

From: Philippe Verdy ([email protected])
Date: Tue Nov 04 2003 - 10:14:27 EST


From: "Philippe Verdy" <[email protected]>
> All that can be done is to create a new variation selector for combining
> characters. It could be created:
> - either within a new generic set of variation selectors for combining
> characters (noted CVSn here) to produce sequences like <HEBREW POINT
> METEG><CVSn>;
> - or as Hebrew specific variation selectors for Hebrew combining
> characters (noted HVSn here); this would produce sequences like <HEBREW
> POINT METEG><HEBREW HVSn> which should be treated as <HEBREW POINT METEG> by
> renderers or collators that do not implement this variation selector.
>
> In either case, such types of variation selector sequences needed to
> override the rendered position of the previous combining character should be
> allowed only for registered sequences, like with other base characters with
> known variants.

I also forgot the problem caused by the normalization of combining sequences
that include such variation selector sequences. Such a sequence would need
to be stable under normalization and be treated equivalently in every
canonically equivalent order.

Suppose that other diacritics are coded before METEG:
<BASE(cc=0)><diacritic(cc=x)><METEG(cc=y)><HVS (cc=0)>
- if x > y, then normalization will reorder it to:
<BASE(cc=0)><METEG(cc=y)><diacritic(cc=x)><HVS (cc=0)>
and so the HVS will not work as expected: it will select a glyph variant of
the other diacritic, not of the METEG.
- if x < y, then normalization will keep the order, but the second sequence
above (with the HVS after the other diacritic) will still be canonically
equivalent to the first one.
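
The reordering can be checked with Python's standard unicodedata module. This
is only a sketch: no HVS exists, so U+FE00 VARIATION SELECTOR-1 (cc=0) is used
here as a stand-in, and the helper name selector_target is made up for this
example. ETNAHTA (cc=220) and PATAH (cc=17) play the "other diacritic" on
either side of METEG (cc=22).

```python
import unicodedata

ALEF = "\u05D0"     # base letter, cc=0
METEG = "\u05BD"    # HEBREW POINT METEG, cc=22
ETNAHTA = "\u0591"  # HEBREW ACCENT ETNAHTA, cc=220 (the x > y case)
PATAH = "\u05B7"    # HEBREW POINT PATAH, cc=17 (the x < y case)
HVS = "\uFE00"      # assumption: VARIATION SELECTOR-1 stands in for the HVS


def selector_target(text):
    """Return the character that precedes the selector after NFD."""
    normalized = unicodedata.normalize("NFD", text)
    return normalized[normalized.index(HVS) - 1]


# x > y: METEG moves in front of the accent, the selector stays at the end.
after_accent = selector_target(ALEF + ETNAHTA + METEG + HVS)

# x < y: the order is kept, so the selector still follows METEG.
after_point = selector_target(ALEF + PATAH + METEG + HVS)
```

In the first case the selector ends up after ETNAHTA, in the second it stays
after METEG.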

So what would be needed is a set of variation selectors for each possible
combining class value, so that a CVSn character can remain stable and
attached to the right combining character in all canonically equivalent
strings.

So this would require encoding the new variation selector with the SAME
combining class as the one for METEG to which it applies.
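
To illustrate why a matching combining class would help, here is a small
sketch of canonical reordering (a stable sort of each run of non-starters by
combining class). Everything about the selector is hypothetical: the name
CVS22, the use of private-use U+E000 for it, and the override table that
gives it cc=22 do not correspond to any encoded character.

```python
import unicodedata

ALEF = "\u05D0"     # base letter, cc=0
METEG = "\u05BD"    # HEBREW POINT METEG, cc=22
ETNAHTA = "\u0591"  # HEBREW ACCENT ETNAHTA, cc=220
PATAH = "\u05B7"    # HEBREW POINT PATAH, cc=17
CVS22 = "\uE000"    # hypothetical selector, private-use code point

# Assumption: pretend the hypothetical selector was encoded with cc=22.
HYPOTHETICAL_CC = {CVS22: 22}


def cc(ch):
    """Combining class, with the hypothetical override applied."""
    return HYPOTHETICAL_CC.get(ch, unicodedata.combining(ch))


def canonical_reorder(text):
    """Stable-sort every run of non-starters by combining class."""
    out, run = [], []
    for ch in text:
        if cc(ch) == 0:
            out.extend(sorted(run, key=cc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=cc))
    return "".join(out)


# The selector travels with METEG whichever side the other mark started on.
high = canonical_reorder(ALEF + ETNAHTA + METEG + CVS22)
low = canonical_reorder(ALEF + METEG + CVS22 + PATAH)
```

Because the sort is stable and both characters have the same class, the
selector keeps its place right after METEG in both results.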

Of course we have plenty of space in special plane 14 to allocate them. But
this decision is architectural. It could also be used as an easy way to
extend other scripts, for example to represent variations of Latin accents,
like the presentation of the cedilla/comma-above, or the rounded/angular
form of the circumflex, or the 9-shaped/stroke-shaped appearance of the
acute accent, if this ever has some distinct meaning in a multilanguage
environment.

The other question is how many selectors will be needed: we have 256
selectors for base characters, will we need 256 selectors for each possible
combining class except class 0 (this would nearly fill a complete plane)?
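
A rough count, as a sketch only. It assumes the non-zero combining class
values run from 1 to 254 and that every one of them would get a full set of
256 selectors, which is the worst case rather than a proposal.

```python
SELECTORS_PER_CLASS = 256  # same count as the existing selectors
NONZERO_CLASSES = 254      # assumption: combining classes 1..254
PLANE_SIZE = 0x10000       # code points in one plane

needed = SELECTORS_PER_CLASS * NONZERO_CLASSES
fraction_of_plane = needed / PLANE_SIZE
print(needed, fraction_of_plane)
```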

If such a choice is not made, I don't see the point of encoding a variation
selector for Hebrew, and in fact it may be much simpler to encode a new
<HEBREW POINT MEDIAL METEG> combining character (maybe with a compatibility
decomposition to <HEBREW POINT METEG> if this helps produce at least an
approximate rendering on legacy renderers).

