Re: Combining latin small letters with diacritics from Philippe Verdy on 2012-03-05 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 5 Mar 2012 21:51:32 +0100

You are so attached to keeping the existing encoding model
unchanged that you are now preparing for LOTS of additions of
combining Latin characters with diacritics... The BMP won't be enough,
the SMP will fill up too, and there will be enormous problems for font
creators (or complications added to text renderers to work around the
limitations of fonts, as they will start decomposing those combining
characters to find matching base characters that they will stack
themselves, using the existing glyph properties of the base characters).

I don't see why a more productive model should not be used, even if
some already-encoded characters have avoided such an extension
mechanism (one that font creators could implement much more simply
and more generally).

If you look at the document cited by Denis, these letters (with or
without additional diacritics) used above other letters really
keep their identity and semantics. The presentation used (vertical
stacks) just means that the base letter and the upper letter
define a range of intermediate pronunciations between the
pronunciations meant by each letter in the stack. They remain
candidates for an upper-layer mechanism to create those vertical
stacks (just as there is an upper-layer mechanism to create
superscripts and subscripts).

On 5 March 2012 at 21:17, Ken Whistler <kenw_at_sybase.com> wrote:
> On 3/5/2012 11:44 AM, Philippe Verdy wrote:
>>
>> So what do you propose?
>
>
> It doesn't matter what *Michael* proposes at this point. These have already
> been approved by both the UTC and WG2 and are currently in DAM ballot.
>
>
>> - Encoding the new precomposed pairs as a new combining character
>> (there may be a lot of candidate pairs to encode, especially in the
>> Latin script),
>
>
> Yes. Although this isn't a "precomposed pair", by definition. It is a letter
> with a diacritic of some sort (any sort), which itself is then used as a
> combining mark above.
>
>
>> - or encoding a variation of the existing diacritic to mean that they
>> are bound to a first-level of diacritic (here a combining letter),
>
>
> No. That would be a fundamental architectural change to the standard. Ain't
> gonna happen.
>
>
>> - or duplicating the encoding of the diacritics without using variation
>> selectors?
>
>
> No.
>
>
>> - or using an upper layer protocol?
>
>
> No.
>
> By the way, Philippe, this horse is already long out of the barn. See U+1DD7
> COMBINING LATIN SMALL LETTER C WITH CEDILLA, which is already a
> published part of the standard.
>
> Focusing just on the three new characters with umlauts (or diaereses --
> makes no matter, you can use for either, just like the non-combining
> versions) -- seems to make this a matter of what happens when you have a
> combining letter above which has its own diacritic above, but in fact this
> is a much more general problem, because the diacritics on the combining
> letter above could be below (see the C WITH CEDILLA cited above) or
> otherwise, just as well. See 1DEC, which has a diacritic set of bars
> *across* the letter form, and 1DED and 1DF0, which have a diacritic mark
> at the bottom left of the letter forms.
>
> --Ken
>
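
As a rough illustration of the code points cited in the quoted reply, the
following Python sketch queries the standard library's unicodedata module
for each one. The names CODE_POINTS and describe are made up for this
example and appear in neither message. The output depends on the Unicode
version bundled with the interpreter, so the later code points (1DEC, 1DED,
1DF0) may come back unnamed on older Python versions.

```python
import unicodedata

# Code points mentioned in the thread: combining Latin letters that
# themselves carry a diacritic. This list is an assumption for illustration.
CODE_POINTS = [0x1DD7, 0x1DEC, 0x1DED, 0x1DF0]


def describe(cp):
    """Return the basic Unicode properties of one code point as a dict."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        # Fall back to a placeholder if this Unicode version lacks the character.
        "name": unicodedata.name(ch, "<not in this Unicode version>"),
        "category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
        # An empty string means there is no decomposition mapping.
        "decomposition": unicodedata.decomposition(ch),
    }


for cp in CODE_POINTS:
    print(describe(cp))
```

An empty "decomposition" field means the character is encoded atomically:
normalization will not split it into a combining letter plus a separate
diacritic, so any such splitting would have to be done by the renderer or
the font.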
Received on Mon Mar 05 2012 - 14:53:54 CST
