Re: Combining latin small letters with diacritics from Philippe Verdy on 2012-03-11 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 12 Mar 2012 01:17:00 +0100

Another example: suppose you want to represent the epigraphic notation
where a tie groups several orthographic characters, for use
in texts discussing grammar. You could use the special combining
character with combining class 0 that I propose, to annotate:

- the first orthographic character (or cluster) with the standard half
diacritic encoding the left part of the tie: encode that half
diacritic after the special character

- the second orthographic character (or cluster) with the standard
half diacritic encoding the right part of the tie: encode that half
diacritic after the special character

- use the SAME special combining character in both places (if there is
a set of such characters) to indicate that the two notations are
associated. This hints to renderers that they can safely join the two
halves!
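
As a rough illustration of why combining class 0 matters here, the
following Python sketch uses only the standard unicodedata module. The
proposed character is not encoded, so U+034F COMBINING GRAPHEME JOINER
(an existing combining character with class 0) stands in for it purely
as an assumption. The half diacritics are assumed to be U+FE20 and
U+FE21, and the helper names annotate, tie and nfd are hypothetical.

```python
import unicodedata

# ASSUMPTION: the proposed class-0 character does not exist, so U+034F
# COMBINING GRAPHEME JOINER (combining class 0) is used as a stand-in.
MARKER = "\u034F"
LEFT_HALF = "\uFE20"   # COMBINING LIGATURE LEFT HALF (class 230)
RIGHT_HALF = "\uFE21"  # COMBINING LIGATURE RIGHT HALF (class 230)
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW (class 220), an ordinary diacritic


def annotate(cluster: str, half: str, marker: str = MARKER) -> str:
    """Append the marker, then the half diacritic, to an orthographic cluster."""
    return cluster + marker + half


def tie(first: str, second: str, marker: str = MARKER) -> str:
    """Encode two clusters joined by a tie, using the SAME marker for both halves."""
    return annotate(first, LEFT_HALF, marker) + annotate(second, RIGHT_HALF, marker)


def nfd(text: str) -> str:
    """Canonical decomposition, which includes canonical reordering of marks."""
    return unicodedata.normalize("NFD", text)


# Without a marker, the lower-class dot below is reordered in front of the
# half diacritic, so it ends up next to the base letter.
unmarked = nfd("a" + LEFT_HALF + DOT_BELOW)

# With the class-0 marker, marks encoded after it may be reordered among
# themselves, but none of them can move in front of the marker.
marked = nfd("a" + MARKER + LEFT_HALF + DOT_BELOW)

# A tied pair is already in canonical order and survives normalization.
tied = tie("a", "b")
```

Under these assumptions the tied sequence is left unchanged by NFC and
NFD, and any mark encoded after the marker stays on the annotation side
of it instead of migrating to the orthographic base.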

On 12 March 2012 00:56, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
> One example: say you want to encode an epigraphic C with CEDILLA
> appearing as a letter above another one. You would encode:
>
> - (1) the orthographic base letter (with its standard diacritics,
> including CGJ if needed)
> - (2) the new special combining character with combining class 0 that I propose.
> - (3) the existing combining letter C
> - (4) the existing combining CEDILLA (or other existing diacritics,
> including CGJ if needed to avoid reordering by normalizers).
>
> Renderers have hints given by the character (2) that they must not
> reorder/mix/compose randomly the characters between parts (1) and (3,
> 4). But they also have the hint that they can safely precompose the
> characters in (3, 4) without breaking anything, and they don't have to
> represent the character (2) itself (they could do it, still, using
> some other layout mechanisms).
>
> Semantic analysers know how to interpret the characters in (3, 4)
> together, at the semantic level they associate with the special
> character (2).
>
> Orthographic checkers know that characters (2, 3, 4) are to be ignored:
> they only check the characters in (1) and skip the rest, as indicated
> by the character (2), with which they don't associate any orthographic
> meaning.
>
> Sorters continue to work (characters (2, 3, 4) can be given a non-null
> weight only at higher collation levels).
>
> On 12 March 2012 00:44, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>> Also, I do think that this proposal would avoid having to encode many
>> new "precomposed" diacritics made of a diacritic letter and a
>> diacritic applying to it. We would just encode them using such a
>> separator first, followed by the encoded diacritic letter and the
>> standard combining diacritics.
>>
>> With this tool, immediately, we can cover all scripts at once, for all
>> languages and all usages.
>>
>> On 12 March 2012 00:36, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>>> In other words, that circumflex is an epigraphic notation. This means
>>> three distinct levels of analysis of the text: one for Chi, one for
>>> the small letter above it noting something about the Chi, and another
>>> for the circumflex noting something about the Chi itself.
>>>
>>> This causes a major problem: how to separate cleanly those levels of
>>> representation when diacritics are NOT supposed to modify a letter
>>> orthographically?
>>>
>>> 1) use an upper layer protocol (this is the position constantly
>>> adopted, but it has its limits).
>>>
>>> 2) use a special invisible combining character as a prefix (with
>>> combining class 0 to avoid reordering and other ambiguous combined
>>> forms caused by normalization) to separate the standard diacritics
>>> encoded after it and give them unspecified additional semantics.
>>>
>>> 3) Or possibly several such special invisible combining characters
>>> in a coherent set (we could have 16 of them, encoded at once in one
>>> column in the special plane, each one with a numeric property that
>>> does not designate how it will be used in actual texts, in a way
>>> similar to the multiple variation selectors or multiple PUAs, which
>>> are not very well suited to combining characters), if it is needed
>>> to make semantic distinctions between these multiple (but optional)
>>> epigraphic levels.
>>>
>>>
>>> On 11 March 2012 14:06, Michael Everson <everson_at_evertype.com> wrote:
>>>> On 11 Mar 2012, at 12:05, Denis Jacquerye wrote:
>>>>
>>>>> Stacked letters are also found in some Greek manuscripts.
>>>>> See the page http://www.archive.org/stream/revuearchologi27pariuoft#page/156/mode/1up
>>>>> with some examples: Nu, omicron, omicron and Greek circumflex (tilde),
>>>>> chi and Greek circumflex.
>>>>> Would these also have to be represented by combining characters?
>>>>
>>>> Yes, but in this case I don't think that circumflex is part of the superscript letter per se. It's a base letter with a combining letter, and the whole thing has a mark over it to show it's an abbreviation. (There is obviously no chi-circumflex in Greek orthography.)
>>>>
>>>> Michael Everson * http://www.evertype.com/
>>>>
>>>>
>>>>
Received on Sun Mar 11 2012 - 19:18:26 CDT
