Re: Combining latin small letters with diacritics

From: Philippe Verdy <>
Date: Mon, 12 Mar 2012 00:56:31 +0100

One example: say you want to encode an epigraphic C with CEDILLA
appearing as a letter above another one. You would encode:

- (1) the orthographic base letter (with its standard diacritics,
including CGJ if needed)
- (2) the new special combining character with combining class 0 that I propose.
- (3) the existing combining letter C
- (4) the existing combining CEDILLA (or other existing diacritics,
including CGJ if needed to avoid reorderings by normalizers).
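
To make the reordering issue concrete, here is a minimal Python sketch
of the sequence above. It is only an illustration under assumptions:
the proposed character (2) does not exist, so the private-use code
point U+E000 (combining class 0) stands in for it, and the names SEP,
nfc and codepoints are hypothetical helpers, not part of any proposal.

```python
import unicodedata

SEP = "\uE000"      # hypothetical stand-in for proposed character (2); private use, ccc 0
CGJ = "\u034F"      # COMBINING GRAPHEME JOINER, ccc 0
COMB_C = "\u0368"   # COMBINING LATIN SMALL LETTER C, ccc 230
CEDILLA = "\u0327"  # COMBINING CEDILLA, ccc 202


def nfc(s):
    """Normalize to NFC with the standard library normalizer."""
    return unicodedata.normalize("NFC", s)


def codepoints(s):
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# No separator: the cedilla (ccc 202) is reordered before the combining c
# (ccc 230) and then composes with the base letter.
unprotected = "e" + COMB_C + CEDILLA

# Separator only: the base is protected, but the marks after the separator
# are still reordered among themselves.
separated = "e" + SEP + COMB_C + CEDILLA

# Separator plus CGJ between (3) and (4): stable under normalization.
guarded = "e" + SEP + COMB_C + CGJ + CEDILLA

for label, s in [("unprotected", unprotected),
                 ("separated", separated),
                 ("guarded", guarded)]:
    print(f"{label}: {codepoints(s)} -> {codepoints(nfc(s))}")
```

In the first case the cedilla ends up on the base letter (U+0229
followed by U+0368), which is exactly the mix-up between levels that
the separator is meant to prevent. In the second case the base is left
alone but the cedilla still moves in front of the combining c, which
is why a CGJ may be needed in part (4).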

Renderers are told by character (2) that they must not arbitrarily
reorder, mix or compose characters across parts (1) and (3, 4). They
are also told that they can safely precompose the characters in
(3, 4) without breaking anything, and they don't have to represent
character (2) itself (they could still do so, using some other
layout mechanism).

Semantic analysers know how to interpret characters in (3, 4) together,
with their semantic level associated by them for the special character

Orthographic checkers know that characters (2, 3, 4) are to be ignored,
they'll only check characters in (1), ignoring the rest as indicated
by the character (2) for which they don't associate any orthographic

Sorters continue to work (characters (2, 3, 4) can be given a non-zero
weight only in higher collation levels).
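
As a rough sketch of what a checker or sorter could do with such a
separator, the Python below splits a string into its orthographic
level and its annotations. Everything here is an assumption made for
illustration: U+E000 again stands in for the proposed character (2),
split_levels and sort_key are hypothetical names, and the tuple
comparison is only a crude substitute for real multi-level collation
weights.

```python
import unicodedata

SEP = "\uE000"      # hypothetical stand-in for proposed character (2)
CGJ = "\u034F"      # COMBINING GRAPHEME JOINER (general category Mn)
COMB_C = "\u0368"   # COMBINING LATIN SMALL LETTER C
CEDILLA = "\u0327"  # COMBINING CEDILLA


def split_levels(text):
    """Return (orthographic_text, annotations).

    Each annotation is the run of combining marks (including CGJ)
    that follows one separator; the separator itself is dropped.
    """
    ortho, notes = [], []
    in_note = False
    for ch in text:
        if ch == SEP:
            in_note = True
            notes.append("")
            continue
        # Marks after the separator belong to the annotation level.
        if in_note and unicodedata.category(ch).startswith("M"):
            notes[-1] += ch
            continue
        in_note = False
        ortho.append(ch)
    return "".join(ortho), notes


def sort_key(text):
    """Compare the orthographic text first, annotations only as a tie-breaker."""
    ortho, notes = split_levels(text)
    return (ortho, tuple(notes))


annotated = "x" + SEP + COMB_C + CGJ + CEDILLA + "y"
print(split_levels(annotated))

words = ["ab" + SEP + COMB_C, "aa", "ab"]
print(sorted(words, key=sort_key))
```

A spelling checker would look only at the first element returned by
split_levels, while a semantic analyser would look at the second.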

On 12 March 2012 00:44, Philippe Verdy <> wrote:
> Also I do think that this proposal would avoid having to encode many
> new "precomposed" diacritics made of a diacritic letter and a
> diacritic applying to it. We would just encode them using such
> separator first, before the encoded diacritic letter, and the standard
> combining diacritics.
> With this tool, immediately, we can cover all scripts at once, for all
> languages and all usages.
> On 12 March 2012 00:36, Philippe Verdy <> wrote:
>> In other words, that circumflex is an epigraphic notation. This means
>> three distinct levels of analysis of the text: one for Chi, one for
>> the small letter above it noting something about the Chi, and another
>> for the circumflex noting something about the Chi itself.
>> This causes a major problem: how to separate cleanly those levels of
>> representation when diacritics are NOT supposed to modify a letter
>> orthographically?
>> 1) use an upper layer protocol (this is the position constantly
>> adopted, but it has its limits).
>> 2) use a special invisible combining character used as a prefix (with
>> combining class 0 to avoid reorderings and other ambiguous combined
>> forms caused by normalizations) to separate and provide an unspecified
>> additional semantic to the standard diacritics encoded after it.
>> 3) Or possibly several of such special invisible combining characters
>> in a coherent set (we could have 16 of them, encoded at once in one
>> column in the special plane, each one with a numeric property which
>> does not designate how it will be used in actual texts, in a way
>> similar to the multiple variant selectors or multiple PUAs that are
>> not very well fitted for combining characters), if it is needed to
>> make semantic distinctions between these multiple (but optional)
>> epigraphic levels.
>> On 11 March 2012 14:06, Michael Everson <> wrote:
>>> On 11 Mar 2012, at 12:05, Denis Jacquerye wrote:
>>>> Stacked letters are also found in some Greek manuscripts.
>>>> See the page
>>>> with some examples: Nu, omicron, omicron and Greek circumflex (tilde),
>>>> chi and Greek circumflex.
>>>> Would these also have to be represented by combining characters?
>>> Yes, but in this case I don't think that circumflex is part of the superscript letter per se. It's a base letter with a combining letter, and the whole thing has a mark over it to show it's an abbreviation. (There is obviously no chi-circumflex in Greek orthography.)
>>> Michael Everson *
Received on Sun Mar 11 2012 - 18:58:13 CDT
