Re: Combining latin small letters with diacritics

From: Jeremie Hornus
Date: Tue, 27 Mar 2012 12:43:16 +0200

On 26 Mar 2012, at 13:35, Denis Jacquerye wrote:

> So far the linguistic atlases I have seen extensively use this
> combining letter mechanism, with diacritics changing the meaning of
> the combining letter or of the base letter.
> There are a whole lot of notations that could simply be base combining
> letter + combining diacritics, but if you consider their meaning,
> would have to be encoded as their own combining letter with diacritics
> on the model of combining a-umlaut, etc... This is not coherent with
> the decomposition model.
> Having some of the combining letters with diacritics encoded as single
> characters and other combining letters with diacritics encoded as
> combining letters with separate combining diacritics based on what
> meaning they have is as erroneous as having new precomposed characters
> because of their meaning. Would we encode ɛ̱ because it has a specific
> meaning in some language? Obviously not.

Encoding is supposed to be "language-independent", but that is only theoretical;
in practice it cannot be independent of _some_ language.
And the "business language" of encoding seems to have been English so far,
at least as far as I understand Unicode.

> So why are we encoding ä when
> it is a combining diacritic but not ẽ?

That is a remnant of the history of type encoding.

> The fact that combining c-cedilla is a precedent doesn't make it any saner.
> For example a-breve is a separate letter in Romanian, and it is as
> well in the Atlasul linguistic romîn serie noua, just like the
> a-umlaut is in German dialectology. But in other dialectology works
> such as Atlas linguistique de la France, a-breve is just a breve a,
> not a different letter.
> You'll then end up with combining a-breve as a single diacritic for
> Romanian but combining a and combining breve for other languages. Yet
> the non-diacritic forms, i.e. regular letters would be represented by
> the same character sequence in Romanian or other languages, NFC ă or
> NFD ă.
> The same could be said for a-umlaut/diaeresis depending on how people
> are using it.
> Denis Moyogo Jacquerye
> <combining_a-breve-1-ALRsn.jpg><combining_a-breve-2-ALRsn.jpg>
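
For reference, the NFC/NFD point in the quoted text can be checked with a short sketch. It uses Python's standard unicodedata module; the variable names and the same_text helper are only illustrative and not part of any proposal. It assumes the combining letter in question is U+0363 COMBINING LATIN SMALL LETTER A, which has no precomposed form with a breve.

```python
import unicodedata

# Regular letter: precomposed and decomposed spellings of a-breve.
precomposed = "\u0103"    # LATIN SMALL LETTER A WITH BREVE
decomposed = "a\u0306"    # LATIN SMALL LETTER A + COMBINING BREVE

nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

def same_text(x, y):
    """Illustrative helper: compare two strings up to canonical equivalence."""
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)

# Combining letter: superscript a followed by a combining breve.
# No single code point exists for this pair, so normalization
# leaves it as a two-code-point sequence.
combining_a_breve = "\u0363\u0306"
combining_nfc = unicodedata.normalize("NFC", combining_a_breve)

for label, s in [("NFC", nfc), ("NFD", nfd), ("combining", combining_nfc)]:
    print(label, [f"U+{ord(ch):04X}" for ch in s])
```

Whatever the language, the regular letter is one text under canonical equivalence; the combining letter only has the sequence form unless a separate character is encoded for it.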
Received on Tue Mar 27 2012 - 05:48:24 CDT
