Re: Combining latin small letters with diacritics

From: Denis Jacquerye
Date: Mon, 26 Mar 2012 13:35:18 +0200

The linguistic atlases I have seen so far make extensive use of this
combining letter mechanism, with diacritics changing the meaning of
either the combining letter or the base letter.

There are a whole lot of notations that could simply be base combining
letter + combining diacritics but that, if you consider their meaning,
would have to be encoded as their own combining letter with diacritics,
on the model of combining a-umlaut, etc. This is not coherent with
the decomposition model.

Having some combining letters with diacritics encoded as single
characters and others encoded as combining letters with separate
combining diacritics, based on what meaning they have, is as erroneous
as encoding new precomposed characters because of their meaning. Would
we encode ɛ̱ because it has a specific meaning in some language?
Obviously not. So why are we encoding ä when it is a combining
diacritic, but not ẽ?

The fact that combining c-cedilla is a precedent doesn't make it any saner.
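To make the precedent concrete, here is a small Python sketch using the
standard unicodedata module. It assumes the precedent meant here is
U+1DD7 COMBINING LATIN SMALL LETTER C CEDILLA; the helper name
codepoints is only for illustration. The regular letter decomposes into
c + combining cedilla, while the combining letter stays atomic.

```python
import unicodedata

def codepoints(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

regular = "\u00e7"    # LATIN SMALL LETTER C WITH CEDILLA
combining = "\u1dd7"  # COMBINING LATIN SMALL LETTER C CEDILLA (assumed precedent)

# The regular letter follows the decomposition model: c + combining cedilla.
regular_nfd = codepoints(unicodedata.normalize("NFD", regular))

# The combining letter has no decomposition mapping, so NFD leaves it as is.
combining_nfd = codepoints(unicodedata.normalize("NFD", combining))

print(regular_nfd)
print(combining_nfd)
```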

For example, a-breve is a separate letter in Romanian, and it is one as
well in the Atlasul linguistic romîn serie noua, just as a-umlaut is in
German dialectology. But in other dialectology works, such as the
Atlas linguistique de la France, a-breve is just an a with a breve,
not a different letter.
You'll then end up with combining a-breve as a single diacritic for
Romanian, but combining a plus combining breve for other languages. Yet
the non-diacritic forms, i.e. the regular letters, would be represented
by the same character sequence in Romanian and in other languages:
NFC ă (U+0103) or NFD ă (U+0061 U+0306).
The same could be said for a-umlaut/diaeresis, depending on how people
are using it.
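The asymmetry can be checked with Python's unicodedata module. This is
only a sketch: the helper name codepoints is made up for illustration,
and the hypothetical single combining a-breve character is not modelled
because no such character is assumed to exist. The regular letter has
one canonical NFC form and one NFD form whatever the language, and the
sequence combining a + combining breve is left unchanged by both.

```python
import unicodedata

def codepoints(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

# Regular letter a-breve: same NFC and NFD forms in Romanian or elsewhere.
regular = "\u0103"  # LATIN SMALL LETTER A WITH BREVE
regular_nfc = codepoints(unicodedata.normalize("NFC", regular))
regular_nfd = codepoints(unicodedata.normalize("NFD", regular))

# Combining letter a followed by combining breve: no precomposed form exists,
# so normalization cannot turn the sequence into a single character.
above = "\u0363\u0306"  # COMBINING LATIN SMALL LETTER A + COMBINING BREVE
above_nfc = codepoints(unicodedata.normalize("NFC", above))
above_nfd = codepoints(unicodedata.normalize("NFD", above))

print(regular_nfc, "|", regular_nfd)
print(above_nfc, "|", above_nfd)
```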

Denis Moyogo Jacquerye

Attachments: combining_a-breve-1-ALRsn.jpg, combining_a-breve-2-ALRsn.jpg
Received on Mon Mar 26 2012 - 06:40:54 CDT
