Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Michael Everson
Date: Wed, 19 Jun 2013 14:36:16 +0100

On 19 Jun 2013, at 13:41, Denis Jacquerye wrote:

>> The same way one would rationalize using precomposed ãẽĩñõũỹ (aeinouy with tilde) but a necessarily de-composed g̃ (g with tilde) in Guaraní.
> This is wrong: ãẽĩñõũỹ normalize to use U+0303 in NFD, so they canonically use the same tilde as g̃.

Only in text which has been decomposed. Not all text gets decomposed.
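
The difference between the two normalization forms can be checked with a minimal Python sketch using the standard `unicodedata` module. The helper name `codepoints` and the sample strings are illustrative assumptions, not something from this thread.

```python
import unicodedata

def codepoints(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]

a_tilde = "\u00E3"   # precomposed a with tilde
g_tilde = "g\u0303"  # g + COMBINING TILDE (no precomposed form exists)

nfc_a = unicodedata.normalize("NFC", a_tilde)
nfd_a = unicodedata.normalize("NFD", a_tilde)
nfc_g = unicodedata.normalize("NFC", g_tilde)

print(codepoints(nfc_a))  # composed text keeps the precomposed letter
print(codepoints(nfd_a))  # only decomposed text exposes U+0303
print(codepoints(nfc_g))  # g with tilde stays a two-code-point sequence
```

Text stored in NFC, or never normalized at all, keeps the precomposed letters, so the shared U+0303 only becomes visible once the text has been decomposed.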

> The 4 additional non-decomposable Marshallese characters with cedilla would not normalize to use the same cedilla as the other Marshallese characters with cedilla. They would not canonically use the
> same cedilla.

That's correct. Why? Because the Latvians are using L+cedilla and N+cedilla, and we cannot change that. Mistakes were made in 1990 and 1991. We have to live with those mistakes. (Some mistakes we don't have to live with -- but this one we do.)

>>> It would require less new characters to be encoded and would make it easier to support in fonts (adding 1 instead of 4).
>> No! Because if you added a single new character you'd have to make sure you had good glyph placement with LlMmNnOo which is eight glyphs.
> Best practice would require adding diacritical mark placement wherever necessary, if not on all possible base characters. M/m and O/o would still need it either way, and L/l and N/n would need it for other
> combining diacritics either way.

In my fonts I do this with pre-composed glyphs. I don't really know how easy or how reliable attachment points for floating diacritics are.

> A modern font already needs to be able to correctly place combining diacritics, including cedilla or ogonek.
> Navajo and other languages need other placement of ogonek than that of European languages.

I'd like to see evidence for that assertion. When I was in Lithuania last week I saw many examples of badly-placed ogonek, particularly on capital letters. What is your assertion about Navajo?

> This does not mean it is justified to encode single precomposed Navajo ogonek characters.

But that isn't the issue here.

> The placement of the cedilla is not semantically different, m̧ with the cedilla on the left has the same meaning as if the cedilla were centered or on the right, even if just one of the two is correct in
> some contexts like in Marshallese.

That isn't the issue here. The issue is that Marshallese uses a cedilla shape, but that despite the decomposition, Latvian letters with cedilla are drawn with commas below. So Marshallese gets the wrong glyph for L+cedilla and N+cedilla.
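
The encoding side of this can be seen in a short Python sketch, assuming the standard `unicodedata` module; the helper name `nfc_names` is hypothetical. It shows which base letters compose with U+0327 into a single character under NFC and which stay as a sequence.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA

def nfc_names(base):
    """Normalize base + COMBINING CEDILLA to NFC and return the character names."""
    composed = unicodedata.normalize("NFC", base + CEDILLA)
    return [unicodedata.name(c) for c in composed]

for base in "lnmo":
    print(base, nfc_names(base))

# The Latvian letter's canonical decomposition points back at U+0327.
print(unicodedata.decomposition("\u013C"))
```

L/l and N/n collapse into the characters named "WITH CEDILLA" that Latvian already uses, while M/m and O/o remain base letter plus combining mark. The glyph actually drawn for each is a font matter that the code cannot show.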

> This does not mean it is justified to encode m with left cedilla, m with centered cedilla or m with right cedilla.
> An additional single combining diacritics would behave the same way.

This isn't about diacritic positioning. It's about glyph shape.

>>> ALA-LC romanizations use cedilla with r as they do under c or s.
>> Does ŗ contrast with r̦ in ALA-LC romanization?
> The same way Marshallese has cedilla letters contrasting with comma below letters.
> The only correct form is with cedilla and it doesn't use comma below.

All right: where does ŗ contrast with r̦ in ALA-LC romanization?

>> Do you think that encoding one new COMBINING MARSHALLESE CEDILLA will not cause problems both with existing COMBINING COMMA BELOW and COMBINING CEDILLA?
> About the confusability, it is too late. Comma below, cedilla, palatalized hook below, ring, half ring below and probably others are already confusable. Adding another will increase confusability but not
> to a relevant degree.

But there's nothing wrong with the current representation of Marshallese M̧ m̧ or O̧ o̧. Those are fine.

> Having 4 single characters will not make anything less confusable (using U+0327 with M/m and O/o but not with L/l and N/n is confusing)

That's a different kind of confusion. It might be counter-intuitive to use COMBINING CEDILLA with M/m and O/o and pre-composed characters with L/l and N/n, but

> although it is a solution, it does not solve the general problem of cedilla.

We can't really "solve" the general problem.

> If we don't want additional confusing characters maybe we should have CGJ, ZWJ or ZWNJ + combining cedilla (or any other similar sequence) to optionally differentiate the types of cedillas in Latvian,
> Livonian, Marshallese and romanizations.

We can't really use these with combining diacritical marks.
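
For reference, here is a hedged Python sketch of how a normalizer treats the proposed kind of sequence, using CGJ (U+034F) as the example. It relies only on the standard `unicodedata` module; the helper name `nfc_codepoints` is an assumption for illustration. It shows the mechanics only and does not settle whether such sequences are advisable.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER
CEDILLA = "\u0327"  # COMBINING CEDILLA

def nfc_codepoints(s):
    """Normalize s to NFC and return its code points as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]

plain = nfc_codepoints("l" + CEDILLA)         # composes to one character
joined = nfc_codepoints("l" + CGJ + CEDILLA)  # CGJ sits between base and mark

print(plain)
print(joined)
print(unicodedata.combining(CGJ))  # canonical combining class of CGJ
```

Because CGJ has canonical combining class 0, the cedilla after it no longer composes with the base letter, so the two spellings stay distinct under normalization. How fonts and other software would treat the intervening character is a separate question.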

> The issue of cedilla can easily be solved at a higher level: font technologies like OpenType can easily display one set of glyphs for Latvian or Livonian and different glyphs for Marshallese.

Only in environments which permit language tagging. I'd like Marshallese children to be able to write their language in filenames.

Michael Everson
Received on Wed Jun 19 2013 - 08:38:28 CDT
