Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Denis Jacquerye <moyogo_at_gmail.com>
Date: Wed, 19 Jun 2013 13:41:59 +0100

On Wed, Jun 19, 2013 at 9:12 AM, Michael Everson <everson_at_evertype.com> wrote:
> On 19 Jun 2013, at 07:54, Denis Jacquerye <moyogo_at_gmail.com> wrote:
> [...]
>> How would one rationalize using one diacritic U+0327 with M/m and O/o but not with L/l and N/n in Marshallese?
>
> The same way one would rationalize using precomposed ãẽĩñõũỹ (aeinouy with tilde) but a necessarily de-composed g̃ (g with tilde) in Guaraní.

This is wrong: ãẽĩñõũỹ normalize to use U+0303 in NFD, so they
canonically use the same tilde as g̃.
The 4 additional non-decomposable Marshallese characters with
cedilla would not normalize to use the same cedilla as the other
Marshallese characters with cedilla. They would not canonically use
the same cedilla.
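
To make the normalization point concrete, here is a minimal Python
sketch using the standard unicodedata module. The helper name
nfd_codepoints is only illustrative. It can only show characters that
are already encoded: a newly encoded atomic character with no canonical
decomposition would, by definition, come out of NFD unchanged and so
would never expose U+0327.

```python
import unicodedata

TILDE = "\u0303"    # COMBINING TILDE
CEDILLA = "\u0327"  # COMBINING CEDILLA

def nfd_codepoints(text):
    """Return the NFD form of text as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]

# Precomposed letters with tilde, written as escapes so the example does
# not depend on how this message itself was normalized: a e i n o u y.
precomposed_tilde = "\u00E3\u1EBD\u0129\u00F1\u00F5\u0169\u1EF9"

# g with tilde has no precomposed form; it is always g + U+0303.
g_tilde = "g" + TILDE

for ch in precomposed_tilde:
    print(ch, nfd_codepoints(ch))
print(g_tilde, nfd_codepoints(g_tilde))

# The existing precomposed l and n with cedilla decompose to base + U+0327,
# the same mark that Marshallese m and o must already use as a sequence.
l_cedilla = "\u013C"  # LATIN SMALL LETTER L WITH CEDILLA
n_cedilla = "\u0146"  # LATIN SMALL LETTER N WITH CEDILLA
m_cedilla = "m" + CEDILLA
o_cedilla = "o" + CEDILLA

for s in (l_cedilla, n_cedilla, m_cedilla, o_cedilla):
    print(s, nfd_codepoints(s))
```

Every line printed for the tilde letters contains U+0303, and every
line printed for the cedilla letters contains U+0327.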

> [...]
>> It would require less new characters to be encoded and would make it easier to support in fonts (adding 1 instead of 4).
>
> No! Because if you added a single new character you'd have to make sure you had good glyph placement with LlMmNnOo which is eight glyphs.

Best practice is to add diacritical mark placement wherever
necessary, if not on all possible base characters. M/m and O/o
would still need it either way, and L/l and N/n would need it for
other combining diacritics either way.
A modern font already needs to be able to correctly place combining
diacritics, including cedilla or ogonek.
Navajo and other languages need a different placement of the ogonek
than European languages do.
This does not mean it is justified to encode single precomposed Navajo
ogonek characters.
The placement of the cedilla is not semantically different: m̧ with
the cedilla on the left has the same meaning as if the cedilla were
centered or on the right, even if just one of these placements is
correct in some contexts, as in Marshallese.
This does not mean it is justified to encode m with left cedilla, m
with centered cedilla or m with right cedilla.
A single additional combining diacritic would behave the same way.

On Wed, Jun 19, 2013 at 9:49 AM, Michael Everson <everson_at_evertype.com> wrote:
> On 19 Jun 2013, at 09:04, Denis Jacquerye <moyogo_at_gmail.com> wrote:
>
>> Furthermore, the cedilla can also have a proper cedilla form as opposed to the Latvian or Livonian comma below form in transliteration systems.
>
> This has nothing to do with the Marshallese/Latvian conflict, though.
>
>> ALA-LC romanizations use cedilla with r as they do under c or s.
>
> Does ŗ contrast with r̦ in ALA-LC romanization?

The same way Marshallese has cedilla letters contrasting with comma
below letters: the only correct form is with the cedilla, and it does
not use comma below.

>> BGN/PCGN and UNGEGN romanizations use cedilla with d as they do under h, s, t or z.
>> DIN 1460-2 uses the cedilla under d, k, l, n as it does under c, h, s, t and z.
>
> If those things are a problem, then solving this problem for Marshallese simply does nothing about that problem. But it solves the problem for Marshallese.
>
>> If the 4 Marshallese cedilla characters are encoded as single characters, does this mean the d, k, l, r with proper cedilla in those romanizations would also have to be encoded as single characters?
>
> No; it doesn't have any implications for that data.
>
>> Encoding 1 combining diacritic character is more efficient than encoding 12 characters.
>
> Do you think that encoding one new COMBINING MARSHALLESE CEDILLA will not cause problems both with existing COMBINING COMMA BELOW and COMBINING CEDILLA?

As for confusability, it is too late. Comma below, cedilla,
palatalized hook below, ring half ring below and probably others are
already confusable. Adding another will increase confusability, but
not to a relevant degree.
Having 4 single characters will not make anything less confusable
(using U+0327 with M/m and O/o but not with L/l and N/n is confusing).
Although it is a solution, it does not solve the general problem of
the cedilla.
If we don't want additional confusing characters, maybe we should have
CGJ, ZWJ or ZWNJ + combining cedilla (or any other similar sequence)
to optionally differentiate the types of cedillas in Latvian,
Livonian, Marshallese and romanizations.
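
As a rough check of that idea, the Python sketch below tests whether
such sequences would survive normalization as distinct from the plain
letter + U+0327. The ordering base letter, then joiner, then U+0327 is
my assumption for illustration, as is the helper name
survives_normalization; nothing here is a defined convention.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA

# Candidate marker characters; all three have canonical combining class 0.
JOINERS = {
    "CGJ": "\u034F",   # COMBINING GRAPHEME JOINER
    "ZWJ": "\u200D",   # ZERO WIDTH JOINER
    "ZWNJ": "\u200C",  # ZERO WIDTH NON-JOINER
}

def survives_normalization(seq):
    """True if seq is left unchanged by both NFC and NFD."""
    return all(unicodedata.normalize(form, seq) == seq
               for form in ("NFC", "NFD"))

# Plain sequence: NFC composes it into the precomposed U+013C.
plain = "l" + CEDILLA
print("plain NFC:", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", plain)])

# Hypothetical marked sequences: base + joiner + cedilla (assumed order).
marked = {name: "l" + joiner + CEDILLA for name, joiner in JOINERS.items()}
for name, seq in marked.items():
    print(name, "survives NFC/NFD:", survives_normalization(seq))
```

Because the intervening character has combining class 0, the cedilla
is blocked from composing with the base letter, so the marked
sequences stay distinct from the plain one under NFC and NFD. Whether
fonts and other processes would honor such a sequence is a separate
question.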

The issue of the cedilla can easily be solved at a higher level: font
technologies like OpenType can easily display one set of glyphs for
Latvian or Livonian and different glyphs for Marshallese.

--
Denis Moyogo Jacquerye
African Network for Localisation http://www.africanlocalisation.net/
Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
DejaVu fonts --- http://www.dejavu-fonts.org/
Received on Wed Jun 19 2013 - 07:45:25 CDT
