Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 19 Jun 2013 12:26:47 +0200

2013/6/19 Michael Everson <everson_at_evertype.com>

> On 19 Jun 2013, at 09:59, "Jörg Knappen" <jknappen_at_web.de> wrote:
>
> > Somehow, the compromise solution found at the ad hoc meeting sounds
> > fishy, because there is no such thing as
> > LATIN CAPITAL LETTER MARSHALLESE L or LATIN SMALL LETTER MARSHALLESE N
> > (to be equipped with a cedilla).
> >
> > It is not the base letter but the diacritic which makes the difference,
> > hence names like
> >
> > LATIN CAPITAL LETTER L WITH PROPER CEDILLA (marshallese)
> >
> > would sound better and more clear.
>
> The use of the names MARSHALLESE L and MARSHALLESE N serves to help prevent
> the misuse of these characters.
>

Do you mean that it is supposed to prevent their use in Latvian/Livonian?

This will happen anyway, simply because they are precomposed and users
will still mix them, or because some authors will find that certain styled
fonts (notably fonts with variable stroke widths) give these characters a
nicer presentation than the badly styled triangular comma (or the basic thin
rectangle looking like an accent) found in some Swiss-style fonts.

Also, this creates an initial restriction on their correct use in other
languages or contexts.

For me the proposal is just a way to fix the presentation of the existing
COMBINING CEDILLA which has three major forms (chosen depending on the base
Latin letter and its capitalisation), including the form that looks like
COMBINING COMMA BELOW (in Latvian for L/l and N/n, and in Romanian for S/s
where similar confusion still occurs, in NFC and NFD alike, long
after the comma below was encoded distinctly).
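
To make the existing situation concrete, here is a minimal sketch using
Python's standard unicodedata module. It only inspects characters that are
already encoded; the helper name describe() is mine, for illustration.

```python
import unicodedata

def describe(ch):
    """Return (name, [names of the NFD code points]) for one character."""
    parts = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), [unicodedata.name(p) for p in parts]

# Latvian L and n with cedilla, then the two Romanian-related S letters.
for ch in ("\u013B", "\u0146", "\u015E", "\u0218"):
    name, parts = describe(ch)
    print(f"U+{ord(ch):04X} {name} -> {parts}")
```

The Latvian letters and U+015E all decompose to COMBINING CEDILLA, whatever
glyph a font draws for them, while only U+0218 decomposes to COMBINING COMMA
BELOW.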

For languages that consider this variation of glyphs for COMBINING
CEDILLA unacceptable, it would be better to encode the specific form (as we
did for COMBINING COMMA BELOW), so that we would have COMBINING CEDILLA
ATTACHED BELOW (and at the same time you can encode the 4 precomposed letters
you proposed, with their canonical decomposition using the new diacritic).

Legacy usages will persist where existing precomposed letters are already
decomposed with COMBINING CEDILLA. Notes added to these characters, as well
as the representative glyph, can suggest which of the three forms is
expected.

And for the 4 proposed characters, you can simply drop the word
"MARSHALLESE": the encoded canonical decomposition using the new diacritic
will already say explicitly that only one form is acceptable (for the other
possible forms, use the base letter followed by COMBINING CEDILLA, or more
precisely by COMBINING COMMA BELOW or COMBINING COMMA ABOVE).
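
As a rough illustration of how the sequences that exist today behave under
normalization, here is a small sketch with Python's standard unicodedata
module (the helper name nfc_codepoints() is only for this example):

```python
import unicodedata

L_CEDILLA = "L\u0327"      # L + COMBINING CEDILLA
L_COMMA_BELOW = "L\u0326"  # L + COMBINING COMMA BELOW

def nfc_codepoints(s):
    """List the code points of s after NFC normalization."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]

print(nfc_codepoints(L_CEDILLA))      # ['U+013B']
print(nfc_codepoints(L_COMMA_BELOW))  # ['U+004C', 'U+0326']
```

L followed by COMBINING CEDILLA composes to the existing precomposed letter,
while L followed by COMBINING COMMA BELOW stays a two-code-point sequence,
since no precomposed L with comma below is encoded.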

In other words, the encoding could just as well be:

* COMBINING CEDILLA ATTACHED BELOW ; Mn ; <no decomposition>

* LATIN CAPITAL LETTER L WITH CEDILLA ATTACHED BELOW ; Lu ; <LATIN CAPITAL
LETTER L, COMBINING CEDILLA ATTACHED BELOW>

* LATIN SMALL LETTER L WITH CEDILLA ATTACHED BELOW ; Ll ; <LATIN SMALL
LETTER L, COMBINING CEDILLA ATTACHED BELOW>

* LATIN CAPITAL LETTER N WITH CEDILLA ATTACHED BELOW ; Lu ; <LATIN CAPITAL
LETTER N, COMBINING CEDILLA ATTACHED BELOW>

* LATIN SMALL LETTER N WITH CEDILLA ATTACHED BELOW ; Ll ; <LATIN SMALL
LETTER N, COMBINING CEDILLA ATTACHED BELOW>

(we still need a qualifier in the names of these precomposed letters,
because of the pre-existing letters whose legacy presentation looks like a
comma below, and which are also decomposable, but differently)
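
The entries listed above can be modelled as plain data. This is only a toy
sketch: none of these names exist in Unicode, no code points are assumed, and
proposed_entry() and PROPOSED are names invented for the example.

```python
import unicodedata

# Hypothetical mark name, as suggested in this message (not encoded).
NEW_MARK = "COMBINING CEDILLA ATTACHED BELOW"

def proposed_entry(case, letter):
    """Build (name, general category, decomposition) for one suggested letter."""
    gc = "Lu" if case == "CAPITAL" else "Ll"
    name = f"LATIN {case} LETTER {letter} WITH CEDILLA ATTACHED BELOW"
    base = f"LATIN {case} LETTER {letter}"
    unicodedata.lookup(base)  # raises KeyError if the base letter is not real
    return name, gc, (base, NEW_MARK)

# The mark itself has no decomposition; the four letters decompose to base + mark.
PROPOSED = [(NEW_MARK, "Mn", ())] + [
    proposed_entry(case, letter)
    for letter in "LN"
    for case in ("CAPITAL", "SMALL")
]

for name, gc, decomp in PROPOSED:
    shown = ", ".join(decomp) if decomp else "no decomposition"
    print(f"{name} ; {gc} ; <{shown}>")
```

The point of the sketch is that the distinction lives entirely in the
decomposition: the base letters are the ordinary L and N, and only the
combining mark is new.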

And maybe we could map a few other precomposed letters at the same time
**without requiring** existing languages to use them (but also **without
preventing** them from doing so, if their usage changes or if new
distinctions are needed when they borrow words such as toponyms from other
languages and keep the original distinctions). E.g.:

* LATIN CAPITAL LETTER C WITH CEDILLA ATTACHED BELOW ; Lu ; <LATIN CAPITAL
LETTER C, COMBINING CEDILLA ATTACHED BELOW>

* LATIN SMALL LETTER C WITH CEDILLA ATTACHED BELOW ; Ll ; <LATIN SMALL
LETTER C, COMBINING CEDILLA ATTACHED BELOW>

(these are still not needed for French or Portuguese, but they remain
possible if a new development ever arises in which forms with comma below
coexist with them; those forms are already encoded explicitly in decomposed
form and may already be used in fonts currently intended for French or
Portuguese, where the comma below is also acceptable **today** without
distinction).
Received on Wed Jun 19 2013 - 05:30:42 CDT
