Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Philippe Verdy <>
Date: Fri, 5 Jul 2013 10:07:33 +0200

All this discussion is going nowhere.
What would be more decisive is evidence that these shapes for cedillas
have contrasting uses in some language. As far as I can tell, this has not
been demonstrated (not even in Romanian).
So the proposal is to disunify characters that are already encoded, adding
new confusables in the process. Do we need that for any distinction in any
language? Does Marshallese really care about the shape of its cedilla or
comma below when they are perceived as equivalent, or used interchangeably?
So far we have not seen any assertion that using the "wrong" (non-preferred)
shape is an orthographic error.
The orthography is flexible enough not to care about such visual
differences, and that's why font styles can also be changed without changing
the actual meaning of text.
That's why nobody has ever complained about fonts that use a comma-like
cedilla for French. People don't care about this visual detail; it is just a
stylistic choice.
But people will care about stable orthographies if plain-text searches or
matches don't find the text they are supposed to find. That's why for
French we now need a collation rule that equates all these shape
variants (when they are not canonically equivalent). Adding more visual
confusables will just affect French, by forcing us to add yet another
collation rule to equate these non-canonically-equivalent characters. Once
this is done, we will continue to ignore these differences in French (with
a very minor binary difference for searches).
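To make the collation point concrete, here is a minimal Python sketch of a search key that folds U+0326 COMBINING COMMA BELOW into U+0327 COMBINING CEDILLA after NFD decomposition, so that ç and c with comma below compare equal for searching while remaining distinct under canonical equivalence. This is an illustration under stated assumptions, not an actual CLDR or UCA tailoring; the names search_key and loose_match are hypothetical.

```python
import unicodedata

CEDILLA = "\u0327"      # COMBINING CEDILLA
COMMA_BELOW = "\u0326"  # COMBINING COMMA BELOW


def search_key(s: str) -> str:
    """Hypothetical search key that treats comma below and cedilla as equal.

    NFD first, so precomposed letters such as U+00E7 expose their combining
    cedilla; then fold comma below into cedilla and renormalize so that the
    combining marks end up in canonical order again.
    """
    folded = unicodedata.normalize("NFD", s).replace(COMMA_BELOW, CEDILLA)
    return unicodedata.normalize("NFD", folded)


def loose_match(a: str, b: str) -> bool:
    """Equality for searching only; NOT suitable for identifiers."""
    return search_key(a) == search_key(b)


cedilla_c = "\u00e7"         # c with cedilla, canonically equivalent to c + U+0327
comma_c = "c" + COMMA_BELOW  # c with comma below, NOT canonically equivalent

# Canonically distinct...
print(unicodedata.normalize("NFD", cedilla_c) == unicodedata.normalize("NFD", comma_c))  # False
# ...but equal for searching once folded.
print(loose_match("gar" + cedilla_c + "on", "gar" + comma_c + "on"))  # True
```

A real system would more likely express this as a collation tailoring than as a raw string rewrite; the rewrite only shows which distinction is being folded. The same fold also equates Romanian s with cedilla (U+015F) and s with comma below (U+0219).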

But for exact matches (when the encoded text is used as an identifier, such
as filenames), we will still want to make sure that encoded strings are
canonically equivalent. This won't be possible with the newly proposed
characters (meaning that they won't be used in French). As for Marshallese,
these new recommended alternatives will create new difficulties without
really solving the problem.
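For exact matches, canonical equivalence is what Unicode normalization provides. A minimal sketch using Python's standard unicodedata module (same_identifier is a hypothetical helper name): the precomposed Latvian ķ (U+0137) and k + U+0327 are equal after NFC, whereas k + U+0326 is not. A newly encoded letter without a decomposition, as proposed in this thread, would likewise never compare equal to the existing one under NFC.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Exact match up to canonical equivalence (e.g. for filenames)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u0137"   # LATIN SMALL LETTER K WITH CEDILLA (Latvian)
decomposed = "k\u0327"   # k + COMBINING CEDILLA: canonically equivalent
comma_below = "k\u0326"  # k + COMBINING COMMA BELOW: a different string

print(same_identifier(precomposed, decomposed))   # True
print(same_identifier(precomposed, comma_below))  # False
print(unicodedata.decomposition(precomposed))     # 006B 0327
```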

I don't think that language tagging is even necessary: a Marshallese user
who wants to see commas or cedillas the way he prefers for these characters
can just set his own preferences in his user profile, stating
that he prefers reading text as in Marshallese; the system will then
render these encoded cedillas/commas below as expected for Marshallese,
even if the encoded text is not written in Marshallese. And I don't think
that anyone will complain (except if that user cannot use a Marshallese
locale because it is not supported by his software environment...
something that can change... in which case he will just see documents, web
sites and OS interfaces localized in another language, in which the visual
rendering of Marshallese will still be coherent within the context of that
other language, which has different visual preferences).
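The user-profile approach amounts to a simple resolution order: an explicit document language tag if there is one, otherwise the user's own locale. This is a hypothetical Python sketch; the table CEDILLA_SHAPE_BY_LANG, the shape labels, the default, and the function names are illustrative assumptions, not part of any real font or rendering API (real systems would typically act on this through font selection or OpenType language-system features).

```python
from typing import Optional

# Hypothetical per-language preference for how letters such as
# U+0137, U+013C, U+0146 should be drawn. Illustrative only.
CEDILLA_SHAPE_BY_LANG = {
    "lv": "comma",    # Latvian: comma-below shape
    "mh": "cedilla",  # Marshallese: true cedilla shape
}
DEFAULT_SHAPE = "cedilla"  # assumed default when nothing matches


def primary_subtag(tag: Optional[str]) -> Optional[str]:
    """'mh_MH' or 'mh-MH' -> 'mh'; None or '' -> None."""
    if not tag:
        return None
    return tag.replace("_", "-").split("-")[0].lower()


def resolve_cedilla_shape(doc_lang: Optional[str], user_locale: Optional[str]) -> str:
    """Use the document's language tag if it has an entry, else the user's locale."""
    for tag in (doc_lang, user_locale):
        lang = primary_subtag(tag)
        if lang in CEDILLA_SHAPE_BY_LANG:
            return CEDILLA_SHAPE_BY_LANG[lang]
    return DEFAULT_SHAPE


# Untagged text read by a user whose profile says Marshallese:
print(resolve_cedilla_shape(None, "mh_MH"))  # cedilla
# Text explicitly tagged as Latvian wins over the profile:
print(resolve_cedilla_shape("lv", "mh"))     # comma
```

Letting an unrecognized document tag fall through to the user's locale is a design choice of this sketch; a stricter variant would stop at any explicit tag.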

I think that this situation is similar to the visual representation of
sinograms, which depends on the user's locale preferences: whether he's
Japanese, Korean, mainland Chinese, Southern Chinese (in Hong Kong or Macau),
Singaporean, or lives anywhere else in the world... The
situation can be solved by adding user preferences in his local environment
to select the preferred set of shapes when explicit language tagging of
documents is not applicable.
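The same resolution order carries over to sinograms. Another hypothetical Python sketch: the HAN_GLYPH_SET table, its labels (JP, KR, SC, TC, HK) and the fallback default are assumptions chosen for illustration. Tags are matched region first, so zh-Hant-HK picks the Hong Kong set rather than generic Traditional.

```python
from typing import Optional

# Hypothetical mapping from lowercase locale keys to a regional Han glyph set.
HAN_GLYPH_SET = {
    "ja": "JP",
    "ko": "KR",
    "zh-cn": "SC", "zh-sg": "SC", "zh-hans": "SC",
    "zh-tw": "TC", "zh-hant": "TC",
    "zh-hk": "HK", "zh-mo": "HK",
}


def candidates(tag: str) -> list:
    """'zh-Hant-HK' -> ['zh-hk', 'zh-hant', 'zh'] (region before script)."""
    parts = tag.replace("_", "-").lower().split("-")
    lang = parts[0]
    return [f"{lang}-{p}" for p in reversed(parts[1:])] + [lang]


def han_glyph_set(doc_lang: Optional[str], user_locale: Optional[str], default: str = "SC") -> str:
    """Explicit document tag first, then the user's locale, then a default."""
    for tag in (doc_lang, user_locale):
        if not tag:
            continue
        for key in candidates(tag):
            if key in HAN_GLYPH_SET:
                return HAN_GLYPH_SET[key]
    return default


print(han_glyph_set("zh-Hant-HK", None))  # HK
print(han_glyph_set(None, "ja_JP"))       # JP
print(han_glyph_set("zh", "zh-TW"))       # TC (bare "zh" is ambiguous here)
```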

2013/7/5 Denis Jacquerye <>

> On Thu, Jul 4, 2013 at 12:07 PM, Michael Everson <>
> wrote:
> > On 4 Jul 2013, at 03:56, "Phillips, Addison" <> wrote:
> >
> >> I don't disagree with the potential need for changing the
> decomposition. That discussion seems clear and is only muddled by talking
> about variant, language sensitive rendering. That isn't the only
> consideration, right?
> >
> > No, Addison, we can't change the decomposition. That would invalidate
> all the data everywhere in Latvia.
> >
> >> I disagree that language tagging is not a valid means of getting
> language specific shaping (which could solve a specific problem). This is
> hardly confined to CJK or Latvian. Minority languages can, in fact, take
> advantage of it, within reason (documentation is a problem and it
> presupposes that glyph support is available). In fact, in some ways,
> language based glyph selection is possibly easier to achieve because the
> number of implementations is relatively small.
> >
> > The problem is in pretending that a cedilla and a comma below are
> equivalent because in some script fonts in France or Turkey routinely write
> some sort of undifferentiated tick for ç. :-)
> Sure they are not equivalent, but stop pretending it is only in some
> script fonts, the page has plenty of
> examples where it is not in script fonts. In some languages the
> cedilla can have a shape similar to that of a comma, it's a fact.
> Any native speaker will tell you the comma-like form and others are
> acceptable. Just look at or, both very popular
> newspapers use webfonts with non classic cedilla (Le Monde uses TheMix
> —even in print it uses TheAntiqua with their comma-like cedilla— and
> Zaman uses a custom font with an attached tick-like cedilla).
> This is not the majority but it is frequent enough.
> > As far as I can see the only solution is:
> >
> > Mandate that only the comma-below shape is appropriate for Ḑḑ Ģģ Ķķ Ļļ
> Ņņ Ŗŗ despite their decomposition to cedilla.
> > Encode a set of undecomposable Dd Gg Kk Ll Nn Rr with invariant cedilla
> for display of that glyph with those base letters.
> >
> > The only strangeness here is that D̦d̦ G̦g̦ K̦k̦ L̦l̦ N̦n̦ R̦r̦ with
> genuine combining comma below are confusable with the Latvian/Livonian
> letters, but that is already the case.
> >
> >> None of this addresses the problem of plain text representation or the
> potential need to represent what are apparently different characters with a
> single encoding. But if it is just presentation we're talking about... how
> does this differ from, for example, Serbian vs Russian?
> >
> > What, the italic lowercase т? That is really not comparable to this
> issue.
> >
> > Michael Everson *
> >
> >
> >
> --
> Denis Moyogo Jacquerye
> African Network for Localisation
> Nkótá ya Kongó míbalé ---
> DejaVu fonts ---
Received on Fri Jul 05 2013 - 03:10:35 CDT