Re: Letters for Indic transliteration

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Jul 20 2005 - 03:18:26 CDT

Next message: Andreas Prilop: "Re: Letters for Indic transliteration"

Previous message: Raymond Mercier: "set unicode vacation"
In reply to: Andreas Prilop: "Re: Letters for Indic transliteration"
Next in thread: Michael Everson: "Re: Letters for Indic transliteration"

On Tuesday, July 19th, 2005 13:58Z Andreas Prilop wrote:

> My concern is not about conventions but that you need
>
> - "R with ring below" and "R with dot below" *at the same time*,
> - "L with ring below" and "L with dot below" *at the same time*.

Sure. And as you correctly pointed out in the first place, the combining
diacritics needed to do this are already available.

The precomposed characters are a relic from the past; sometimes they are
convenient (for example, they allow me to communicate by e-mail with a lot of
my friends who are still not fully Unicode-enabled; also see below). But
sometimes they are not perfect.

In your case, you basically have two options:
- one is to go on with the precomposed characters and accept some
     lack of precision and perfection;
- or bite the bullet and go back to combining characters
     (which may make your application more complex, or need quite
     a bit more resources).
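
A minimal sketch of the difference between the two options, assuming
Python 3 and its standard unicodedata module (the variable and function
names are only illustrative). It shows that "r with dot below" has a
precomposed code point that normalization maps to and from, whereas
"r with ring below" can only be written as a combining sequence:

```python
import unicodedata

# "r with dot below": a precomposed code point exists (U+1E5B).
r_dot_precomposed = "\u1E5B"
r_dot_combining = "r\u0323"    # r + COMBINING DOT BELOW

# "r with ring below": only the combining sequence is available.
r_ring_combining = "r\u0325"   # r + COMBINING RING BELOW


def describe(text):
    """List the code points of text with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# NFC composes r + U+0323 into the single precomposed code point,
# and NFD decomposes it again.
nfc_dot = unicodedata.normalize("NFC", r_dot_combining)
nfd_dot = unicodedata.normalize("NFD", r_dot_precomposed)

# There is nothing to compose to for ring below, so NFC leaves
# the two-code-point sequence unchanged.
nfc_ring = unicodedata.normalize("NFC", r_ring_combining)

print(describe(nfc_dot))
print(describe(nfc_ring))
```

An application that normalizes its input to one form (NFC or NFD) before
comparing strings can treat the precomposed and combining spellings of
dot-below alike, while ring-below always stays a two-code-point sequence.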

> The current situation is unfortunate:

Yes. The same holds in many languages written in Latin script, where the
letter C (and G) stands for either a guttural or a palatal/cerebral/apical.

> Otherwise there will be non-ending confusion since
> many people use the current "R, L with dot below" instead of
> "R, L" followed by U+0325 "ring below".

The confusion will not end if Unicode adds new codepoints.

A lot of people (I would say the majority) only use the dot-below convention,
and many (the majority?) do not even know about ring-below; I do not expect
this to change significantly in the x years it would take before the new
codepoints were eventually made official (I know it won't happen). So once
the new codepoints existed, someone using U+1E5A could equally well mean a
vowel or a consonant, even if proper use would call for the consonant as the
only possibility.

For a comparable case, look at French œ, U+0153, in words like cœur, œil, or
bœuf (heart, eye, beef; not exactly uncommon words): not many people know
this codepoint exists, so the majority of the material keyed in today still
uses the unligated form, oe, as in coeur, oeil, or boeuf (Google stats for
"cœur": 8.67M including those using "coeur", versus "coeur": /only/ 8.66M;
so use of "cœur" is about 0.1%...)
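
The fact that a search for "cœur" also returns pages spelled "coeur"
suggests some folding of œ to oe on the search side. A minimal sketch of
such folding, assuming Python 3 (fold_oe is a hypothetical helper, not
anything a search engine is known to use). The explicit replacement is
needed because U+0153 has no Unicode decomposition, so normalization
alone leaves it untouched:

```python
import unicodedata


def fold_oe(text):
    """Replace the oe ligature (U+0153, U+0152) with plain letters."""
    return text.replace("\u0153", "oe").replace("\u0152", "OE")


ligated = "c\u0153ur"   # cœur, spelled with U+0153

# Even compatibility decomposition (NFKD) does not split the ligature.
after_nfkd = unicodedata.normalize("NFKD", ligated)

# After folding, both spellings compare equal.
folded = fold_oe(ligated)

print(after_nfkd == ligated)
print(folded == "coeur")
```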

Antoine

This archive was generated by hypermail 2.1.5 : Wed Jul 20 2005 - 10:27:51 CDT