Re: Letters for Indic transliteration

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Jul 20 2005 - 03:18:26 CDT


    On Tuesday, July 19th, 2005 13:58Z Andreas Prilop wrote:

    > My concern is not about conventions but that you need
    >
    > - "R with ring below" and "R with dot below" *at the same time*,
    > - "L with ring below" and "L with dot below" *at the same time*.

    Sure. And, as you correctly pointed out in the first place, the combining
    diacritics needed to do this are already available.

    The precomposed characters are a relic of the past. Sometimes they are
    convenient (for example, they let me exchange e-mail with many friends whose
    software is still not fully Unicode-enabled; see also below), but sometimes
    they fall short.

    In your case, you basically have two options:
     - keep using the precomposed characters and accept some loss of
         precision;
     - or bite the bullet and switch to combining characters
         (which may make your application more complex, or require quite
         a bit more resources).
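    A minimal sketch of the combining-character option, assuming Python's
    standard `unicodedata` module (the helpers `nfc` and `describe` are
    illustrative names, not part of any existing tool). Under NFC, R or L
    followed by U+0323 COMBINING DOT BELOW folds into the existing precomposed
    letter (U+1E5A, U+1E36), while R or L followed by U+0325 COMBINING RING
    BELOW, for which no precomposed letter exists, stays a base-plus-mark
    sequence. Either way the two letters remain distinct in the same text.

```python
import unicodedata

# Combining marks discussed in the thread.
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
RING_BELOW = "\u0325"  # COMBINING RING BELOW


def nfc(s: str) -> str:
    """Return the canonically composed (NFC) form of s."""
    return unicodedata.normalize("NFC", s)


def describe(s: str) -> str:
    """Spell out each code point of s as 'U+XXXX NAME'."""
    return " + ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s)


# Base letter + dot below composes to a precomposed letter (U+1E5A, U+1E36).
r_dot = nfc("R" + DOT_BELOW)
l_dot = nfc("L" + DOT_BELOW)

# Base letter + ring below has no precomposed form, so NFC leaves the
# two-code-point sequence unchanged.
r_ring = nfc("R" + RING_BELOW)
l_ring = nfc("L" + RING_BELOW)

for label, s in [("R dot", r_dot), ("R ring", r_ring),
                 ("L dot", l_dot), ("L ring", l_ring)]:
    print(f"{label:7}-> {describe(s)}")
```

    Running it lists a single code point (U+1E5A) for R with dot below, but
    U+0052 followed by U+0325 for R with ring below, and likewise for L.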

    > The current situation is unfortunate:

    Yes. The same is true of many languages written in the Latin script, where
    the letter C (or G) denotes either a guttural or a palatal/cerebral/apical.

    > Otherwise there will be never-ending confusion, since
    > many people use the current "R, L with dot below" instead of
    > "R, L" followed by U+0325 "ring below".

    The confusion will not end just because Unicode adds new code points.

    A lot of people (I would say the majority) use only the dot-below convention,
    and many (a majority?) do not even know about the ring below; I do not expect
    this to change significantly in the several years it would take before the
    new code points could be made official (and I know that won't happen). So
    even if the new code points existed, someone using U+1E5A could equally well
    mean a vowel or a consonant, even though proper use would allow only the
    consonant.

    For a parallel case, look at French œ, U+0153, in words like cœur, œil, or
    bœuf (heart, eye, beef; hardly uncommon words): not many people know this
    code point exists, so most of the material keyed in today still uses the
    unligated form, oe, as in coeur, oeil, or boeuf. (Google counts for "cœur":
    8.67M including those using "coeur", versus "coeur": /only/ 8.66M; so use
    of "cœur" is about 0.1%...)
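    A quick check of the arithmetic behind those figures, as a Python sketch
    (the variable names are illustrative). Since the 8.67M for the ligated query
    already includes the "coeur" pages, subtracting the 8.66M unligated hits
    leaves roughly 0.01M pages that actually use œ. The last lines, assuming
    Python's standard `unicodedata` module, show that Unicode normalization by
    itself does not equate the two spellings, because U+0153 has no
    decomposition mapping.

```python
import unicodedata

# Google hit counts quoted in the message, in millions (July 2005).
# Per the message, the query for the ligated form also matched "coeur".
hits_ligated_query = 8.67    # query "cœur" (U+0153), includes "coeur" pages
hits_unligated_query = 8.66  # query "coeur" (plain o + e)

# Pages that use the ligature = everything matched minus the unligated pages.
ligated_only = hits_ligated_query - hits_unligated_query
share = ligated_only / hits_ligated_query
print(f"pages using the ligature: {ligated_only:.2f}M ({share:.2%})")

# U+0153 has no decomposition mapping, so not even NFKC turns it into "oe";
# treating the two spellings as equal needs some other folding step.
print(unicodedata.normalize("NFKC", "c\u0153ur") == "coeur")
```

    The first line reports about 0.12% of the pages; the second prints False.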

    Antoine



    This archive was generated by hypermail 2.1.5 : Wed Jul 20 2005 - 10:27:51 CDT