Re: Vietnamese (Re: Unicode, SMS, PDA/cellphones)

From: Mike Ayers (
Date: Mon Jun 05 2006 - 16:43:03 CDT

    Philippe Verdy wrote:
    > From: "Doug Ewell" <>
    >>>Why then would it be more complicated to compose text like this,
    >>>instead of using VIQR that would require composing mostly the same
    >>>number of symbols (and sometimes more...)?
    >>Vietnamese composition becomes tricky when working with fully decomposed
    >>vowels, so that ậ decomposes to "U+0061 plus U+0323 plus U+0302." Not
    >>all rendering systems (even today) can handle placing two or more
    >>diacritical marks on a single base letter.

            I think you two may be tripping over different meanings of "composed" here.

    >>Additionally, this decomposition ("a" plus dot-below plus circumflex)
    >>doesn't match the way Vietnamese view this letter (("a" plus circumflex)
    >>plus dot-below). This is not a Unicode problem, but entering the
    >>diacritics in the language-appropriate order might be a problem if a
    >>rendering engine insists on canonical order.

            Only if end users are expected to enter Unicode sequences, which they
    should never be. The IME is responsible for Unicode normalization, so
    this should not be a problem.
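
            To illustrate the point about normalization, here is a minimal
    sketch using Python's standard unicodedata module. The variable names are
    purely illustrative. It shows that the order a Vietnamese typist thinks in
    ("a" plus circumflex plus dot-below) and the canonical order normalize to
    the same result, so an input method that normalizes hides the difference.

```python
import unicodedata

# "a" + circumflex (U+0302) + dot-below (U+0323): language-appropriate order
typed_order = "a\u0302\u0323"
# "a" + dot-below (U+0323) + circumflex (U+0302): canonical order
canonical_order = "a\u0323\u0302"

# NFC composes both sequences into the single precomposed letter U+1EAD.
nfc_typed = unicodedata.normalize("NFC", typed_order)
nfc_canonical = unicodedata.normalize("NFC", canonical_order)

# NFD reorders the marks by combining class, so dot-below comes first.
nfd_typed = unicodedata.normalize("NFD", typed_order)

print(nfc_typed == nfc_canonical)             # both are the same letter
print([f"U+{ord(c):04X}" for c in nfd_typed])  # code points after reordering
```

    The two sequences are canonically equivalent, which is why only software
    that skips normalization would ever notice the entry order.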

    > That was not really addressing directly my question. The only thing that
    > does not seem natural for Vietnamese is the encoding order of diacritics
    > for the NFD decomposed letters, because it places some tone marks before
    > the decomposed vowel modifier. But an encoding that does not attempt to
    > decompose the 6 base vowels that Vietnamese considers as an unbreakable
    > unit, and use the 12 vowels plus combining diacritics only for the tone
    > marks will work fine and will seem quite natural for users.

            This is an unimportant distinction. VIQR decomposes vowels, and this
    has not caused any difficulty with adoption.
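
            For readers unfamiliar with VIQR, here is a rough sketch in Python
    of how its decomposed ASCII notation maps onto Unicode. The function name
    viqr_to_unicode and the table VIQR_MARKS are hypothetical, and the sketch
    makes simplifying assumptions: a mark character counts as a diacritic only
    when it directly follows a vowel (or another mark on that vowel), and the
    backslash escape that VIQR uses for literal punctuation is not handled.

```python
import unicodedata

# VIQR mark characters mapped to Unicode combining marks.
VIQR_MARKS = {
    "^": "\u0302",  # circumflex (vowel modifier)
    "(": "\u0306",  # breve (vowel modifier)
    "+": "\u031b",  # horn (vowel modifier)
    "'": "\u0301",  # acute (tone)
    "`": "\u0300",  # grave (tone)
    "?": "\u0309",  # hook above (tone)
    "~": "\u0303",  # tilde (tone)
    ".": "\u0323",  # dot below (tone)
}
VOWELS = set("aeiouyAEIOUY")


def viqr_to_unicode(text):
    """Convert a VIQR string to NFC Unicode (simplified, no escapes)."""
    out = []
    after_vowel = False  # True while marks may still attach to a vowel
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "dd":
            out.append("\u0111")  # d with stroke
            after_vowel = False
            i += 2
            continue
        if pair in ("DD", "Dd"):
            out.append("\u0110")  # capital D with stroke
            after_vowel = False
            i += 2
            continue
        ch = text[i]
        if after_vowel and ch in VIQR_MARKS:
            out.append(VIQR_MARKS[ch])  # attach mark to the preceding vowel
        else:
            out.append(ch)
            after_vowel = ch in VOWELS
        i += 1
    # Let normalization sort the marks and compose precomposed letters.
    return unicodedata.normalize("NFC", "".join(out))


print(viqr_to_unicode("Vie^.t Nam"))
```

    Because the final step is NFC normalization, the typist enters the vowel
    modifier first and the tone mark second, and the stored text still comes
    out in canonical form.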

    > So a PDA or cellphone where the text is input this way is not a bad
    > option, and it seems easy to place the 6 extra base vowels on the 9-keys
    > of a cellphone keyboard without lots of extra keystrokes to select the
    > appropriate character. Then allowing the users to select additional tone
    > marks if they wish;

            Nine keys is probably too few, but a VIQR-based method could be very
    productive and probably fit within twelve keys.

    > In addition, of course, a dictionary lookup assistant will help composing
    > most common words, with their correct accents and tone marks.

            Probably not. Vietnamese is so phonetically dense that the keystrokes
    to navigate completion would, in most cases, outnumber the keystrokes to
    complete the word.

            For the record, Vietnamese without tone marks is unintelligible unless
    a severely reduced vocabulary is agreed upon.


