Re: Why people still want to encode precomposed letters

From: Doug Ewell (
Date: Tue Nov 18 2008 - 21:00:45 CST

    Andrew Cunningham wrote:

    > Actually, insisting on precomposed characters may not make things
    > easier for some languages. Just thinking of the practicalities involved.
    > Take Vietnamese as an example, each combination of vowel and tone mark
    > exists as a single precomposed character in Unicode.
    > Then look at Microsoft's keyboard layout for Vietnamese. Due to the
    > design parameters of keyboard layouts on Windows, Microsoft used
    > combining diacritics for tone marks.

    On modern systems, there is no necessary correlation between the
    decision to encode diacriticized letters in composed or decomposed form,
    and the number of keystrokes required to type them on the keyboard. A
    single key can generate two or more characters, which is common in
    keyboards for Indic languages, or a single character can require two
    keystrokes, which is common on just about every keyboard in Europe.

    Microsoft used combining diacritics for tone marks in their Code Page
    1258 because they only had 256 code points to work with, not because of
    the number of available keys.
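
    The code-page constraint can be checked with a minimal Python sketch,
    assuming Python's standard "cp1258" codec is a faithful rendering of
    Code Page 1258. The variable names and the helper encodes_directly()
    are hypothetical, chosen only for this illustration. It shows that
    U+1ED1 has no single-byte slot, while o-with-circumflex plus a
    combining acute accent encodes as two bytes.

```python
import unicodedata

# U+1ED1 (ố) in fully precomposed form, and the same letter written the
# way a 256-slot code page can hold it: precomposed o-with-circumflex
# (U+00F4) followed by a combining acute accent (U+0301).
precomposed = "\u1ed1"
cp1258_form = "\u00f4\u0301"

# Two bytes: one for the base letter, one for the combining tone mark.
encoded = cp1258_form.encode("cp1258")


def encodes_directly(text: str) -> bool:
    """Return True if text fits in cp1258 without being decomposed first."""
    try:
        text.encode("cp1258")
        return True
    except UnicodeEncodeError:
        return False


print(len(encoded))                   # number of bytes for the two-part form
print(encodes_directly(precomposed))  # whether U+1ED1 has its own slot
print(unicodedata.normalize("NFC", cp1258_form) == precomposed)
```

    Normalizing the decoded text to NFC recovers the single precomposed
    character, so the choice made in the code page says nothing about how
    the text must be stored in Unicode.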

    I built a customized keyboard using Microsoft Keyboard Layout Creator
    (MSKLC) [1] that uses dead keys for most diacritics, to maximize the
    number of possible characters. But because I wanted to be able to type
    Vietnamese, and dead-key sequences on Windows keyboards can include no
    more than one dead key (AFAIK), I gave precomposed letters like
    o-with-circumflex (ô) their own key. Now I can type U+1ED1 (ố) with two
    keystrokes, which is neither one (the number of Unicode characters for
    this letter in NFC) nor three (NFD).
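
    The counts above can be reproduced with a short Python sketch. The
    normalization calls are standard library; the type_keys() function and
    the "DEAD_ACUTE" key name are hypothetical stand-ins for a layout with
    one dead key and a dedicated ô key, not a model of how MSKLC itself
    works.

```python
import unicodedata

O_CIRCUMFLEX_ACUTE = "\u1ed1"  # ố

# Code point counts in the two normalization forms.
nfc = unicodedata.normalize("NFC", O_CIRCUMFLEX_ACUTE)
nfd = unicodedata.normalize("NFD", O_CIRCUMFLEX_ACUTE)

COMBINING_ACUTE = "\u0301"


def type_keys(keys):
    """Simulate a layout where a dead key modifies the next key pressed.

    "DEAD_ACUTE" is a hypothetical dead key; every other entry is the
    character produced by an ordinary key.
    """
    output = []
    pending = None
    for key in keys:
        if key == "DEAD_ACUTE":
            pending = COMBINING_ACUTE  # remember the accent, emit nothing yet
            continue
        text = key
        if pending is not None:
            # Compose the pending accent onto the letter just typed.
            text = unicodedata.normalize("NFC", key + pending)
            pending = None
        output.append(text)
    return "".join(output)


keystrokes = ["DEAD_ACUTE", "\u00f4"]  # dead acute, then the dedicated ô key
typed = type_keys(keystrokes)

print(len(nfc), len(nfd), len(keystrokes))
print(typed == O_CIRCUMFLEX_ACUTE)
```

    Two keystrokes yield one NFC code point or three NFD code points,
    which is the mismatch between typing effort and encoding form
    described here.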

    I'm speaking about all of this from a Windows perspective, but I'm sure
    it is equally true for Mac, Linux, and other modern systems.


    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14