From: Doug Ewell (email@example.com)
Date: Tue Nov 18 2008 - 21:00:45 CST
Andrew Cunningham wrote:
> Actually, insisting on precomposed characters may not make things
> easier for some languages. Just thinking of the practicalities involved.
> Take Vietnamese as an example: each combination of vowel and tone mark
> exists as a single precomposed character in Unicode.
> Then look at Microsoft's keyboard layout for Vietnamese. Due to the
> design parameters of keyboard layouts on Windows, Microsoft used
> combining diacritics for tone marks.
On modern systems, there is no necessary correlation between the
decision to encode diacriticized letters in composed or decomposed form,
and the number of keystrokes required to type them on the keyboard. A
single key can generate two or more characters, which is common in
keyboards for Indic languages, or a single character can require two
keystrokes, which is common on just about every keyboard in Europe.
Microsoft used combining diacritics for tone marks in their Code Page
1258 because they only had 256 code points to work with, not because of
the number of available keys.
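To make the code page constraint concrete, here is a minimal Python sketch, assuming Python's bundled "cp1258" codec follows the Microsoft mapping. The helper name fits_cp1258 is hypothetical and exists only for this illustration. It checks whether the precomposed letter and the base-plus-combining-mark sequence can each be encoded in the code page.

```python
# Hypothetical helper: reports whether a string is representable in
# Code Page 1258, using Python's standard "cp1258" codec.
def fits_cp1258(text):
    try:
        text.encode("cp1258")
        return True
    except UnicodeEncodeError:
        return False

precomposed = "\u1ED1"        # o with circumflex and acute, one code point
with_tone_mark = "\u00F4\u0301"  # o-with-circumflex + combining acute accent

print(fits_cp1258(precomposed))     # precomposed form has no slot in the code page
print(fits_cp1258(with_tone_mark))  # base letter plus combining tone mark does
print(len(with_tone_mark.encode("cp1258")))  # number of bytes used
```

The sequence with a combining tone mark takes two bytes, one for the vowel and one for the tone, which is how the code page covers every vowel and tone combination in 256 slots.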
I built a customized keyboard using Microsoft Keyboard Layout Creator
(MSKLC) that uses dead keys for most diacritics, to maximize the
number of possible characters. But because I wanted to be able to type
Vietnamese, and dead-key sequences on Windows keyboards can include no
more than one dead key (AFAIK), I gave precomposed letters like
o-with-circumflex (ô) their own key. Now I can type U+1ED1 (ố) with two
keystrokes, which is neither one (the number of Unicode characters for
this letter in NFC) nor three (NFD).
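The character counts for this letter can be checked with a short Python sketch using the standard unicodedata module. It says nothing about keystrokes, which depend entirely on the keyboard layout; it only counts code points in each normalization form.

```python
import unicodedata

letter = "\u1ED1"  # LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE

nfc = unicodedata.normalize("NFC", letter)
nfd = unicodedata.normalize("NFD", letter)

# Code points in each form, as U+XXXX strings.
nfc_points = ["U+%04X" % ord(ch) for ch in nfc]
nfd_points = ["U+%04X" % ord(ch) for ch in nfd]

print(len(nfc), nfc_points)  # one precomposed code point
print(len(nfd), nfd_points)  # base letter plus two combining marks

# A two-code-point sequence (o-with-circumflex + combining acute)
# is canonically equivalent and composes to the same NFC form.
mixed = "\u00F4\u0301"
print(unicodedata.normalize("NFC", mixed) == nfc)
```

The mixed sequence at the end is the same two-code-point form that Code Page 1258 uses, and it normalizes to the same single character in NFC.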
I'm speaking about all of this from a Windows perspective, but I'm sure
it is equally true for Mac, Linux, and other modern systems.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages