Re: MSKLC restrictions (was: Ways to show Unicode contents on Windows?)

From: Philippe Verdy <>
Date: Wed, 31 Jul 2013 22:49:07 +0200

NFC versus NFD is not the problem here. One will still want to produce the
correct sequence, which will be canonically equivalent (possibly not in
any normalized form, but still canonically equivalent).
You may still want to use dead keys instead of typing a separate key for
each combining diacritic, and finally get the cluster encoded in the
correct order instead of one of the many possible canonically equivalent
orders, with characters precomposed or not.
Maybe the text editor could help by renormalizing the input produced by
the keyboard.
But the solution with dead keys requires encoding internal dead keys that
compose another dead key.
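
To make the point about canonical equivalence concrete, here is a minimal
sketch using Python's standard unicodedata module. The sample cluster (o with
circumflex and dot below) and the helper name canonically_equivalent are my
own choices for illustration, not something a keyboard layout defines.

```python
import unicodedata

# Three spellings of the same cluster: "o" with circumflex above and dot below.
precomposed = "\u1ED9"         # single precomposed code point
above_first = "o\u0302\u0323"  # o + circumflex + dot below (no normalized form)
below_first = "o\u0323\u0302"  # o + dot below + circumflex (this is the NFD order)

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# What a text editor could do with keyboard input: renormalize it.
renormalized = unicodedata.normalize("NFC", above_first)

print(canonically_equivalent(precomposed, above_first))
print(canonically_equivalent(above_first, below_first))
print(renormalized == precomposed)
```

The middle spelling is the kind of sequence a keyboard may emit: valid and
canonically equivalent, yet neither NFC nor NFD until something renormalizes
it.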

However, MSKLC does not allow the target of a (dead key + another key)
combination to be another dead key; it can only produce an output string
(containing at most four 16-bit code units). So dead keys cannot be chained.
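
The restriction can be modelled with a small sketch. This is not MSKLC's file
format or API: the DeadKeyTable class, its method names, and the fallback for
an unmatched pair (emit both characters) are assumptions made for
illustration, and the four-code-unit limit is simply taken from the
description above.

```python
MAX_UTF16_UNITS = 4  # limit as stated in the text above

def utf16_units(s):
    """Number of 16-bit code units needed to encode s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

class DeadKeyTable:
    """Hypothetical flat table: (dead key, next key) -> plain output string."""

    def __init__(self):
        self._map = {}
        self._dead_keys = set()

    def add(self, dead_key, key, output):
        if utf16_units(output) > MAX_UTF16_UNITS:
            raise ValueError("output longer than four 16-bit code units")
        self._dead_keys.add(dead_key)
        self._map[(dead_key, key)] = output

    def type_keys(self, keys):
        out = []
        pending = None  # dead key waiting for its next keystroke
        for k in keys:
            if pending is not None:
                # The pair always resolves to plain text. The result is never
                # kept as a new pending dead key, so there is no chaining.
                out.append(self._map.get((pending, k), pending + k))
                pending = None
            elif k in self._dead_keys:
                pending = k
            else:
                out.append(k)
        if pending is not None:
            out.append(pending)
        return "".join(out)

table = DeadKeyTable()
table.add("^", "a", "\u00E2")  # circumflex dead key + a
table.add("'", "a", "\u00E1")  # acute dead key + a

single = table.type_keys("^a")    # one dead key works
chained = table.type_keys("'^a")  # two dead keys in a row do not combine
```

In this model the second dead key is consumed as the "next key" of the first
one, so the sequence falls apart into separate characters instead of building
a cluster with two diacritics.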

Vietnamese keyboards work because, although there may be two diacritics
above a base letter, one of the diacritics is attached to the base letter
and the pair can be typed with a single keystroke that generates the
precomposed character, so you can still type a dead key beforehand to
generate the base character with the two diacritics. In the legacy Windows
code page for Vietnamese this did not produce an output in NFC form but two
characters: (1) the base letter precomposed with the attached diacritic,
(2) the additional combining character.
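
A short sketch of the Vietnamese case, assuming the legacy code page meant
here is Windows-1258 (Python's "cp1258" codec) and picking a with circumflex
and acute as the sample letter; both are my assumptions, not stated above.

```python
import unicodedata

# What the keyboard produces: the precomposed letter from one keystroke,
# followed by a separate combining tone mark.
typed = "\u00E2\u0301"  # a-circumflex + COMBINING ACUTE ACCENT
nfc = unicodedata.normalize("NFC", typed)  # one code point, U+1EA5

# The two-character form fits in the legacy code page ...
encoded = typed.encode("cp1258")

# ... while the fully precomposed NFC form has no slot there.
try:
    nfc.encode("cp1258")
    nfc_fits = True
except UnicodeEncodeError:
    nfc_fits = False

print(len(typed), len(nfc), len(encoded), nfc_fits)
```

Both strings are canonically equivalent; only the second one is in NFC, and
only the first one can be stored in the legacy code page.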

For more complex outputs or input sequences, a full IME is still required
(e.g. for Chinese or Japanese keyboards), possibly with an additional
display UI (for selecting in choice lists).
An IME is also required when you have a limited keyboard (e.g. on
smartphones with small displays there is not much space for many keys, so
the IME uses a displayed selector UI for guessing not just character
clusters but also full words or the next probable words).
The IME may also translate some abbreviations according to user preferences
and could learn from the user's past choices to enhance the input, or could
detect the language used; the IME may also be fed with information from the
application using it (e.g. by reading the surrounding text before or
after the current insertion point, or by the application providing data
about the expected input format). Finally, the IME may also input something
other than just plain text, e.g. inputting an image by giving the
application some virtual keys attached to a queryable API to get the
extra data pointing to the selected image (selected from a gallery, or from
the clipboard, or created internally by calling a camera capture
application, or a manually drawn personal signature from a touch device,
possibly performing OCR or recognizing some personal gestures for various
kinds of text or data input).
The IME will then expose an API to the application which will select what
is convenient for itself. An IME is in fact a full application, except that
it is normally invisible when it runs, and will pop up on top of the
application when necessary (for example when the application places the
input caret on an input field, the application possibly providing other
contextual information to the IME). The IME will then output not just text,
but possibly some other virtual edit functions (like moving the insertion
point to the previous or next cluster, or changing two spaces into a full
stop followed by a space as soon as you input the second space...).
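
As an illustration of such an edit function, here is a minimal sketch of the
"two spaces become a full stop and a space" rule. The function name on_key
and the (buffer, caret) model are hypothetical; a real IME would go through
whatever text-editing API the platform exposes, reading the surrounding text
from the application rather than owning the buffer.

```python
def on_key(buffer, caret, key):
    """Return (new_buffer, new_caret) after the sketch IME handles one key."""
    second_space = (
        key == " "
        and caret >= 2
        and buffer[caret - 1] == " "
        and buffer[caret - 2] not in " ."
    )
    if second_space:
        # Edit the surrounding text: replace the previous space with ". ".
        return buffer[:caret - 1] + ". " + buffer[caret:], caret + 1
    # Default: plain insertion at the caret.
    return buffer[:caret] + key + buffer[caret:], caret + 1

text, caret = "", 0
for k in "Hello  w":
    text, caret = on_key(text, caret, k)
print(text)
```

The rule needs the text before the insertion point, which is why the IME has
to be fed contextual information by the application.
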
Received on Wed Jul 31 2013 - 15:54:08 CDT
