From: Christoph Päper (email@example.com)
Date: Thu May 19 2011 - 05:35:29 CDT
> Text editing and processing with combining marks is not "very difficult and erroneous."
The biggest problem with precomposed versus combined characters in text editors and word processors is that they are in fact treated differently.
Some accented letters are found on keys of their own on relevant national keyboard variants.
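Others can easily be produced by a combination of base letter and dead-key diacritic mark, although the keys have to be pressed in a different order than the characters are encoded: the dead key comes first and the base letter second, whereas Unicode stores the combining mark after its base letter.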
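To illustrate the ordering mismatch, here is a minimal Python sketch of dead-key input. The DEAD_KEYS table and the type_keys() function are hypothetical and do not correspond to any real keyboard layout or input-method API.

```python
import unicodedata

# Hypothetical dead-key table: maps a dead key to the combining mark it stands for.
DEAD_KEYS = {
    "^": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "´": "\u0301",  # COMBINING ACUTE ACCENT
    "`": "\u0300",  # COMBINING GRAVE ACCENT
    "¨": "\u0308",  # COMBINING DIAERESIS
}

def type_keys(keys):
    """Simulate dead-key input: the dead key is pressed *before* the base letter,
    but the combining mark is stored *after* it, as Unicode requires.
    Simplification: a trailing dead key with no following letter is dropped."""
    out = []
    pending = None  # combining mark waiting for its base letter
    for key in keys:
        if key in DEAD_KEYS and pending is None:
            pending = DEAD_KEYS[key]
        elif pending is not None:
            out.append(key)      # base letter first ...
            out.append(pending)  # ... then the mark
            pending = None
        else:
            out.append(key)
    return "".join(out)

typed = type_keys(["^", "e"])
print([hex(ord(c)) for c in typed])           # ['0x65', '0x302']
print(unicodedata.normalize("NFC", typed))    # ê
```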
Finally, some accented letters need a special kind of assistive input system, often visual character maps (though these are often ordered in a not too helpful way, i.e. by Unicode position).
It might be useful if computers offered their users a standard way to access and change diacritics on base letters, no matter how they were entered in the first place or how they are encoded. For instance, I could write “resume”, hit the one special key, e.g. ‘^’, and get an inline drop-down list to change the ‘e’ to ‘é’ (because that is a variant of the word in this instance that was found in the dictionary) or ‘è’, ‘ê’ etc. (shown in a standard fixed order by frequency / probability).
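The core of such a variant list can be sketched in Python with the standard unicodedata module: reduce the letter to its base (regardless of whether it was typed precomposed or combined), then offer base + mark for a set of candidate marks. CANDIDATE_MARKS, strip_marks() and variants() are hypothetical names; the fixed mark order stands in for the dictionary and frequency ranking described above, which this sketch does not implement.

```python
import unicodedata

# Assumed candidate marks in a fixed order: acute, grave, circumflex, diaeresis.
# A real system would rank these by dictionary hits and frequency instead.
CANDIDATE_MARKS = ["\u0301", "\u0300", "\u0302", "\u0308"]

def strip_marks(text):
    """Return text with all combining marks removed, whether the accented
    letters were entered precomposed or as base letter + combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    # General categories Mn, Mc and Me are combining marks.
    return "".join(c for c in decomposed if not unicodedata.category(c).startswith("M"))

def variants(ch, marks=CANDIDATE_MARKS):
    """List accented variants of ch's base letter, composed where Unicode allows."""
    base = strip_marks(ch)
    return [unicodedata.normalize("NFC", base + m) for m in marks]

print(variants("e"))         # ['é', 'è', 'ê', 'ë']
print(variants("\u00e9"))    # same list: 'é' is reduced to 'e' first
```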
The backspace (leftwards delete key) and (rightwards) delete keys should always delete one visual entity perceived as a single character by users, i.e. a combination of base letter and accent(s). The software could offer a key combination to free selected or adjacent base letters of all their diacritics, though, e.g. [Ctrl+Shift+Del/BS].
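A rough Python sketch of that deletion behaviour follows. It treats a base character plus any following combining marks as one unit, which is only an approximation of the extended grapheme clusters defined in UAX #29 (those also cover Hangul jamo, emoji sequences etc.). The function names, including strip_diacritics() for the suggested [Ctrl+Shift+Del/BS] action, are hypothetical.

```python
import unicodedata

def is_mark(ch):
    # General categories Mn, Mc and Me are combining marks.
    return unicodedata.category(ch).startswith("M")

def backspace(text):
    """Delete the last base character together with all marks attached to it."""
    i = len(text)
    while i > 0 and is_mark(text[i - 1]):
        i -= 1
    # text[i - 1] is now the base character (if any); remove it as well.
    return text[: max(i - 1, 0)]

def delete_forward(text, pos):
    """Delete the character at pos together with any marks that follow it."""
    end = pos + 1 if pos < len(text) else pos
    while end < len(text) and is_mark(text[end]):
        end += 1
    return text[:pos] + text[end:]

def strip_diacritics(text):
    """Free base letters of all their diacritics, keeping the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", "".join(c for c in decomposed if not is_mark(c)))

# Precomposed and combined input behave the same from the user's point of view.
print(backspace("caf\u00e9"), backspace("cafe\u0301"))   # caf caf
print(strip_diacritics("r\u00e9sum\u00e9"))              # resume
```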
I believe it would help if input were immediately converted to NFD and text were saved in NFD, because this would make the need for uniform treatment more obvious.
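In Python the conversion itself is a single call to unicodedata.normalize(); normalize_input() is just a hypothetical wrapper name for where an editor would apply it on every keystroke or paste. After normalization, the precomposed and the combined spelling of “résumé” are the same code point sequence.

```python
import unicodedata

def normalize_input(text):
    """Convert entered text to NFD so precomposed and combined spellings coincide."""
    return unicodedata.normalize("NFD", text)

precomposed = "r\u00e9sum\u00e9"   # é as U+00E9 LATIN SMALL LETTER E WITH ACUTE
combined = "re\u0301sume\u0301"    # e followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == combined)                                     # False
print(normalize_input(precomposed) == normalize_input(combined))   # True
print(len(precomposed), len(normalize_input(precomposed)))         # 6 8
```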
It would be cool if there was an ASCII-compatible encoding with variable length like UTF-8 that supported only NFD (or NFKD) and was optimized for a small storage footprint, e.g. of the characters in U+00C0–017F only a handful would have to be coded separately. Sadly, though, it is unrealistic to have a unique single-byte code for each combining diacritic, because there are so many of them: even just the ranges U+0300–036F and U+1DC0–1DFF are 176 positions together, although some are still unassigned; that is more than the 128 values 7 bits can encode.
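Both numbers can be checked with a few lines of Python. The sketch counts the positions in the two combining-mark ranges and lists the characters in U+00C0–017F that NFD leaves as a single code point, i.e. those that an NFD-only encoding would still need a code of their own for. It looks at canonical decomposition only (NFD, not NFKD), and the list includes non-letters such as × and ÷ that happen to sit in that range; needs_own_code() is a hypothetical helper name.

```python
import unicodedata

# Combining-mark ranges named above, endpoints inclusive.
RANGES = [(0x0300, 0x036F), (0x1DC0, 0x1DFF)]
positions = sum(hi - lo + 1 for lo, hi in RANGES)
print(positions, positions > 2 ** 7)   # 176 True

def needs_own_code(cp):
    """True if NFD leaves the character as one code point (no canonical decomposition)."""
    return len(unicodedata.normalize("NFD", chr(cp))) == 1

separate = [chr(cp) for cp in range(0x00C0, 0x0180) if needs_own_code(cp)]
print(len(separate), "".join(separate))
```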
> The one use case that Plamen mentioned (a user manually deleting a base letter) is easily trained.
Changing people is harder than changing software, in general.
This archive was generated by hypermail 2.1.5 : Thu May 19 2011 - 05:44:08 CDT