Re: Non-ascii string processing?

From: Jungshik Shin (
Date: Wed Oct 08 2003 - 22:55:59 CST

On Tue, 7 Oct 2003, Peter Kirk wrote:

> On 07/10/2003 04:35, Jill Ramonsky wrote:

> Anyway, DGCs are not always what you want to work with.

 Besides, DGCs (default grapheme clusters) are only the default; they
are not an absolute, invariant atomic unit that can never be broken. In
some situations, deletion and cursor movement should work at a level
different from that of the DGCs. The Unicode DGCs for the Korean script
are syllables, but at least during text input (_before_ a syllable is
'committed'), many Koreans want the 'backspace' key to delete what they
just typed in - a single Korean letter (jamo) instead of the whole
syllable. It'd be frustrating to have to type the whole sequence again
just because one made a mistake in the last Unicode character of a DGC
(made up of several Unicode characters).
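
 To make the two levels concrete, here is a rough Python sketch. It
is only an illustration under my own assumptions, not how any real input
method works: the function names are made up, the text is assumed to hold
precomposed Hangul syllables, and canonical decomposition (NFD) stands in
for the keystrokes the user typed. A real IME would keep its own
pre-commit buffer, and NFD treats a compound final consonant as a single
jamo, so actual keystroke-level behaviour can differ.

```python
import unicodedata


def backspace_syllable(text):
    """Delete the whole last syllable (one precomposed code point here)."""
    return text[:-1]


def backspace_jamo(text):
    """Delete only the last jamo of the last character.

    The last character is decomposed with NFD, its final code point is
    dropped, and whatever remains is recomposed with NFC.
    """
    if not text:
        return text
    jamo = unicodedata.normalize("NFD", text[-1])
    return text[:-1] + unicodedata.normalize("NFC", jamo[:-1])


word = "\uD55C\uAE00"  # two syllables: U+D55C (han) and U+AE00 (geul)
after_syllable_delete = backspace_syllable(word)  # first syllable only
after_jamo_delete = backspace_jamo(word)  # second syllable loses its final consonant
```

 With the jamo-level version, one backspace turns the second syllable
U+AE00 into U+ADF8 (the same syllable without its final consonant),
whereas the syllable-level version removes U+AE00 entirely.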

> I work a lot
> with pointed Hebrew texts. For most purposes (though not for calculating
> space taken up on a line) the entities I need to work with correspond to
> Unicode characters rather than DGCs, for I work separately with the base
> characters (mostly consonants), the vowel points and the accents. In
> some cases the match is not precise, but it is a lot more convenient
> for my work if I can access a string character by character, rather than
> UTF-8 byte by UTF-8 byte or DGC by DGC. And, by the way, I have real
> examples of DGCs in Hebrew consisting of six characters.

  I've got a question about cursor movement and selection in Hebrew
text with such a grapheme cluster (made up of 6 Unicode characters). What
would ordinary users expect when the delete, backspace, and arrow keys
(for cursor movement) are pressed around or in the middle of that DGC? Do
they expect the backspace/delete/arrow keys to _always_ operate at the
DGC level, or do they sometimes want them to work at the Unicode
character level (or its equivalent in their perception of Hebrew
'letters')? Exactly the same question can be asked of Indic scripts.
I've asked this before (I discussed the issue with Marco a couple of
years ago), but I haven't heard back from native users of Indic scripts.
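
  The two behaviours I'm asking about could be sketched in Python
roughly as follows. This is a simplification under stated assumptions:
a "cluster" here is just a base character plus any trailing characters
with a non-zero canonical combining class, which covers Hebrew points
and accents but is not the full default grapheme cluster definition.
The function names and the sample string are made up for illustration
and are not taken from any real text.

```python
import unicodedata


def backspace_code_point(text):
    """Delete only the last Unicode character (e.g. a single point or accent)."""
    return text[:-1]


def backspace_cluster(text):
    """Delete the last base character together with all marks attached to it."""
    end = len(text)
    # Skip backwards over trailing combining marks.
    while end > 0 and unicodedata.combining(text[end - 1]) != 0:
        end -= 1
    # Then drop the base character the marks were attached to.
    return text[:max(end - 1, 0)]


# Hypothetical sample: lamed, then shin carrying dagesh, shin dot,
# qamats and an accent (etnahta) - one base letter plus four marks.
sample = "\u05DC" + "\u05E9\u05BC\u05C1\u05B8\u0591"
after_code_point = backspace_code_point(sample)  # only the accent is removed
after_cluster = backspace_cluster(sample)  # the shin and all its marks are removed
```

  The question is which of the two results a Hebrew (or Indic) user
would actually expect from a single press of backspace.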

