Re: Grapheme cluster boundaries and left-side spacing dependent vowels

From: Jungshik Shin (jshin@mailaps.org)
Date: Sat Apr 26 2003 - 01:48:11 EDT

    > Peter Constable wrote:
    >
    > > Jungshik Shin wrote on 04/21/2003 09:27:04 PM:
    > >
    > > > I think two cases are distinct. In bidi text, bouncing back and forth
    > > > is across grapheme boundaries while in what James described, it's
    > > > within a single grapheme.
    > >
    > > Well, wasn't the point of James' comments: to determine whether the Indic
    > > sequences *should* be considered a grapheme?

      Let's forget about graphemes. I'm not saying that bouncing back
    and forth in Indic script (input) is bad or should be prohibited.
    Sometimes it's desired and necessary because, like other scripts that
    are alphabetic as well as syllabic [1], Indic scripts have multiple
    layers of 'unitness', and end-users' expectations of what constitutes a
    unit can depend on several factors. Therefore, my answer as to whether
    it's desirable to allow bouncing back and forth in the situation
    James described would be 'it depends'. My answer wouldn't change even if
    the UTC had stipulated (normatively) that a certain sequence of Indic
    letters constitutes a grapheme, because, as Marco and I discussed last
    year(?), there's no single answer as to how many Unicode characters to
    delete or move over on the delete/backspace/left cursor/right cursor
    keys for Indic scripts and Hangul (and perhaps other scripts as well;
    for example, some people may want the backspace key to delete just a
    diacritic mark instead of base+diacritics). Therefore, Korean input
    methods usually offer a user-configurable option to control the behavior
    of the backspace/delete/cursor keys, at least at the pre-edit stage.
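
    To make the idea of a configurable backspace concrete, here is a
    minimal sketch in Python. It is only an illustration, not how any
    actual input method works: the function name `backspace` and its
    `mode` parameter are made up for this example, and a 'cluster' is
    approximated as a base character plus any trailing characters of
    general category M*, which is far cruder than the rules in UAX #29
    (it does not, for instance, group decomposed Hangul jamo).

```python
import unicodedata


def backspace(text: str, mode: str = "cluster") -> str:
    """Delete backwards from the end of `text` (hypothetical helper).

    mode="codepoint": remove only the last code point.
    mode="cluster":   remove trailing marks together with their base
                      (assumption: 'mark' = general category Mn/Mc/Me).
    """
    if not text:
        return text
    if mode == "codepoint":
        return text[:-1]
    # Walk left over trailing marks until a non-mark (the base) is found.
    i = len(text) - 1
    while i > 0 and unicodedata.category(text[i]).startswith("M"):
        i -= 1
    return text[:i]
```

    With U+0915 U+093F (KA followed by VOWEL SIGN I, which is displayed
    to the left of KA), "codepoint" mode leaves just KA, while "cluster"
    mode removes both characters. Which of the two a user expects is
    exactly the preference question above.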

      The BIDI case is different from Indic scripts because most people
    would agree that moving back and forth is what to expect when crossing
    LTR/RTL boundaries on (almost) all occasions; at least, there's less
    dependency on personal preference, application needs and so forth.

      Jungshik

    [1] Korean cell/mobile phones have three keys for vowels (dot, horizontal
    stroke, and vertical stroke, on keys 1, 2, and 3), with 14 consonants
    allocated to 7 keys at [4..9] and 0 (two consonants per key). In this
    layout, vowels are constructed from these three elements (天, 地, 人:
    at least two layers below precomposed syllables). As far as vowels are
    concerned, the layout takes advantage of the 'featural' aspect of the
    script. It's amazing how fast kids can type with it when sending out
    text messages to their friends. Their fingers are literally flying....
    BTW, does anybody know what Indian kbd layouts for cellphones are like?
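
    As a rough sketch of how vowels can be built from the three elements
    (in Python): the table below is a partial, illustrative assumption
    about the stroke sequences, and the names `VOWELS` and
    `compose_vowel` are hypothetical, not taken from any handset.

```python
# The three vowel elements, named symbolically rather than by key number.
DOT, HORIZONTAL, VERTICAL = "dot", "horizontal", "vertical"

# Partial, assumed table: stroke sequence -> Hangul compatibility jamo.
VOWELS = {
    (VERTICAL,): "ㅣ",
    (HORIZONTAL,): "ㅡ",
    (VERTICAL, DOT): "ㅏ",
    (DOT, VERTICAL): "ㅓ",
    (DOT, HORIZONTAL): "ㅗ",
    (HORIZONTAL, DOT): "ㅜ",
    (VERTICAL, DOT, DOT): "ㅑ",
    (DOT, DOT, VERTICAL): "ㅕ",
    (DOT, DOT, HORIZONTAL): "ㅛ",
    (HORIZONTAL, DOT, DOT): "ㅠ",
}


def compose_vowel(strokes):
    """Return the vowel for a stroke sequence, or None if not in the table."""
    return VOWELS.get(tuple(strokes))
```

    For example, compose_vowel([VERTICAL, DOT]) gives ㅏ, and a sequence
    that is not in the table gives None.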


