From: Mark Davis (firstname.lastname@example.org)
Date: Tue Apr 22 2003 - 15:55:31 EDT
To add on to what Ken has said, what UAX #29 does is define default grapheme
cluster boundaries. While these form a well-defined core which can be very
useful in language-independent processing, for particular languages a
tailored grapheme cluster may be more useful, consisting of one or more
default grapheme clusters. Examples of this are given in UAX #29.
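
To make "one or more default grapheme clusters" concrete, here is a rough
sketch in Python, not anything taken from UAX #29 itself. It assumes the text
has already been split into default grapheme clusters, and it merges adjacent
clusters whenever they spell out a unit that a particular language wants to
treat as a single tailored cluster. The function name tailor_clusters, the
merged_units set, and the max_span limit are all hypothetical names chosen
for illustration.

```python
def tailor_clusters(default_clusters, merged_units, max_span=4):
    """Merge adjacent default clusters into tailored clusters.

    default_clusters: list of strings, one per default grapheme cluster.
    merged_units: set of strings a language treats as a single unit
                  (a hypothetical tailoring table, supplied by the caller).
    max_span: assumed upper bound on how many default clusters one
              tailored cluster may cover.
    """
    result = []
    i = 0
    while i < len(default_clusters):
        # Try the longest candidate first so that longer units win.
        for span in range(min(max_span, len(default_clusters) - i), 1, -1):
            candidate = "".join(default_clusters[i:i + span])
            if candidate in merged_units:
                result.append(candidate)
                i += span
                break
        else:
            # No tailored unit starts here: keep the default cluster as is.
            result.append(default_clusters[i])
            i += 1
    return result
```

For instance, with a tailoring that treats the digraph "ch" as one unit,
tailor_clusters(["c", "h", "a", "t", "a"], {"ch"}) gives
["ch", "a", "t", "a"]. A tailored cluster built this way never splits a
default cluster; it only joins whole ones.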
(مرقص بن داود)
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
fax: (408) 256-0799
----- Original Message -----
From: "Kenneth Whistler" <email@example.com>
Cc: <firstname.lastname@example.org>; <email@example.com>
Sent: Tuesday, April 22, 2003 11:45
Subject: Re: Grapheme cluster boundaries and left-side spacing dependent
> Peter Constable wrote:
> > Jungshik Shin wrote on 04/21/2003 09:27:04 PM:
> > > I think two cases are distinct. In bidi text, bouncing back and forth
> > > is across grapheme boundaries while in what James described, it's
> > > within a single grapheme.
> > Well, wasn't the point of James' comments: to determine whether the
> > sequences *should* be considered a grapheme?
> It's up to implementations, applications, and graphologists to
> determine that.
> The UTC made a brief foray onto the unforgiving ground of trying
> to determine grapheme status and grapheme boundaries, but after
> wrestling with the issue of trying to define "unithood" inside
> Indic orthographic syllables, backed off again.
> UAX #29 now has a very streamlined definition of "default
> grapheme cluster boundaries" which basically amounts to
> trying to keep boundaries from falling within sequences of
> base letters + non-spacing marks or within sequences of
> jamos that constitute a Korean syllable. That's it.
> UAX #29 default grapheme cluster boundaries don't even attempt
> to specify whether Devanagari consonant conjuncts, or
> aksharas, or orthographic syllables, or Indic constructs involving
> vowels behaving as chunks of conjunct forms, or whatnot constitute
> graphemes. Such determinations are basically out-of-scope for
> Unicode, in my opinion.
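
The two rules Ken describes (no boundary inside base letter + non-spacing
marks, and none inside a run of jamos forming a Korean syllable) can be
approximated in a few lines of Python. This is a simplified sketch and not the
full UAX #29 rule set: it takes "mark" to mean general category Mn or Me, uses
the conjoining jamo ranges of the Hangul Jamo block as I understand them, and
ignores everything else (CR LF, control characters, and so on). The helper
names are hypothetical.

```python
import unicodedata


def hangul_type(ch):
    """Classify a character as L, V, T, LV, LVT, or None (assumed ranges)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"    # leading consonant jamo
    if 0x1160 <= cp <= 0x11A7:
        return "V"    # vowel jamo
    if 0x11A8 <= cp <= 0x11FF:
        return "T"    # trailing consonant jamo
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllable: LV if it has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


def joins_previous(prev, ch):
    """Return True if no boundary should fall between prev and ch."""
    # Non-spacing and enclosing marks attach to whatever precedes them.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return True
    p, c = hangul_type(prev), hangul_type(ch)
    if p == "L" and c in ("L", "V", "LV", "LVT"):
        return True
    if p in ("LV", "V") and c in ("V", "T"):
        return True
    if p in ("LVT", "T") and c == "T":
        return True
    return False


def default_clusters(text):
    """Split text into (simplified) default grapheme clusters."""
    clusters = []
    for ch in text:
        if clusters and joins_previous(clusters[-1][-1], ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

Under these assumptions, "e" followed by U+0301 stays together, and so does
the jamo sequence U+1112 U+1161 U+11AB. The Devanagari sequence U+0915 U+094D
U+0937 (ka, virama, ssa) comes out as two clusters, because the virama is a
non-spacing mark that attaches to ka while ssa starts a new cluster. That
matches the point above: the default boundaries make no attempt to keep a
consonant conjunct together.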