On 11/26/2015 3:08 AM, Philippe Verdy wrote:
The related definition for extended grapheme clusters says:

  ( CRLF
  | Prepend* ( RI-sequence | Hangul-Syllable | !Control ) ( Grapheme_Extend | SpacingMark )*
  | . )

However, I do not understand why it may include only one
Hangul-Syllable when applying prepended concatenation marks.
And if the definition excludes whitespace, nothing prevents
it from extending to arbitrary sequences of
letters/digits/symbols/punctuation (this could span very long
sequences of sinograms, or other letters from scripts that do
not use whitespace as a word separator). Even in the Latin
script it would extend to the punctuation signs that may
follow any word, or to an entire mathematical formula such as
"1+2*3" but not "sin x"...
White space is clearly NOT part of a grapheme cluster, so I don't see
what your issue is.
BTW, if after careful analysis you think there is a mistake, you
should probably raise a bug on this.
Because, as you note, they lack proper spans (no closing delimiter),
the type (and number) of characters that they can apply to must be
carefully limited, or else you get unexpected results.
The marks that do have a description in the core spec do have clear
limitations on what they can apply to (digit, digit run, or word
run seem to be the candidates).
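To make "limited to a digit run" concrete, here is a minimal sketch of one possible display-side scoping policy: a prepended concatenation mark is assumed to apply to the run of decimal digits (General_Category Nd) that immediately follows it. The policy, the function name `digit_run_scope`, and the hard-coded `PCM` set (taken from the Prepended_Concatenation_Mark property as of roughly Unicode 9, possibly incomplete for later versions) are all illustrative assumptions, not anything the spec prescribes.

```python
import unicodedata
from typing import Tuple

# Assumed set of prepended concatenation marks (Prepended_Concatenation_Mark
# in PropList.txt around Unicode 9); later versions add more characters.
PCM = {"\u0600", "\u0601", "\u0602", "\u0603", "\u0604", "\u0605",
       "\u06DD", "\u070F", "\U000110BD"}

def digit_run_scope(text: str, i: int) -> Tuple[int, int]:
    """Hypothetical policy: return the (start, end) slice of the decimal-digit
    run that the mark at text[i] spans. An empty slice means no digits follow."""
    if text[i] not in PCM:
        raise ValueError("not a prepended concatenation mark")
    j = i + 1
    # Extend only over characters whose General_Category is Nd (decimal digit).
    while j < len(text) and unicodedata.category(text[j]) == "Nd":
        j += 1
    return (i + 1, j)
```

For example, `digit_run_scope("\u0600\u0661\u0662\u0663 x", 0)` returns `(1, 4)`: the mark spans the three Arabic-Indic digits and stops at the space, so the limit on what it can apply to is explicit rather than open-ended.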
As a generic algorithm, I don't mind terribly much if the grapheme
cluster algorithm overproduces clusters (for example, Prepend* should
really be multiple clusters, not a single one, if the count > 1).
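The quoted definition can be tried out directly as a regular expression over property labels. The sketch below is a simplification under stated assumptions: each character is pre-mapped to a hypothetical one-letter label for its Grapheme_Cluster_Break value, a whole Hangul syllable is collapsed to a single label `H`, and RI-sequence is modeled as one or more Regional_Indicator characters. It shows that a run of several Prepend characters is absorbed into one cluster.

```python
import re
from typing import List

# Hypothetical one-letter labels standing in for Grapheme_Cluster_Break values:
#   r = CR, n = LF, c = Control, p = Prepend, R = Regional_Indicator,
#   H = an entire Hangul syllable (simplified), e = Grapheme_Extend,
#   s = SpacingMark, x = anything else.
# Mirrors: CRLF | Prepend* ( RI-sequence | Hangul-Syllable | !Control )
#          ( Grapheme_Extend | SpacingMark )* | .
# "!Control" is modeled as anything except Control, CR, or LF.
CLUSTER = re.compile(r"rn|p*(?:R+|H|[^cnr])[es]*|.")

def segment(labels: str) -> List[str]:
    """Split a string of labels into clusters, leftmost match first."""
    return CLUSTER.findall(labels)
```

Under this model `segment("ppx")` returns `["ppx"]`: two Prepend characters plus their base form a single cluster rather than two, which is the case described above. A Control still breaks, so `segment("xcx")` yields `["x", "c", "x"]`.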
But the actual display algorithms must get this correct, and there
must be an agreed-upon specification of what an author can expect
that a reader will see if the prepend is correctly supported.
While many elements are present, the specification (or its
presentation) isn't as clear and unambiguous as one would like.
A./
Received on Thu Nov 26 2015 - 05:39:09 CST