Re: New Character Property for Prepended Concatenation Marks from Asmus Freytag (t) on 2015-11-26 (Unicode Mail List Archive)

From: Asmus Freytag (t) <asmus-inc_at_ix.netcom.com>
Date: Thu, 26 Nov 2015 03:38:13 -0800

On 11/26/2015 3:08 AM, Philippe Verdy wrote:

The related definition for extended grapheme clusters says:

( CRLF
| Prepend* ( RI-sequence | Hangul-Syllable | !Control )
( Grapheme_Extend | SpacingMark )*
| . )

However, I do not understand why it may include only one Hangul-Syllable when applying prepended concatenation marks. And if the definition excludes whitespace, nothing prevents it from extending to arbitrary sequences of letters, digits, symbols, and punctuation (this could span very long sequences of sinograms, or other letters from scripts that do not use whitespace as a word separator). Even in the Latin script it would extend to the punctuation signs that may follow any word, or to an entire mathematical formula such as "1+2*3" but not "sin x"...

White space is clearly NOT part of a grapheme cluster, so I don't see what your issue is?

BTW, if after careful analysis you think there is a mistake, you should probably raise a bug on this.

Because, as you note, these marks lack proper spans (no closing delimiter), the type (and number) of characters that they can apply to must be carefully limited, or else you get unexpected results.

The marks that do have a description in the core spec do have clear limitations on what they can apply to (digit, digit run, or word run seem to be the candidates).

As a generic algorithm, I don't mind terribly much if the grapheme cluster algorithm overproduces clusters (for example, prepend* should really yield multiple clusters, not a single one, if the count > 1).
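
To make the behaviour of the quoted rule concrete, here is a rough Python sketch of it. It rests on several assumptions that are not in the spec text above: the Prepend set is a hand-picked handful of code points (U+0600..U+0605 and U+06DD), Grapheme_Extend and SpacingMark are approximated by the general categories Mn/Me/Mc, Control is approximated by Cc/Cf/Zl/Zp, and RI-sequences and Hangul syllables are not treated specially. The function name clusters() is made up for illustration. It is meant to show the shape of the rule, not to be a conformant segmenter.

```python
import unicodedata

# Assumption: a hand-picked subset of prepended concatenation marks,
# not the full Prepend property from the Unicode data files.
PREPEND = set(range(0x0600, 0x0606)) | {0x06DD}


def is_prepend(ch):
    return ord(ch) in PREPEND


def is_control(ch):
    # Approximation of Control; the prepend marks are Cf, so exclude them.
    return not is_prepend(ch) and unicodedata.category(ch) in ("Cc", "Cf", "Zl", "Zp")


def is_extend(ch):
    # Approximation of (Grapheme_Extend | SpacingMark) by general category.
    return unicodedata.category(ch) in ("Mn", "Me", "Mc")


def clusters(text):
    """Split text using a simplified form of the quoted rule."""
    out = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if text.startswith("\r\n", i):
            i += 2                      # CRLF
        else:
            j = i
            while j < n and is_prepend(text[j]):
                j += 1                  # Prepend*
            if j < n and not is_control(text[j]):
                j += 1                  # exactly one base (!Control)
                while j < n and is_extend(text[j]):
                    j += 1              # (Grapheme_Extend | SpacingMark)*
                i = j
            elif j > i:
                i = j                   # trailing prepends: the last one acts as the base
            else:
                i += 1                  # fallback: any single character
        out.append(text[start:i])
    return out
```

With this sketch, clusters("\u0600123") yields the mark plus the first digit as one cluster and then "2" and "3" as separate clusters, so the cluster never spans the whole digit run; that part has to come from the display algorithm. Conversely, clusters("\u0600\u0601a") returns a single cluster containing both marks, which is the case with more than one prepend mentioned above.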

But the actual display algorithms must get this correct, and there must be an agreed-upon specification of what an author can expect a reader to see if the prepend is correctly supported.

While many elements are present, the specification (or its presentation) isn't as clear and unambiguous as one would like.

A./
Received on Thu Nov 26 2015 - 05:39:09 CST
