Re: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Fri, 25 Apr 2014 00:12:22 +0100

On Thu, 24 Apr 2014 23:07:58 +0200
Mathias Bynens <mathias_at_qiwi.be> wrote:

> I realize reversing a string has nothing to do with text segmentation
> – but ignoring grapheme extenders leads to unexpected results (since
> after reversing the code points, the grapheme extender might extend
> the wrong character):
> https://github.com/mathiasbynens/esrever/issues/5

Actually, it has a lot to do with text segmentation: you need to work
out what are really thought of as the characters. שָׁלוֹם is a nice
illustration of the problems. Should reversing twice yield the string
you started with? Should reversing three times give the same
result as reversing once? What does reversing a Hangul syllable do?
Canonical equivalence should be preserved! Should renderability be
preserved? What does Thai เกราะ /krɔ̀ʔ/ <U+0E40, U+0E01, U+0E23,
U+0E32, U+0E30> reverse to? /ʔɔ̀rk/ is unpronounceable in Thai, and if
it were pronounceable it would be written อ็อรก <U+0E2D, U+0E47,
U+0E2D, U+0E23, U+0E01>. Thai เพลา <U+0E40, U+0E1E, U+0E25, U+0E32> is
the spelling of two unrelated words, pronounced /pʰlaw/ and /pʰeː laː/
respectively.
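
To make some of these questions concrete, here is a rough Python sketch.
It is not what esrever does and it is not UAX #29 segmentation. The
helper names (naive_reverse, cluster_reverse, canonically_equal) are
made up for illustration, and the "cluster" rule is an assumption: one
code point plus any following code points whose General_Category is a
mark (Mn, Mc, Me).

```python
import unicodedata


def naive_reverse(s: str) -> str:
    """Reverse code point by code point; marks land before their bases."""
    return s[::-1]


def cluster_reverse(s: str) -> str:
    """Reverse, keeping each code point together with the marks after it.

    Assumption: a mark is any code point whose General_Category starts
    with "M". This is only a crude stand-in for grapheme clusters.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# shin, qamats, shin dot, lamed, vav, holam, final mem
shalom = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
print(cluster_reverse(cluster_reverse(shalom)) == shalom)  # True

# Precomposed vs decomposed e-acute, each followed by "x"
nfc, nfd = "\u00E9x", "e\u0301x"
print(canonically_equal(naive_reverse(nfc), naive_reverse(nfd)))      # False
print(canonically_equal(cluster_reverse(nfc), cluster_reverse(nfd)))  # True

# Hangul: the jamo are letters, not marks, so this rule still breaks
# canonical equivalence between a syllable and its decomposition.
han = "\uD55C"
han_nfd = unicodedata.normalize("NFD", han)
print(canonically_equal(cluster_reverse(han), cluster_reverse(han_nfd)))  # False
```

Keeping marks with their bases fixes the simple combining-mark case, but
the Hangul check shows it is not enough for canonical equivalence. For
the Thai example all five code points are letters (Lo), so this rule
treats เกราะ exactly like a plain code point reversal.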

Richard.

_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode
Received on Thu Apr 24 2014 - 18:13:47 CDT
