Aquaφοβία from Richard Wordingham via Unicode on 2017-12-09 (Unicode Mail List Archive)

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Sat, 9 Dec 2017 14:28:31 +0000

Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
implies that it might be considered desirable to have a word boundary
in 'aquaφοβία' or a grapheme cluster break in an encoding such as <006C,
U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which
should be <006C, U+0310 COMBINING CANDRABINDU> in accordance with the
principle of script separation. Why are such breaks desirable?
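A rough sketch using only Python's standard `unicodedata` module illustrates the el candrabindu case. The helper `naive_clusters` is hypothetical and far simpler than UAX #29. It attaches every mark (General_Category Mn, Mc or Me) to the preceding character, loosely approximating rules GB9/GB9a, and it never consults the Script property. Under that approximation, both encodings form a single cluster. A break would appear only if the segmenter also split at script changes.

```python
import unicodedata

def naive_clusters(text):
    """Group each character with any following combining marks.

    Hypothetical, simplified approximation of UAX #29 rules GB9/GB9a:
    any character whose General_Category starts with 'M' (Mn, Mc, Me)
    attaches to the preceding cluster. All other grapheme cluster rules
    are ignored, and script is deliberately not consulted.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# The two encodings of el candrabindu discussed above.
devanagari_mark = "\u006C\u0901"  # l + DEVANAGARI SIGN CANDRABINDU (Mn)
generic_mark = "\u006C\u0310"     # l + COMBINING CANDRABINDU (Mn)

for s in (devanagari_mark, generic_mark):
    names = [unicodedata.name(c) for c in s]
    print(names, "->", len(naive_clusters(s)), "cluster(s)")
```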

I can understand an argument that these should be tolerated, as an
application could have been designed on the basis that script
boundaries imply word boundaries (not true for Japanese) and that word
boundaries imply grapheme cluster boundaries (not true for Sanskrit,
where they don't even imply character boundaries). There are some who
claim that the Laotian consonant placeholder is the letter 'x' rather
than the multiplication sign, U+00D7, which does have
Indic_Syllabic_Category=Consonant_Placeholder. (I trust no-one is
suggesting that there should be a grapheme cluster boundary between
U+00D7, with Script=Common, and a non-spacing Lao vowel, any more than
there would be with a Lao consonant.)

Richard.
Received on Sat Dec 09 2017 - 08:28:59 CST