Another UAX #29 bug: property tables need updating

From: Manish Goregaokar <manish_at_mozilla.com>
Date: Thu, 22 Dec 2016 10:35:55 -0800

The spec lists GraphemeBreakProperty.txt[1] and
WordBreakProperty.txt[2] as the normative source for grapheme and word
categorization respectively.

However, the spec also gives non-normative definitions of these
properties. In particular, it defines Glue_After_Zwj[3] as

> Emoji characters that do not break from a previous ZWJ in a defined emoji zwj sequence, and are not listed as Emoji_Modifier_Base=Yes in emoji-data.txt. See [UTR51].

Going through emoji-zwj-sequences.txt[4], I found a lot of emoji
characters that satisfy this definition. The kiss and heart emoji do,
as does every object emoji in the "Gendered Role, with object"
section. However, the property table only lists the kiss, heart, and
speech bubble emoji as GAZ.

The property table should list all of these role and gender characters as GAZ.

Could this be updated?

 [1]: http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakProperty.txt
 [2]: http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/WordBreakProperty.txt
 [3]: http://www.unicode.org/reports/tr29/proposed.html#Glue_After_Zwj
 [4]: http://unicode.org/Public/emoji/4.0/emoji-zwj-sequences.txt

Thanks,
-Manish
Received on Thu Dec 22 2016 - 12:36:48 CST
