Accumulated Feedback on PRI #417

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Thu Jan 21 18:02:00 CST 2021
Name: Steven Luscher
Report Type: Error Report
Opt Subject: Korean word boundary example in UAX#29

Hi folks,

In the UAX #29 document (https://www.unicode.org/reports/tr29/#Word_Boundaries) it is written:

> According to Korean standards, the grammatical suffixes, such as “에” meaning “in”,
> are considered separate words. Thus the above sentence would be broken into the following five words:
>
> 나, 는, Chicago, 에, and 산다.

A Korean-speaking colleague of mine tells me that he, in fact, considers ‘나는’ to be one word.
In Mac OS, placing the cursor to the left of ‘나는’ and pressing Command-RightArrow moves you
rightward past both graphemes.

Could the spec be wrong where it claims that 나 and 는 are two words?

Thank you,
Steven…
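
The two segmentations at issue can be compared with a small sketch. It is not an implementation of the UAX #29 rules. It assumes that, for this particular sentence, the untailored rules break only at the spaces (they do not break between adjacent letters), and it mimics the tailoring the annex describes by peeling a trailing particle off each chunk. The particle list and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical particle list for illustration only; real Korean
# segmentation needs morphological analysis, not a suffix table.
PARTICLES = ("는", "에")


def split_on_spaces(text):
    """Approximate the untailored result for this sentence: break only at spaces."""
    return text.split()


def split_particles(text, particles=PARTICLES):
    """Additionally separate one trailing particle from each space-delimited chunk."""
    words = []
    for chunk in text.split():
        for particle in particles:
            # Only split when something is left in front of the particle.
            if chunk.endswith(particle) and len(chunk) > len(particle):
                words.extend([chunk[:-len(particle)], particle])
                break
        else:
            words.append(chunk)
    return words


sentence = "나는 Chicago에 산다"
print(split_on_spaces(sentence))   # three chunks, with '나는' kept whole
print(split_particles(sentence))   # five words, as listed in the annex
```

The first result corresponds to the behaviour described in the report (‘나는’ treated as one unit); the second reproduces the five-word list quoted from the annex.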

Date/Time: Fri Jan 29 18:14:29 CST 2021
Contact: johnsoneal@gmail.com
Name: Neal Johnson
Report Type: Error Report
Opt Subject: Unicode Standard Annex #29 - 3 Grapheme Cluster Boundaries - SpacingMark

Unicode Standard Annex #29 - 3 Grapheme Cluster Boundaries - SpacingMark
(https://www.unicode.org/reports/tr29/#SpacingMark ) states that U+11720 and
U+11721 should be specifically excluded. However, "GraphemeBreakProperty.txt"
lists both as included, and as such
UCharacter.getIntPropertyValue(0x11721, UProperty.GRAPHEME_CLUSTER_BREAK)
returns 10 ("SPACING_MARK").

I am not sure if this is an issue in the "GraphemeBreakProperty.txt" data file
or an issue in Annex #29.

(submitted by Markus on behalf of Neal who mis-reported this as 
https://unicode-org.atlassian.net/browse/ICU-21438 )
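
A discrepancy of this kind can be checked directly against the data file, independent of any library. The following is a minimal sketch, assuming the usual UCD layout of `code point or range ; value # comment`. The sample line is written in that layout for illustration and is not a verbatim excerpt from any specific file version; the function names are hypothetical.

```python
def parse_property_file(lines):
    """Parse UCD-style lines ("XXXX..YYYY ; Value # comment") into (start, end, value) tuples."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue  # blank or comment-only line
        code_field, value = (part.strip() for part in data.split(";", 1))
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end, value))
    return ranges


def property_value(ranges, code_point, default="Other"):
    """Return the listed value for code_point, or the default for unlisted code points."""
    for start, end, value in ranges:
        if start <= code_point <= end:
            return value
    return default


# Illustrative sample in the data-file layout (assumption, not a verbatim excerpt).
SAMPLE = [
    "# sample data",
    "11720..11721  ; SpacingMark # sample line",
]

ranges = parse_property_file(SAMPLE)
for cp in (0x11720, 0x11721):
    print(f"U+{cp:04X}", property_value(ranges, cp))
```

Running the same lookup over a real copy of GraphemeBreakProperty.txt, and comparing the result with the exclusion list in the annex, would show which of the two sources needs correcting.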


Date/Time: Mon Mar 22 18:43:49 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: A small editorial issue on UAX #29

In the Comments column of the second row from the bottom (for "kʷ") of
Table 1a, the annex says "sequence with letter modifier", though I believe
the Unicode Standard uses the term "modifier letter", not "letter modifier",
to describe a character like "ʷ".  It should be changed to read "sequence
with modifier letter" to avoid confusion.

Date/Time: Sun Mar 28 06:31:11 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: UAX #29 contains a strange statement as an explanation

The second line in Section 4 (Word Boundaries) currently reads:

The most familiar ones are selection (double-click mouse selection or “move
to next word” control-arrow keys) and the dialog option “Whole Word Search”
for search and replace.

It implies that '"move to next word" control-arrow keys' is a "selection",
but I believe that is contrary to common behavior: control-arrow usually
moves the cursor without selecting, and if you want to select to the next
word, you need to press control-shift-arrow.

Probably we should either change '"move to next word" control-arrow keys' to
'"select to next word" control-shift-arrow keys' or change the nearby
phrases to something like '... selection (double-click mouse selection),
cursor movement ("move to next word" control-arrow keys), and the dialog
...'

I hope this feedback helps.

Date/Time: Sun Apr 11 18:04:14 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: Inappropriate description in UAX #29

The 3rd paragraph of "7 Testing" in UAX #29 "Unicode Text Segmentation"
explains the format of the three auxiliary files (referred to as
[Charts29]), and I believe the current description does not match the
actual auxiliary files.  It says "The header cells of the chart consist of a
property value, followed by a representative code point number.", but no
"representative code point number" follows the property value on the actual
chart.  It also says "hovering the mouse over the code point number will
show the character name, General_Category, Line_Break, and Script property
values.", but the character name etc. are shown when hovering over property
values rather than code point numbers (perhaps because there are no code
point numbers).  Either the description of the charts in UAX #29 or the
charts themselves should be corrected to make them consistent.