Accumulated Feedback on PRI #390

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Mon Dec 3 18:30:48 CST 2018
Name: Andy Heninger
Report Type: Error Report
Opt Subject: Word Break of Full Width Digits

Forwarding a report from Jungshik Shin regarding word break of Full Width
Digits using ICU (which in turn implements UAX 29):

> > When a string contains full width digits such as U+FF11 for the digit '1', 
> > using a RuleBasedBreakIterator for word breaks the character is not recognized 
> > as a digit. Using getRuleStatus on the position will return UBRK_WORD_NONE.
UAX-29 places word boundaries between adjacent full width digits.

I propose that the word break property of the full width digit characters be
changed to Numeric. These characters do not currently have a specifically
assigned word break property, meaning they fall into the "any" bucket for
the rules.

Chrome, Firefox and Microsoft Edge already group multi-digit full width
numbers as words.

The line break property would not be changed. It would remain ID
(ideographic), allowing line breaks between the digits of a full width
number.
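
A minimal Python sketch of the effect of this proposal follows. It is a toy
model, not ICU and not the full UAX 29 algorithm: it applies only the
"Numeric x Numeric" no-break rule (WB8) and breaks everywhere else. The names
word_break_class, toy_segments, and the fullwidth_numeric flag are
hypothetical, introduced purely for illustration.

```python
# Toy model only: classifies characters as "Numeric" or "Other" and keeps
# adjacent Numeric characters together (in the spirit of WB8). All other
# UAX 29 rules are deliberately omitted.

FULLWIDTH_DIGITS = range(0xFF10, 0xFF1A)  # U+FF10..U+FF19


def word_break_class(ch, fullwidth_numeric):
    """Return a simplified word break class for a single character."""
    if ch in "0123456789":
        return "Numeric"
    # Hypothetical switch modelling the proposed property change.
    if fullwidth_numeric and ord(ch) in FULLWIDTH_DIGITS:
        return "Numeric"
    return "Other"  # stands in for the "any" bucket


def toy_segments(text, fullwidth_numeric=False):
    """Split text, joining only runs of adjacent Numeric characters."""
    segments = []
    prev_cls = None
    for ch in text:
        cls = word_break_class(ch, fullwidth_numeric)
        if segments and cls == "Numeric" and prev_cls == "Numeric":
            segments[-1] += ch  # no break between Numeric and Numeric
        else:
            segments.append(ch)  # break everywhere else
        prev_cls = cls
    return segments


full_width_123 = "\uFF11\uFF12\uFF13"
print(toy_segments(full_width_123))                          # current behavior
print(toy_segments(full_width_123, fullwidth_numeric=True))  # proposed behavior
```

Under these assumptions, the three full width digits come out as three
separate segments with the current property assignment and as one segment
once they are treated as Numeric, which matches the behavior described for
ASCII digits.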

Date/Time: Wed Dec 26 19:55:55 CST 2018
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Editorial change for the update of UAX #29

The way the Georgian letters are specified in Table 4 is very ugly. This could
be alleviated if the second row just said:

U+10FD or U+10FF | (ჽ or ჿ) | GEORGIAN LETTER AEN or GEORGIAN LETTER LABIAL SIGN

Similarly for the lower part:

U+1CBD or U+1CBF | (Ჽ or Ჿ) | GEORGIAN MTAVRULI CAPITAL LETTER AEN or GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN

Rules WB7b and WB7c are difficult to read because "Double_Quote Hebrew_Letter"
and "Hebrew_Letter Double_Quote", respectively, look like a single entity
rather than two. I propose using a set of parentheses around each item in
cases like this. This has no impact on the algorithm at all.

They would end up like this:

WB7b. Hebrew_Letter × (Double_Quote)(Hebrew_Letter)
WB7c. (Hebrew_Letter)(Double_Quote) × Hebrew_Letter
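
To illustrate that each side of these rules is a sequence of separate
property values, here is a small Python sketch. It is an assumption-laden toy:
the classifier toy_class covers only the Hebrew letters U+05D0..U+05EA and
U+0022, and the function names wb7b_applies and wb7c_applies are hypothetical.
Position i denotes the candidate boundary immediately before index i.

```python
# Toy check of WB7b and WB7c only; no other UAX 29 rules are modelled.

HL = "Hebrew_Letter"
DQ = "Double_Quote"


def toy_class(ch):
    """Very rough classifier (assumption: alef..tav only count as Hebrew_Letter)."""
    if 0x05D0 <= ord(ch) <= 0x05EA:
        return HL
    if ch == '"':
        return DQ
    return "Other"


def wb7b_applies(props, i):
    """WB7b: Hebrew_Letter x Double_Quote Hebrew_Letter (two items on the right)."""
    return (
        i >= 1
        and i + 1 < len(props)
        and props[i - 1] == HL
        and props[i] == DQ
        and props[i + 1] == HL
    )


def wb7c_applies(props, i):
    """WB7c: Hebrew_Letter Double_Quote x Hebrew_Letter (two items on the left)."""
    return (
        i >= 2
        and i < len(props)
        and props[i - 2] == HL
        and props[i - 1] == DQ
        and props[i] == HL
    )


# Hebrew abbreviation written with an ASCII double quote: tsadi, he, ", lamed.
sample = "\u05E6\u05D4\"\u05DC"
props = [toy_class(ch) for ch in sample]
print(wb7b_applies(props, 2))  # boundary before the double quote
print(wb7c_applies(props, 3))  # boundary after the double quote
```

Each rule inspects three distinct positions, so the parenthesized and
unparenthesized spellings describe the same check.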