Accumulated Feedback on PRI #382

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Wed Oct 10 09:20:53 CDT 2018
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Feedback on PRI #382

WG2 has accepted the TA source for the TCA's horizontal extension, so the syntax
of kIRG_TSource in UAX #38 should be changed accordingly to T[1-7A-F]-[0-9A-F]{4}.

Date/Time: Thu Oct 11 12:01:04 CDT 2018
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: PRI #382 feedback

The Syntax for the kIRG_KSource property should be changed to the following to 
accommodate the new "K6" source prefix:

K([0-6]-[0-9A-F]{4}|C-[0-9]{5})
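The following is a minimal Python sketch of how the proposed kIRG_KSource syntax could be checked. It assumes the regular expression must match an entire property value, which is why it uses re.fullmatch. The sample values are hypothetical and were chosen only to exercise each branch of the pattern.

```python
import re

# Proposed kIRG_KSource syntax from the feedback above.
K_SOURCE = re.compile(r"K([0-6]-[0-9A-F]{4}|C-[0-9]{5})")


def is_valid_k_source(value: str) -> bool:
    """Return True if the whole value matches the proposed syntax."""
    return K_SOURCE.fullmatch(value) is not None


# Hypothetical sample values, not taken from the Unihan database.
for sample in ["K0-2144", "K6-4A21", "KC-01234", "K7-2144", "K6-214"]:
    print(sample, is_valid_k_source(sample))
```

With this pattern, values with the new "K6-" prefix are accepted. A "K7-" prefix or a three-digit value is rejected.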

Date/Time: Tue Nov 6 11:40:36 CST 2018
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: PRI #382 feedback

1) kIRG_GSource changes:

Syntax field:

Modify the regular expression as follows to accommodate up to eight hexadecimal
digits for the existing "G9-" source prefix, and to support the new "GHF-,"
"GHZR-," and "GLK-" source prefixes:

G4K
| G[013578EKS]-[0-9A-F]{4}
| G9-[0-9A-F]{4,8}
| G(DZ|GH|RM|WZ|XC|XH|ZH)-\d{4}\.\d{2}
| G(BK|CH|CY|HC)(-\d{4}\.\d{2})?
| GKX-\d{4}\.\d{2,3}
| GHZR?-\d{5}\.\d{2}
| G(CE|FC|IDC|OCD|XHZ)-\d{3}
| G(H|HF|LGYJ|PGLG)-\d{4}
| G(CYY|JZ|ZFY|ZJW|ZYS)-\d{5}
| GFZ(-\d{5})?
| GGFZ-\d{6}
| G(LK|Z)-\d{7}
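As a rough check, the alternatives above can be joined into a single pattern and tested against sample values. This is a sketch, not a reference implementation. Each alternative is wrapped in a non-capturing group, and the whole value must match. The re.ASCII flag is an added assumption so that \d means only 0-9. The sample values are hypothetical.

```python
import re

# The proposed kIRG_GSource alternatives, one per line as listed above.
G_ALTERNATIVES = [
    r"G4K",
    r"G[013578EKS]-[0-9A-F]{4}",
    r"G9-[0-9A-F]{4,8}",
    r"G(DZ|GH|RM|WZ|XC|XH|ZH)-\d{4}\.\d{2}",
    r"G(BK|CH|CY|HC)(-\d{4}\.\d{2})?",
    r"GKX-\d{4}\.\d{2,3}",
    r"GHZR?-\d{5}\.\d{2}",
    r"G(CE|FC|IDC|OCD|XHZ)-\d{3}",
    r"G(H|HF|LGYJ|PGLG)-\d{4}",
    r"G(CYY|JZ|ZFY|ZJW|ZYS)-\d{5}",
    r"GFZ(-\d{5})?",
    r"GGFZ-\d{6}",
    r"G(LK|Z)-\d{7}",
]

# Join as (?:alt1)|(?:alt2)|...; re.ASCII (an assumption) limits \d to 0-9.
G_SOURCE = re.compile("|".join(f"(?:{alt})" for alt in G_ALTERNATIVES), re.ASCII)


def is_valid_g_source(value: str) -> bool:
    """Return True if the whole value matches one of the alternatives."""
    return G_SOURCE.fullmatch(value) is not None


# Hypothetical samples for the changed and newly added prefixes.
for sample in ["G9-1234ABCD", "G9-123456789", "GHF-0123", "GHZR-12345.01", "GLK-0123456"]:
    print(sample, is_valid_g_source(sample))
```

Under this sketch, an eight-digit "G9-" value passes and a nine-digit one fails. Each of the new GHF, GHZR and GLK forms is accepted.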

Description field:

Change "The IRG “G” source mapping for this character in hex." to "The IRG “G” source mapping for this character in hexadecimal or decimal."

Add the following three new source prefixes per Section 2 of WG2 N4988:

GHF 鄭賢章:《漢文佛典疑難俗字彙釋與研究》, 成都: 巴蜀書社, 2016, ISBN 978-7-5531-0700-4
GHZR 汉语大字典编辑委员会:《汉语大字典(第二版)》, 武汉: 湖北长江出版集团崇文书局 & 成都: 四川出版集团四川辞书出版社, 2010, ISBN 978-7-5403-1744-7
GLK 《龍龕手鑑》(續古逸叢書)


2) kIRG_JSource changes:

Description field:

Change "The IRG “J” source mapping for this character in hex." to "The IRG
“J” source mapping for this character in hexadecimal or decimal."


3) kIRG_MSource changes:

Syntax field:

Change the regular expression to the following:

MAC-\d{5}

Description field:

Change "The IRG “M” source mapping for this character." to "The IRG “M”
source mapping for this character in decimal."


4) kIRG_USource changes:

Syntax field:

Change the regular expression to the following:

U(TC|CI|K|SAT)-\d{5}

(Note that instances of [0-9] in other properties can likewise be changed to \d.)
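One point may be worth checking before replacing [0-9] with \d: whether \d is equivalent to [0-9] depends on the regex engine. In Python 3, \d in a str pattern matches any Unicode decimal digit (general category Nd) unless re.ASCII is given. The sketch below uses the proposed kIRG_MSource and kIRG_USource syntaxes to illustrate this. The sample values are hypothetical.

```python
import re

# Proposed syntaxes from items 3) and 4) above.
M_SOURCE = re.compile(r"MAC-\d{5}")
U_SOURCE = re.compile(r"U(TC|CI|K|SAT)-\d{5}")
# Same U pattern, but with \d restricted to ASCII 0-9.
U_SOURCE_ASCII = re.compile(r"U(TC|CI|K|SAT)-\d{5}", re.ASCII)

# Hypothetical value written with ARABIC-INDIC DIGIT ZERO/ONE (U+0660, U+0661).
ARABIC_INDIC_SAMPLE = "UTC-" + "\u0660" * 4 + "\u0661"

print(bool(M_SOURCE.fullmatch("MAC-00001")))
print(bool(U_SOURCE.fullmatch("USAT-01234")))
print(bool(U_SOURCE.fullmatch(ARABIC_INDIC_SAMPLE)))        # accepted: \d is Unicode-aware
print(bool(U_SOURCE_ASCII.fullmatch(ARABIC_INDIC_SAMPLE)))  # rejected under re.ASCII
```

If \d is used in the Syntax field, stating that it means ASCII digits only (or validating with an ASCII-only mode) would avoid this ambiguity across engines.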


5) Regarding the use of "in hex" (or "in hexadecimal or decimal") at the end
of the first sentence of the "Description" field, an alternative is to drop
it entirely from the dozen or so properties that specify it, because whether
a property value uses hexadecimal digits, decimal digits, or a mixture of
both is implicit in the "Syntax" field. If the phrase is kept, I prefer that
it be spelled out as "hexadecimal."

That is all.

Date/Time: Wed Nov 7 23:14:32 CST 2018
Name: CNMan
Report Type: Error Report
Opt Subject: Feedback on PRI #382

$ grep kHanyu Unihan_Variants.txt
U+5909  kSemanticVariant        U+8B8A<kHanyu:T
U+8B8A  kSemanticVariant        U+53D8<kMatthews,kMeyerWempe U+5909<kHanyu:T

In these two lines of Unihan_Variants.txt, the "y" in "kHanyu" should be uppercase, i.e., "kHanYu".

Thanks
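The correction could be applied with a small substitution. This is a minimal Python sketch. It assumes tab-separated fields, as in the Unihan data files, and the function name fix_khanyu_tag is hypothetical. The pattern only rewrites "kHanyu" when it appears as a whole source tag followed by ":".

```python
import re

# Matches the misspelled tag "kHanyu" only when followed by ":".
MISSPELLED_TAG = re.compile(r"\bkHanyu(?=:)")


def fix_khanyu_tag(line: str) -> str:
    """Hypothetical helper: replace the misspelled tag with kHanYu."""
    return MISSPELLED_TAG.sub("kHanYu", line)


# The two lines from the grep output above, assuming tab-separated fields.
lines = [
    "U+5909\tkSemanticVariant\tU+8B8A<kHanyu:T",
    "U+8B8A\tkSemanticVariant\tU+53D8<kMatthews,kMeyerWempe U+5909<kHanyu:T",
]
for line in lines:
    print(fix_khanyu_tag(line))
```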

Date/Time: Thu Nov 8 04:46:18 CST 2018
Name: Eiso Chan
Report Type: Public Review Issue
Opt Subject: UAX #38 Feedback

For kIRG_TSource, all TCA-submitted characters are included in CNS 11643. The
latest paper version is CNS 11643-2007, but more characters have been added to
the online version.

TA will be used in Unicode 12.0.0, and T13 has been accepted for use in WS2015
and WS2017, so the syntax should be changed to T[1-7A-F]{1,2}-[0-9A-F]{4}.
In the Description, TA denotes the 10th plane of CNS 11643 and TB the 11th plane.
"1992" should be removed. I suggest changing the details for T1 through T13 as follows:

T1 TCA-CNS 11643 1st plane
T2 TCA-CNS 11643 2nd plane
T3 TCA-CNS 11643 3rd plane with some additional characters
T4 TCA-CNS 11643 4th plane
T5 TCA-CNS 11643 5th plane
T6 TCA-CNS 11643 6th plane
T7 TCA-CNS 11643 7th plane
TA TCA-CNS 11643 10th plane
TB TCA-CNS 11643 11th plane
TC TCA-CNS 11643 12th plane
TD TCA-CNS 11643 13th plane
TE TCA-CNS 11643 14th plane
TF TCA-CNS 11643 15th plane
T13 TCA-CNS 11643 19th plane

The TA and TB entries should be modified because the 10th and 11th planes also
contain other characters. All TA characters are from CJK Ext B. The phrase "with
some additional characters" under T3 cannot be removed, because some characters
from Pseudo-CNS-E in Unicode 1.0 are not included in CNS 11643-2007 but are
included in the online version. For example, the T reference for 啲 was E-6722
in Unicode 1.0 and is currently T3-6722, but 3-6722 is empty in CNS 11643-2007,
where the code for 啲 is F-334B.

Regarding the sentence "CNS 11643, X 5012 (p.3) lists the following reference
works:", this passage has been removed from CNS 11643-2007, and in CNS 11643-1992
it is on page 319, not page 3. The passage could be kept in UAX #38 but changed
to "CNS 11643-1992 (p.319) lists the following reference works:".
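The two kIRG_TSource syntaxes proposed in this feedback (10 Oct and 8 Nov 2018) can be compared on a few values. This is a small Python sketch. It assumes whole-value matching, and apart from T3-6722, which is cited above, the sample values are hypothetical.

```python
import re

# Syntax proposed on 10 Oct 2018 (single-character plane prefix).
T_SOURCE_OCT = re.compile(r"T[1-7A-F]-[0-9A-F]{4}")
# Syntax proposed on 8 Nov 2018 (one or two prefix characters, for T13).
T_SOURCE_NOV = re.compile(r"T[1-7A-F]{1,2}-[0-9A-F]{4}")

samples = ["T3-6722", "TA-1234", "T13-1234", "TFF-1234", "T8-1234"]
for s in samples:
    oct_ok = T_SOURCE_OCT.fullmatch(s) is not None
    nov_ok = T_SOURCE_NOV.fullmatch(s) is not None
    print(s, oct_ok, nov_ok)
```

Only the second pattern accepts T13. Note that the {1,2} form also accepts two-character prefixes that are not in the list above, such as "TFF". If only the listed prefixes are intended, an explicit alternation such as T(13|[1-7A-F])-[0-9A-F]{4} would be narrower.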

Date/Time: Wed Dec 5 13:28:34 CST 2018
Name: Richard Cook
Report Type: Public Review Issue
Opt Subject: PRI#382

> > see UAX #45
(1) add a link there to the relevant section of UAX#45

https://www.unicode.org/reports/tr45/#Section1

(2) spell out each of the four kIRG_USource source prefixes 
in full in the Description, so that they are searchable, 
expanding the regex ...

	U(TC|CI|K|SAT)

... to add these strings ...

	UTC
	UCI
	UK
	USAT

... in the body of the Description, to help someone
searching for, e.g., USAT to find it in UAX#38
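The expansion described above is mechanical for patterns of this simple shape. The following Python sketch assumes exactly one non-nested parenthesized group and no other regex syntax. The function name expand_simple_alternation is hypothetical.

```python
import re
from typing import List


def expand_simple_alternation(pattern: str) -> List[str]:
    """Expand PREFIX(ALT1|ALT2|...)SUFFIX into literal strings.

    Only one non-nested group is supported; anything else raises ValueError.
    """
    m = re.fullmatch(r"([^()|]*)\(([^()]*)\)([^()|]*)", pattern)
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix, body, suffix = m.groups()
    return [prefix + alt + suffix for alt in body.split("|")]


print(expand_simple_alternation("U(TC|CI|K|SAT)"))  # ['UTC', 'UCI', 'UK', 'USAT']
```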

Date/Time: Wed Dec 5 13:46:42 CST 2018
Name: Richard Cook
Report Type: Public Review Issue
Opt Subject: PRI#382

in UAX#38, section 4.4, for each table ...

https://www.unicode.org/reports/tr38/tr38-26.html#BlockListing

... add a new column giving the character count for the range in
column 1 of each row, and a new row with the overall total for each table.

Date/Time: Wed Dec 5 13:53:33 CST 2018
Name: Ken Whistler
Report Type: Public Review Issue
Opt Subject: PRI #382, UAX #38 suggestion

A simple improvement to the table of ranges of CJK characters 
in various blocks would be to add a column with the actual 
character count for each row of ranges. See:

https://www.unicode.org/reports/tr38/tr38-26.html#BlockListing
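The suggested count column amounts to last - first + 1 for each inclusive range, and the total row is the sum of those counts. This is a minimal Python sketch. The two rows use block boundaries for illustration only and are not copied from the table in section 4.4. The calculation also assumes that every code point in a listed range is a character, which holds only if the table's ranges cover assigned code points exclusively.

```python
from typing import List, Tuple


def range_count(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


# Illustrative rows (block boundaries), not the actual UAX #38 table rows.
rows: List[Tuple[int, int]] = [(0x4E00, 0x9FFF), (0x3400, 0x4DBF)]

counts = [range_count(first, last) for first, last in rows]
for (first, last), count in zip(rows, counts):
    print(f"U+{first:04X}..U+{last:04X}\t{count}")
print(f"Total\t{sum(counts)}")  # 20992 + 6592 = 27584
```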