Accumulated Feedback on PRI #375

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Thu Apr 26 18:50:04 CDT 2018
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: PRI 375 for UTS #46 v11


== UTS #46 ==

>> During the beta period, the new-format test data is provided in a test file called IdnaTest2.txt,
Actually, it is called IdnaTestV2.txt, in accordance with:

    [152-C1] Consensus: Adopt L2/17-176 with one amendment:
    change the name of the file to IdnaTestV2.txt for Unicode version 11.0.

>> Feedback is desired as to whether use the name IdnaTest2.txt for
>> the new-format data or to keep the name IdnaTest.txt for it.

I prefer keeping the new name IdnaTestV2.txt to make it obvious that test code needs to change.

>> The removal of the first field causes some of the lines to bidi-reorder. ...
>> To solve that, a LRM could be introduced at the start and end of each field
>> that contained a RTL character. That would require the test implementations
>> to filter out those characters. Because the format is changing anyway,
>> this would be the time to make such a change.

I agree that this would be useful.
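
The filtering cost for test implementations would be small. A minimal sketch
in Python, assuming the marks are U+200E LEFT-TO-RIGHT MARK and that exactly
one is added at the start and one at the end of an affected field (the
function name is hypothetical, not part of any existing test harness):

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def strip_field_marks(field: str) -> str:
    """Trim whitespace, then remove at most one LRM from each end of a field."""
    field = field.strip()  # str.strip() does not remove LRM (it is not whitespace)
    if field.startswith(LRM):
        field = field[1:]
    if field.endswith(LRM):
        field = field[:-1]
    return field
```

Removing only one mark per end keeps any LRM that belongs to the test string
itself.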

I suggest that we improve on this further: replace both the semicolon
field separator and the proposed invisible LRMs with a single character
from [:bc=L:]&[:P:] (punctuation with Bidi_Class=L) at the *start* of each
field. It would be visible and would reset each field to LTR, while at the
same time serving as the separator. Parsers would simply split the input
line on that character and then trim each field as usual.

For example, using । U+0964 DEVANAGARI DANDA:

faß.de; ; ; xn--fa-hia.de; ; fass.de;  # faß.de
̈.א; ; [B1, B3, B6, V5]; xn--ssa.xn--4db; ; ;  # ̈.א
(This is right-aligned and jumbled in a text editor.)
→
। faß.de। । । xn--fa-hia.de। । fass.de।  # faß.de
।̈.א। । [B1, B3, B6, V5]। xn--ssa.xn--4db। । । # ̈.א
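
Parsing this format could look roughly like the following Python sketch. It
assumes that "#" starts a comment and never occurs unescaped inside a field,
and that U+0964 is the chosen separator; the function name is hypothetical.

```python
DANDA = "\u0964"  # U+0964 DEVANAGARI DANDA, the separator used in the example

def split_fields(line: str) -> list[str]:
    """Split one data line into trimmed fields; return [] for blank or comment-only lines."""
    data = line.split("#", 1)[0]  # drop the trailing comment, if any
    if not data.strip():
        return []
    parts = data.split(DANDA)
    # The separator starts every field, so the piece before the first one is discarded.
    return [part.strip() for part in parts[1:]]
```

For the first example line above, this yields the same seven fields as
splitting the current format on semicolons.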

>> This section repeats material that is in the test file header.
>> Having the material in two places makes it more likely that it get out of sync.
>> Should we remove the information in one of the places,
>> and replace by a pointer to the other?
>> If so, where would be the best place to have it, in the test file header or here?

Please put the file format documentation in only one place.
I suggest putting it into the UTS text and replacing the copy in the .txt file header with a pointer to it.

== IdnaTestV2.txt ==

I have adjusted the ICU test to use IdnaTestV2.txt rather than the old version.

As before, I have to ignore the expected output string when there are errors
(status not empty) because the test file generator passes through disallowed
characters while ICU replaces them with U+FFFD. This is a mismatch between
the UTS #46 processing step ("disallowed: Leave the code point unchanged in
the string, and record that there was an error") and the following
recommendation ("it is recommended that disallowed characters be replaced by
a U+FFFD to make them visible to the user"). (ICU does this to simplify and
speed up the code.)

→ We should point this out in section 8.
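
To illustrate the workaround, here is a minimal Python sketch of the
comparison. It is not the ICU test code; the function and parameter names
are hypothetical, and checking that the presence of errors matches is an
assumption of this sketch.

```python
def check_result(expected: str, expects_errors: bool,
                 actual: str, actual_has_errors: bool) -> bool:
    """Compare one conversion result against a test file line."""
    if expects_errors != actual_has_errors:
        return False
    if expects_errors:
        # Output strings may differ legitimately here, for example when the
        # implementation replaces disallowed characters with U+FFFD.
        return True
    return actual == expected
```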

The old file indicated "no errors" via the omission of the status. The new
file usually does, too, but sometimes has explicit "[]".

→ Please remove "[]" and consistently indicate "no errors" via the omission of the status.
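
Until then, a test parser can accept both forms. A minimal Python sketch,
assuming a status field is either empty or a bracketed, comma-separated list
of codes such as "[B1, B3, B6, V5]" (the function name is hypothetical):

```python
def parse_status(field: str) -> frozenset[str]:
    """Return the set of error codes; both "" and "[]" give the empty set."""
    text = field.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return frozenset(code.strip() for code in text.split(",") if code.strip())
```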