Accumulated Feedback on PRI #359

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Mon Sep 25 15:12:22 CDT 2017
Name: Thomas Milo
Report Type: Public Review Issue
Opt Subject: Proposed Draft UTR #53, Unicode Arabic Mark Ordering Algorithm Now Available for Public Review

Please consider taking into account the established solutions for these
sequences as already implemented in www.mushafmuscat.om, which is now
available world-wide as the authoritative, Azhar-recommended electronic
reference Qur’ān.

I don’t expect fundamental disagreements, but the project handles and solves
all spelling issues without extending the existing Unicode repertoire for
Arabic.

However, for one class of characters we changed the typographical
behaviour from overstrike to a new category of contextual behaviour:
amphibious. I have reported on Amphibious Characters to the UTC.

Some practical tips:

Clicking the splash screen opens the text.

Words can be searched in Manuscript View Mode, which presents the verses
separated by flowers, surrounded by navigation and graphic controls. At the
top left are the page number, text search and version locator boxes.
Chapters can be located with the Wheel at the top left.

Historical text layers can be exposed with the Colour Triangle at the bottom left.
The dormant miniatures of unpointed characters can be activated with the حسصطعه icon at the bottom left.
The chapter headings are in unpointed palaeographic Arabic; a ٮٮٯٮط / تنقيط icon (on all pagespreads 
except the first) provides two optional styles of pointing. 

Clicking in the margin brings up the Printed Mushaf View Mode, with verses marked by numbers.

The Unicode structure can be found by clicking in the text, which brings up the Interactive View Mode.

Letter blocks light up with mouse-over (selecting a letter block also
activates the WordShaping interface, which provides aesthetic user
interaction without touching the Unicode structure).
Single-click selects letter block
Double-click selects word
Triple-click selects verse

CTRL+C (Windows) or CMD+C (Mac) copies the selected Unicode string.

Caveat: we are preparing an update that moves all Qur’ānic stops to word-final position, where they 
belong. This change will affect a few words that end in U+06E6 Arabic Small Yeh and U+06E5 Arabic Small Waw.


Some background information:
https://www.egypttoday.com/Article/4/14269/The-world%E2%80%99s-first-e-Quran-is-here
https://oumma.com/premiere-mondiale-coran-numerique-presente-a-mascate-oman/
Presentation of the project to the crown prince of Oman, HRH Sayyid Haytham
https://www.youtube.com/watch?v=sHtBL2GvBxE
My speech without voice-over
https://www.youtube.com/watch?v=UpxsWGxgJIo

Please don’t hesitate to ask for more clarification if needed.

Date/Time: Mon Sep 25 19:54:28 CDT 2017
Name: A./
Report Type: Public Review Issue
Opt Subject: PRI 359

1.) Better guidance should be given on when to apply this algorithm. From
reading the draft, it is usefully applied as a standard preparatory step
before handing text off to a rendering engine, or perhaps also as a standard
transformation on input to a rendering engine. This should be explicitly
stated.

2.) If there are other situations, operations or processes where
transforming Arabic text using this algorithm is seen as useful, these
should be stated explicitly.

3.) There are situations and protocols that demand text in a given
normalization form. Care should be taken in presenting the new algorithm so
that it does not lead users to expect that all Arabic text "ought to be"
always in the transformed format.

4.) The stability note before 3.2 could be improved. The word "existing"
will change meaning. Therefore:

The set of MCM characters is intended to be stable. Characters from Unicode
Version XXXX or earlier will not be added or removed from this set in future
updates of this algorithm. Future updates may add characters to the set only
if they were encoded in any version after XXXX.

[The future version of the algorithm then changes XXXX to the latest value.
This wording allows the TR to skip any versions of the Unicode Standard that
do not contain new combining marks in Arabic.] 

5.) In step 2, the specification does not address keeping multiple
instances, e.g. multiple MCM, in relative order when moved "to the
beginning". The current text could be interpreted as requiring multiple
instances of such characters to be inverted in relative order as each is
moved "to the beginning". (The issue theoretically exists for shadda as it
is defined by CCC value, which on the face of it allows the possibility of
multiple distinct shadda code points where again, internal ordering could be
observable).
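
The ordering concern in item 5 can be made concrete with a minimal Python
sketch. The MOVED set below is an illustrative assumption, not the draft's
MCM list, and both function names are hypothetical. Moving matching marks
"to the beginning" one at a time inverts their relative order, while
collecting them first preserves it.

```python
# Illustrative stand-in for "marks that get moved to the beginning".
# This set is an assumption for the example, not the draft's MCM list.
MOVED = {"\u0654", "\u06DC"}  # HAMZA ABOVE, SMALL HIGH SEEN


def move_one_by_one(marks):
    """Naive reading: each matching mark is moved to the front as it is met."""
    result = []
    for mark in marks:
        if mark in MOVED:
            result.insert(0, mark)  # a later match lands before earlier ones
        else:
            result.append(mark)
    return result


def move_stable(marks):
    """Stable reading: matching marks keep their original relative order."""
    moved = [m for m in marks if m in MOVED]
    rest = [m for m in marks if m not in MOVED]
    return moved + rest


# Mark run: HAMZA ABOVE, FATHA, SMALL HIGH SEEN
marks = ["\u0654", "\u064E", "\u06DC"]
naive_result = move_one_by_one(marks)
stable_result = move_stable(marks)
print([f"U+{ord(m):04X}" for m in naive_result])
print([f"U+{ord(m):04X}" for m in stable_result])
```

The two readings disagree whenever a run contains more than one moved mark,
which is why the relative order needs to be stated explicitly.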

Date/Time: Fri Oct 6 05:59:06 CDT 2017
Name: r12a
Report Type: Public Review Issue
Opt Subject: When should UAOA be used?

I'm sending this on behalf of the W3C i18n WG. It relates to UTR#53.

I'm hearing through other channels that the algorithm described is intended
to just indicate how characters should be temporarily reordered prior to
rendering, rather than describe the order in which code points should be
stored. Since most fonts generally produce the behaviour described anyway,
it presumably therefore amounts to documenting expectations in terms of font
behaviour, rather than specifying a new form of normalisation.

It's not at all clear from the document that that is the case, however,
which has caused the W3C WG significant alarm (and wasted discussion
cycles). Please update the document to make this clearer. We will hold back
the other comments we currently have queued up to send until we can re-
evaluate them in the light of the changes to the document.

Btw, the understanding of the intended use of UAOA is not helped by the way
the document mentions canonically equivalent character sequences, nor by the
vague descriptions of when CGJ should be used.

Date/Time: Fri Oct 6 06:05:21 CDT 2017
Name: r12a
Report Type: Public Review Issue
Opt Subject: AMOA rather than UAOA ?

http://www.unicode.org/reports/tr53/

"The Unicode Arabic Mark Ordering Algorithm (UAOA)"

I find it difficult to figure out how one should pronounce UAOA and
difficult to pronounce either way. I think AMOA (or even UAMAO) would be
easier. Please consider that or some other change.

Date/Time: Tue Oct 10 09:40:48 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #359: U+08D9 ARABIC SMALL LOW NOON WITH KASRA

U+08D9 ARABIC SMALL LOW NOON WITH KASRA has Canonical_Combining_Class=Above 
when it should have been Below. Could the UAOA reorder it as Below?
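
The property value reported above can be inspected with Python's standard
unicodedata module. This is a small check, assuming a Python build whose
Unicode database includes U+08D9; 230 is the numeric value of
Canonical_Combining_Class=Above and 220 that of Below.

```python
import unicodedata

SMALL_LOW_NOON_WITH_KASRA = "\u08D9"

# unicodedata.combining() returns the canonical combining class as an integer.
ccc = unicodedata.combining(SMALL_LOW_NOON_WITH_KASRA)
label = {220: "Below", 230: "Above"}.get(ccc, str(ccc))
print(f"U+08D9 ccc={ccc} ({label})")
```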

Date/Time: Fri Oct 13 16:48:21 CDT 2017
Name: Behnam Esfahbod
Report Type: Public Review Issue
Opt Subject: Feedback on Proposed Draft UTR #53 — Revision 1

Status: Liaison Contribution - W3C i18n WG

# Using UAOA in Text Editing
On Section 5.6 “Other uses for UAOA”, we have:

> > UAOA is very useful in implementations of backspacing in cases where 
> > there is no external information available about the original order 
> > in which the text was entered.

For an average user of modern languages using the script, reordering the
marks entered on a keyboard would be unexpected behavior.

Basically, the document suggests that when a user authors a text file
with Arabic marks entered in a specific order, and the file is then closed and
reopened, backspace should behave differently from the previous session.

Also, it is not at all clear whether UAOA will be useful in a text editing
scenario. The claim that UAOA is "very useful" needs some evidence, such as
an existing implementation or other data to support it.

From the language and examples of the document, it looks like the usage of
the algorithm is too focused on one application, Quranic text, and the
claims are related only to that specific application of the script.

Date/Time: Fri Oct 13 16:59:35 CDT 2017
Name: Behnam Esfahbod
Report Type: Public Review Issue
Opt Subject: Feedback on Proposed Draft UTR #53 — Revision 1

Status: Individual Contribution

The way Unicode Normalization works for Arabic Marks indeed has its
problems, especially in font development and text rendering. The algorithm
proposed in this PDUTR is a good way to address some of these problems. But
the document needs improvements in a few areas to be clear about what it
does, when it should be applied, how it should be used, and what to expect
from it.

# 1. Scope of the PDUTR

It looks like the PDUTR is the first UTR focused on details of rendering of
Unicode text (besides the text of the Unicode Standard). Arabic is only one
of the scripts that need some special attention (possibly reordering of the
characters in memory) for rendering. It could be a better approach to have a
document (UTR) focused on text rendering, which would also contain this
algorithm for Arabic script, and would collect other best-practices over
time, for other issues of rendering Arabic script, as well as other scripts.

# 2. Scope of the algorithm

The scope of the algorithm is not clear, either in its title or in its
language.

The name “Unicode Arabic Mark Ordering Algorithm” suggests that this is
expected to be the only way Arabic Marks should be ordered in Unicode.
That’s clearly not the case. In fact, the document is proposing an algorithm
for “reordering” Arabic Marks (not just how they should be ordered) to solve
a problem in “rendering” of the script. The title needs to be clear about
this. Maybe “Unicode Arabic Mark Reordering Algorithm for Rendering”
(AMRAR)?

Similarly, the Section 2 “Background” doesn’t clarify the scope of the
algorithm and only explains how something is not working for some specific
application with the existing normalization methods.

# 3. Consequences of the Algorithm: Normalization

The draft proposal is not clear about the effects of applying the algorithm
on text. In particular, for strings X for which this algorithm is useful, we
have UAOA(toNFC(X)) ≠ toNFC(UAOA(X)).

So, although the behavior of the algorithm can be stabilized over Unicode
versions, it’s very important how and when it’s applied to the text, since it
changes a text in normalized form to a non-normalized form. Therefore, in
terms of normalization, the algorithm cannot be considered stable at all.
The document needs to be clear about this, even though it’s obvious from a
technical point of view.
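
The inequality above can be illustrated with a minimal Python sketch. The
function shadda_first is a hypothetical, simplified stand-in that only moves
shadda (ccc=33) to the front of each run of combining marks; it is not the
full algorithm from the draft. NFC is computed with the standard unicodedata
module.

```python
import unicodedata

SHADDA_CCC = 33


def shadda_first(text):
    """Simplified stand-in, NOT the draft's full algorithm: in each run of
    combining marks, move shadda (ccc=33) to the front, stably."""
    out, run = [], []

    def flush():
        out.extend(c for c in run if unicodedata.combining(c) == SHADDA_CCC)
        out.extend(c for c in run if unicodedata.combining(c) != SHADDA_CCC)
        run.clear()

    for ch in text:
        if unicodedata.combining(ch) != 0:
            run.append(ch)  # still inside a run of combining marks
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)


def nfc(text):
    return unicodedata.normalize("NFC", text)


x = "\u0628\u064E\u0651"  # BEH, FATHA (ccc=30), SHADDA (ccc=33)

reorder_after_nfc = shadda_first(nfc(x))  # reordering applied last
nfc_after_reorder = nfc(shadda_first(x))  # normalization applied last

print([f"U+{ord(c):04X}" for c in reorder_after_nfc])
print([f"U+{ord(c):04X}" for c in nfc_after_reorder])
print(reorder_after_nfc == nfc(reorder_after_nfc))  # is the output still NFC?
```

Canonical reordering sorts marks by combining class, so normalizing after
the reordering step puts fatha back before shadda, and the two pipelines
produce different strings.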

# 4. Consequences of the Algorithm: Semantics

With UAOA applied on text during rendering, some strings collapse into a
single sequence. Basically, there are plenty of strings X and Y, where
toNFC(X) ≠ toNFC(Y), but UAOA(toNFC(X)) = UAOA(toNFC(Y)).

This changes the semantics of existing text encoded in Unicode, since the
rendering will be different afterwards. The document is not clear about
this semantic change and only claims to be “correcting” all the
problems.

The proposal suggests using CGJ to preserve the old semantics when
needed. The document needs to be clearer about how to preserve the
semantics. In fact, there should be a clear algorithm to convert a string X
to preserve the semantics when changing the (rendering) interpretation,
since for a couple of decades users have been storing text in the current
semantics of the encoding, which has been the only recommended way to do so
by Unicode.
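
The kind of collapse described in this section can be sketched in Python
with a deliberately simplified rule. The function toy_reorder and the set
FRONT_MARKS are hypothetical: the rule moves every listed mark to the front
of its mark run, which is an assumption made for illustration and is not
taken from the draft's steps. Whether the draft's own rules collapse these
particular sequences is not asserted here.

```python
import unicodedata

# Hypothetical set of marks that the toy rule moves to the front of a run.
FRONT_MARKS = {"\u0654"}  # ARABIC HAMZA ABOVE (ccc=230)


def toy_reorder(text):
    """Toy rule (an assumption, not the draft's algorithm): in each run of
    combining marks, move FRONT_MARKS to the front, stably."""
    out, run = [], []

    def flush():
        out.extend(c for c in run if c in FRONT_MARKS)
        out.extend(c for c in run if c not in FRONT_MARKS)
        run.clear()

    for ch in text:
        if unicodedata.combining(ch) != 0:
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)


# Two marks with the same combining class (230) in opposite orders.
x = "\u0628\u06E1\u0654"  # BEH, SMALL HIGH DOTLESS HEAD OF KHAH, HAMZA ABOVE
y = "\u0628\u0654\u06E1"  # BEH, HAMZA ABOVE, SMALL HIGH DOTLESS HEAD OF KHAH

nfc_x = unicodedata.normalize("NFC", x)
nfc_y = unicodedata.normalize("NFC", y)

print(nfc_x == nfc_y)                            # distinct under normalization?
print(toy_reorder(nfc_x) == toy_reorder(nfc_y))  # same after the toy rule?
```

Because both marks share a combining class, normalization keeps the two
orders distinct, so any reordering rule that merges them changes which
strings are distinguishable after rendering.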

# 5. Not enough details in the examples

The examples are missing the information needed for the average audience to
understand the details. To be understood correctly, they need to be
accompanied by the encoding of the text they are representing, and how the
algorithm works on such a sequence.

Feedback above this line was reviewed in the October 2017 UTC meeting.

Date/Time: Wed Jan 10 08:29:55 CST 2018
Name: r12a
Report Type: Error Report
Opt Subject: Use HTML rather than PDF

This is a comment from the W3C i18n WG.

http://www.unicode.org/reports/tr53/

When the spec is provided for review in PDF it isn't possible to

 -   link to a specific section in the review report
 -   copy the text into a report
 -   search for text in the document when reviewing reported issues.

Could we, in future, please provide HTML-based documents? (It's ok to use images 
for the examples that are unlikely to be rendered properly for all readers.)

Feedback above this line was reviewed in the January 2018 UTC meeting.