This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Fri Mar 27 08:58:14 PT 2026
ReportID: ID20260327085814
Name: Mark-Jan Nederhof
Report Type: Public Review Issue
Opt Subject: PRI #538 legacy characters in formatting
For PRI #538 Section 3.5: "they should not participate in complex layout scenarios where their alternate sequences are likely preferred"
This is both meaningless and reckless. What does "likely" mean? According to whose calculation of likelihood? And according to whose preferences? It is reckless because this says that correct implementations of the controls are compelled to actively thwart the formatting when "legacy" characters are involved, presumably as perverse punishment inflicted on users for thinking (in many cases correctly) that they have legitimate reasons to still be using the "legacy" characters. What is proposed here flies in the face of the most basic advice we give computer science students from first year onward to make their implementations robust.
In addition, the choice of which code points are tagged as "legacy" is currently an abject mess. Significantly distinct glyphs were made legacy while the Extended list introduced countless graphical variants that are virtually indistinguishable from other graphical variants. Several composite signs were made "legacy" while several new composite signs were introduced with the Extended list.
For more see:
https://nederhof.github.io/newgardiner/unicode16comments.html (for example under U+14067, U+13FF2, U+1432F)
https://nederhof.github.io/newgardiner/legacy.html (for example regarding U+13403 and U+131A5)
Date/Time: Fri Mar 27 09:10:28 PT 2026
ReportID: ID20260327091028
Name: Mark-Jan Nederhof
Report Type: Public Review Issue
Opt Subject: PRI #538 canonical ordering of overlays
For PRI #538 Section 4.1 (under kEH_AltSeq): The decomposition of U+1325F is blatantly incorrect. Where there is interaction between overlay and insertion, the insertion must come last. The reason is self-evident: insertion places a smaller sign in "free space" in or around a larger sign or combination of signs, and that free space generally does not exist or has not been delimited until after the overlay has happened.
The syntax governing the relation between overlays and insertions was specified several times in documents in the L2 repository, including in the approved 2021 document introducing new control characters:
https://www.unicode.org/L2/L2018/18236-nederhof.pdf
https://www.unicode.org/L2/L2021/21248-egyptian-controls.pdf
https://www.unicode.org/L2/L2024/24079-egyptian-fmt-controls.pdf
By now hundreds of texts have been encoded using this syntax, which has thereby proven itself, and this syntax is firmly embedded in several graphical editors and formatting tools. One cannot now change the syntax almost 10 years after insertions and overlays were introduced into Unicode, on the basis of a misunderstanding of the syntax and semantics of the controls.
For this particular group see also:
https://nederhof.github.io/hierojax/ligaturelist.html (under U+13257 + U+1327B)
https://nederhof.github.io/hierojax/insertionlist.html (near the bottom of the page)
The use of kEH_Cat for enforcing a canonical ordering on "simple" overlays (i.e. overlays consisting of only two signs) is totally unacceptable. First, the kEH_Cat values are a deeply flawed way of trying to categorize (graphemes or graphical variants of) hieroglyphs and have no scientific merit. If the kEH_Cat values may have served a useful practical purpose during creation of the Extended list, then that purpose has ended. The Egyptological community has repeatedly indicated that they do not wish to have this naming scheme forced upon them.
Letting a canonical ordering for overlays be determined by kEH_Cat values appears to be yet another attempt to force something on encoders and implementers that we most definitely do not want, let alone need. In itself, an arbitrary canonical ordering for overlays could conceivably help with search functionality. If something arbitrary, such as Unicode code point values, is not suitable as an ordering, then the best option is to place the "taller/narrower" sign first and the "wider/flatter" sign second. This would seamlessly combine with the syntax of "complex" overlays (i.e. consisting of three or more signs): "core group ::= flat_hor_group + flat_ver_group literal" (quoted from Table 3 of https://www.unicode.org/L2/L2021/21248-egyptian-controls.pdf). In a "flat_hor_group", signs are horizontally arranged and would therefore typically be tall and narrow, and in a "flat_ver_group", signs are vertically arranged and would therefore typically be wide and flat. One obvious advantage of this is that encoders would intuitively know which ordering to use without needing to remember the underlying encoding or some arbitrary label. This is the only canonical ordering with this property.
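As an illustration only, the "taller/narrower first" ordering for a simple two-sign overlay could be sketched as follows. The `Sign` type, its `width` and `height` fields, the sample dimensions, and the tie-break by name are all hypothetical assumptions made for this sketch; none of them comes from PRI #538 or from the L2 documents cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """A hypothetical sign record; a real tool would take metrics from a font."""
    name: str      # illustrative label, not a Unicode character name
    width: float   # assumed glyph width, must be > 0
    height: float  # assumed glyph height


def aspect(sign: Sign) -> float:
    """Height-to-width ratio: larger means taller/narrower."""
    return sign.height / sign.width


def canonical_overlay_order(a: Sign, b: Sign) -> tuple[Sign, Sign]:
    """Return the two signs with the taller/narrower one first.

    Equal aspect ratios are broken by name; that tie-break is an
    assumption of this sketch so that the result is deterministic.
    """
    first, second = sorted((a, b), key=lambda s: (-aspect(s), s.name))
    return first, second


# Hypothetical dimensions chosen only to show the behaviour.
tall = Sign("TALL_NARROW_SIGN", width=0.3, height=1.0)
flat = Sign("WIDE_FLAT_SIGN", width=1.0, height=0.3)

# The result does not depend on the order in which the signs are supplied.
assert canonical_overlay_order(flat, tall) == (tall, flat)
assert canonical_overlay_order(tall, flat) == (tall, flat)
```

Under these assumptions the ordering depends only on the visible shape of the two signs, which is the property the comment above argues for.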