L2/24-225

Editorial Working Group Report and Recommendations for UTC #181 Meeting

Source: Editorial Working Group

Date: October 24, 2024

A. Unicode Release Topics

A1. Unicode 16.0.0 Report

FYI: The Unicode 16.0 core specification was published as scheduled on September 10, 2024, along with Version 16.0 of the Unicode Standard, including all its associated annexes and correlated Unicode Technical Standards.

A2. Unicode 17.0.0 Report

FYI: The Editorial Committee has started review of new content planned for the eventual 17.0 publication of the core specification. Routine upkeep of the core specification is also ongoing, including work to stay current with bug reports and other small changes to core specification content mandated by the UTC.

B. Website Topics

B1. General Matters

The Editorial Working Group continues its periodic review and general maintenance of Unicode web pages, both on its own initiative and in response to public feedback.

We are currently looking into a potential redesign of the TUS landing page, especially for the purpose of improving access to the core specification from the page (per feedback ID20241006161903), with a possible backport of these improvements to the 16.0 TUS landing page as well. The Editorial Working Group plans to coordinate with the Release Management Group on this matter.

B2. FAQ

We have updated the private unicode-org FAQ repository so that the FAQ source is maintained directly in the repository and deployed from it.

The FAQ pages are automatically glossarized during deployment.

C. Editorial Working Group Process Issues

FYI: The Editorial Working Group continues to meet regularly. Our meetings are generally held on a biweekly schedule, except when they coincide with holidays or other events, such as UTC meetings. This report to the UTC includes feedback from the Editorial Working Group meetings held on August 1, 2024, August 15, 2024, August 29, 2024, September 12, 2024, September 26, 2024, October 10, 2024, and October 24, 2024.

FYI: Public-facing information about the Editorial Working Group and its work is maintained on the Unicode Editorial Working Group Page on the website. The Editorial Working Group also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Working Group or contribute to that work should contact the Chair, Louka Ménard Blondin ([email protected]).

In the past, we have maintained a separate series of TUS Futures meetings in order to discuss improvements to the presentation of our web pages and some of our more public-facing documents. With the core of this work done and fully rolled into the distribution of the Unicode 16.0 core specification, we have discontinued TUS Futures as separate meetings.

The Editorial Working Group is in ongoing need of volunteer editors with copyediting experience. People who are interested in learning more about this work and potentially taking it up should contact the Chair for more information.

Work is ongoing on improving the public documentation about the Editorial Working Group for potentially interested contributors both inside and outside of Unicode. We eventually plan to document and chart the internal processes of the committee to help newcomers better understand our work.

D. UTR Topics

FYI: During this cycle, the Editorial Working Group has been lightly reviewing UAXes and UTSes.

E. PRI Topics & Other Feedback

E1. General Feedback

Date/Time: Fri Jul 26 02:41:07 CDT 2024
ReportID: ID20240726024107
Name: Werner Lemberg
Report Type: Error Report
Opt Subject: NamesList.txt


As discussed in the thread starting at

  https://corp.unicode.org/pipermail/unicode/2024-July/010976.html 

it turned out that the two characters

  1D132   MUSICAL SYMBOL QUARTER TONE SHARP
  1D133   MUSICAL SYMBOL QUARTER TONE FLAT

are not accidentals but *pitch modifiers*, to be added to left of an
accidental (or a note without an accidental) and indicating that the pitch
of the given note has to be raised or lowered by a quarter tone,
respectively.  The provided scans in the discussion confirm this usage.

In other words, these two characters should be put into a separate section
`@ Pitch modifiers` or something like that.

Recommendations

Action item for Ken Whistler, EDC: Update NamesList.txt with a pitch modifiers subhead at 1D132 for Unicode Version 17.0. [Reference: Section E1 of L2/24-225]

Date/Time: Thu Aug 08 09:06:48 CDT 2024
ReportID: ID20240808090648
Name: Lucas
Report Type: Error Report
Opt Subject: Multiple


The Latin Letters D, K, L, N and R as used in Livonian, Old-Prussian,
Latvian and Romanian (all around the Baltic area) are supposed to have a
comma underneath, and not a cedilla. I have not found a single source that
needs these letters with an actual cedilla, other than errors caused by
you, Unicode. According to Wikipedia these letters were mistakenly encoded
with a Cedilla by Unicode in the early nineties, and that Unicode claims
these errors can not be fixed, (even though, in general, the computer world
is all about bugfixing). These letters should not combine with 0327, but
with 0326, as you probably know, since the font used in your charts shows a
proper comma-accent. The Calibri font fonts I designed also use comma
accents.

Your Unicode-bugs are the cause of many fonts actually using cedillas
instead of comma accents. Your bug has also caused the recent DIN 91379
Norm to include sequences for these letters combined with 0326 comma
accent, instead of using the existing Unicodes of the precomposed letters.

If you, for whatever reason, refuse to fix the bugs introduced by your
predecessors, than at least add notes to ALL of these 10 codepoints, in
your charts, that this was a historic mistake, and that the accents should
actually look like free floating comma accents (0326) and not cedillas
(0327). 

1E10 Ḑ LATIN CAPITAL LETTER D WITH CEDILLA (0044 + 0327)
1E11 ḑ LATIN SMALL LETTER D WITH CEDILLA (0064 + 0327)
0136 Ķ LATIN CAPITAL LETTER K WITH CEDILLA (004B + 0327)
0137 ķ LATIN SMALL LETTER K WITH CEDILLA (006B + 0327)
013B Ļ LATIN CAPITAL LETTER L WITH CEDILLA (004C + 0327)
013C ļ LATIN SMALL LETTER L WITH CEDILLA (006C + 0327)
0145 Ņ LATIN CAPITAL LETTER N WITH CEDILLA (004E + 0327)
0146 ņ LATIN SMALL LETTER N WITH CEDILLA (006E + 0327)
0156 Ŗ LATIN CAPITAL LETTER R WITH CEDILLA (0052 + 0327)
0157 ŗ LATIN SMALL LETTER R WITH CEDILLA (0072 + 0327)

ASAP please, thank you.

Recommendations

Action item for Ken Whistler, EDC: Consider adding annotations to NamesList.txt for the case pairs 1E10/1E11, 0136/0137, 013B/013C, 0145/0146, 0156/0157, for example: “Despite the name, this pair of characters should normally be displayed with a comma below.”
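
For readers who want to check the decompositions listed in the report, the following is a minimal sketch using Python's standard `unicodedata` module. The names `CODE_POINTS` and `decomposes_with_cedilla` are hypothetical and chosen for illustration; the result depends on the Unicode data bundled with the Python build in use, although canonical decompositions are covered by the Unicode stability policies.

```python
import unicodedata

# The ten code points listed in the feedback report above.
CODE_POINTS = [
    0x1E10, 0x1E11, 0x0136, 0x0137, 0x013B,
    0x013C, 0x0145, 0x0146, 0x0156, 0x0157,
]


def decomposes_with_cedilla(cp: int) -> bool:
    """Return True if the canonical decomposition ends in U+0327 COMBINING CEDILLA."""
    parts = unicodedata.decomposition(chr(cp)).split()
    return bool(parts) and parts[-1] == "0327"


for cp in CODE_POINTS:
    # Print each code point, its name, and its decomposition mapping.
    ch = chr(cp)
    print(f"{cp:04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")
```

Because these decompositions are fixed, an annotation in NamesList.txt is the kind of remedy available here, rather than a change to the mappings themselves.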

Date/Time: Sat Aug 31 22:24:11 CDT 2024
ReportID: ID20240831222411
Name: Guillaume Fortin-Debigaré
Report Type: Error Report
Opt Subject: Unicode 15.1.0 Core Specifications - Chapter 22 Symbols


Table 22-5 "Mathematical Operators Disunified from Punctuation" lists the incorrect 
Unicode code point for the SOLIDUS character in the second row of the left column. 
If should be 002F instead of 003F.

Comments

This error has been fixed in the Unicode 16.0 core spec.

Date/Time: Sat Sep 07 05:14:42 CDT 2024
ReportID: ID20240907051442
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: U0000.pdf


A minor slip: In U0000.pdf, the following is shown with two right single quotation 
marks (they are not ASCII apostrophes!) instead of a left and a right one:

  for ’Greek question mark’

Comments

This, as well as other examples, has been fixed, and will become visible when the first 17.0 version of NamesList.txt starts to surface.

Date/Time: Wed Sep 11 04:07:14 CDT 2024
ReportID: ID20240911040714
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: U2100.pdf


There are two issues with the informative aliases “first transfinite
cardinal (countable)”, “second transfinite cardinal(the continuum)”, “third
transfinite cardinal (functions of a real variable)” and “fourth
transfinite cardinal” for the characters U+2135 (ALEF SYMBOL), U+2136
(BET SYMBOL), U+2137 (GIMEL SYMBOL) and U+2138 (DALET SYMBOL),
respectively.

1) Aleph is used together (!) with 0, 1, … as an index to indicate
cardinalities of well-ordered infinite sets (in ascending order).
(Without an index, it is apparently sometimes used for the cardinality of
the continuum, not the first transfinite cardinal!) Beth and gimel are also
used with an index (you can look up the definition), while daleth does not
have an established meaning and was apparently just included in LaTeX so
that it can be used in an ad-hoc manner. (Even if there is someone out
there who uses the characters as the aliases indicate, that would be an
idiosyncrasy that does not deserve mention in the only alias.)

2) That the cardinality of the continuum is the second transfinite cardinal
amounts to the continuum hypothesis, which is known to be independent of
the set theory ZFC, and among those set theorists who have a belief either
way, it seems like most believe it to be false.

Recommendations

Action item for Ken Whistler, EDC: Drop the aliases from the characters U+2135 to U+2138, and change the text “These are left-to-right characters.” to “These are left-to-right characters. They are used in notations of transfinite cardinals.” [Reference: Section E1 of L2/24-225]

Date/Time: Wed Sep 11 05:14:22 CDT 2024
ReportID: ID20240911051422
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: (none)


Two further remarks:

1) The reference glyph for U+3388 and that for U+3389 have an
italicized “cal” for the calorie. This unit symbol should not be
italicized. While the glyphs are not normative, it would be great if this
could be corrected; an italic mu (in glyphs of the chart) has already been
corrected to an upright one.

2) The character U+2263 (≣ STRICTLY EQUIVALENT TO) is found under the
subhead “Relations”. I think it would be more appropriate to put it
under “Logical operator” (for comparison: U+2227) because it stands for a
connective in modal logic. See here:
https://corp.unicode.org/pipermail/unicode/2022-July/010231.html

Comments

Item 1 should be forwarded to the CJK group.
For item 2, we recommend no action.

Background

https://corp.unicode.org/pipermail/unicode/2022-July/010231.html focuses on the semantics of one particular usage in symbolic logic. Perhaps the submitter’s focus on that is inspired by the logic-sounding character names, but character names for mathematical symbols cannot reflect the breadth of their use, which is “whatever mathematicians feel like”. Cursory searches find https://math.stackexchange.com/a/2788325 or https://www.reddit.com/r/askmath/comments/68i9on/is_there_a_conventional_use_for_the_strictly/ which mention uses interchangeable with other =-family characters. More importantly, in the context of mathematical typesetting, relations and (binary) operators are also typographical categories, affecting, e.g., spacing. MathClassEx, revision 15, classifies ≣ as R, like ≡ et al.; and it would be very weird to see ≡ typeset differently from ≣.

Date/Time: Tue Sep 24 06:11:31 CDT 2024
ReportID: ID20240924061131
Name: Ben Harris
Report Type: Error Report
Opt Subject: The Unicode® Standard Version 16.0 – Core Specification


A piece of text has been lost in the translation to HTML for Unicode 16.  In
Unicode 15.1.0, this text appears:

"So for example, the representation of the number 12,346 in the traditional
 system would be by a sequence of CJK ideographs with numeric values as
 follows: <one, ten-thousand, two, thousand, three, hundred, four, ten,
 six>."

That is, the example is "one, ten-thousand, two, thousand, three, hundred,
four, ten, six", surrounded by less-than and greater-than signs.

In Unicode 16.0.0, at
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-22/#G46185,
the same sentence reads:

"So for example, the representation of the number 12,346 in the traditional
 system would be by a sequence of CJK ideographs with numeric values as
 follows: ."

That is, the entire text within and including the less-than and greater-than
signs has vanished.  The HTML source shows that the text does actually
appear in the source, but the less-than sign has not been properly escaped
and so is interpreted as markup by browsers.

This makes me suspect that there may be other similar problems elsewhere in
the standard.  I haven't (yet) made any attempt at looking for them.

Comments

This has been fixed in the 17.0 core specification draft.
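
As an illustration of the escaping issue described in the report, here is a minimal sketch using `html.escape` from the Python standard library. It makes no claim about the toolchain actually used to produce the HTML core specification; the `sentence` variable is a hypothetical stand-in for the affected text.

```python
import html

# Hypothetical stand-in for the tail of the affected sentence.
sentence = (
    "follows: <one, ten-thousand, two, thousand, three, hundred, four, ten, six>."
)

# Escaping replaces "<" and ">" with character references, so that a browser
# displays them as text instead of parsing the example as an unknown tag.
escaped = html.escape(sentence, quote=False)
print(escaped)
```

Searching generated HTML for unescaped angle brackets that do not belong to a known tag is one possible way to look for the similar problems the submitter suspects.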

Date/Time: Fri Oct 04 11:39:10 CDT 2024
ReportID: ID20241004113910
Name: Malo
Report Type: Error Report
Opt Subject: The Unicode® Standard Version 16.0 Core Specification


Section 24.1.9 of the Unicode® Standard Version 16.0 Core Specification
(https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-24/#G3725)
includes sample character list which contains a mistake: 212B Å ANGSTROM
SIGN is incorrectly marked as having the canonical mapping 00C5 Å angstrom
sign, instead of 00C5 Å latin capital letter a with ring above. Note that
this error is not present in the corresponding chart
(https://www.unicode.org/charts/PDF/U2100.pdf).

Comments

This has been fixed in the 17.0 core specification draft.
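
The canonical mapping cited in the report can be checked with a minimal sketch based on Python's standard `unicodedata` module. The variable names are illustrative, and the output reflects the Unicode data bundled with the Python build in use.

```python
import unicodedata

angstrom = "\u212B"  # ANGSTROM SIGN

# The canonical decomposition is a singleton mapping to one code point.
mapping = unicodedata.decomposition(angstrom)
target = chr(int(mapping, 16))
target_name = unicodedata.name(target)

print(f"{ord(angstrom):04X} {unicodedata.name(angstrom)} -> {mapping} {target_name}")
```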

Date/Time: Sun Oct 06 16:19:03 CDT 2024
ReportID: ID20241006161903
Name: Jim DeLaHunt
Report Type: Error Report
Opt Subject: www.unicode.org/versions/latest/


Passing on a social media comment about page at
https://www.unicode.org/versions/latest/ . Reader visits the page wanting
to find the Core Spec (can generalise other parts of the Unicode Standard
such as UTRs). Reader expects that the page will contain links to the parts
of the core spec which they seek. Instead, the page describes the
differences between the latest version of TUS and the previous version. I
suggest adding a section to the top of this page, describing "The current
version of The Unicode Standard is 16.0.0. It consists of a Core
Specification (link), some Code Charts (link), etc. Then put the current
content under a heading like "Differences from previous version of the
Standard". 

The present set of links, especially the unnumbered list of links under "B.
Technical Overview", might make the reader hope they link to the parts of
the Standard, but in fact they link to subheadings below which describe
changes. It would be better for the list of links at the top of the page be
to the parts of the latest version of The Unicode Standard, as implied by
the URL.

Original social media post:
https://cosocial.ca/@timbray/113170595870924709 , by Tim Bray of XML fame.
Relayed by Jim DeLaHunt. The explanation above is mine, not Tim's. He may
submit his own Error Report in his own words.

Comments

Note that the tech site home page has already been improved to include a direct link to the latest core spec.
The Editorial Working Group will try to improve the 17.0 landing page, and potentially retrofit the 16.0 one.

Date/Time: Thu Oct 24 10:04:37 CDT 2024
ReportID: ID20241024100437
Name: Sridatta A
Report Type: Error Report
Opt Subject: Corrections to Unicode chapter of Tulu-Tigalari


In chapter 15
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-15/#G71814
“Tulu-Tigalari is a historic script attested in a large number of manuscripts 
from Karnataka and northern Kerala dating to as early as 1300 CE. It was used 
to write Sanskrit, Tulu, and Malayalam, “
Should be corrected to have Kannada instead of Malayalam.
In #Figure 15-5. The glyph is that of ju than chu

Comments

This has been fixed in the 17.0 core specification draft.

G. Miscellaneous Topics

Nothing to report.