L2/20-174

Comments on Public Review Issues
(April 20 - July 20, 2020)

The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received as of July 20, 2020. The previous cumulative document was issued prior to UTC #163 (April 2020).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of July 20, 2020.

Issue Name Feedback Link
421 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback) No feedback at this time  
420 Proposed Update UAX #45, U-source Ideographs (feedback) No feedback at this time  
419 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback at this time  
417 Proposed Update UAX #29, Unicode Text Segmentation (feedback) No feedback at this time  
416 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback) No feedback at this time  
415 Proposed Update UTR #23, The Unicode Character Property Model (feedback) No feedback at this time  
408 QID Emoji (feedback) Last feedback June 4, 2020  

The links below go to locations in this document for feedback.

Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to UCD and Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports

Feedback routed to Unihan ad hoc for evaluation

Date/Time: Fri Jun 12 08:20:02 CDT 2020
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Unihan Database changes

The following are suggested changes to the Unihan Database, together with
justifications for making them:

U+6589 斉, whose current radical is 67 (67.4), is the Japanese simplified
form of U+9F4A 齊 whose radical is 210 (210.0). The PRC simplified form of
U+9F4A 齊, U+9F50 齐, is also assigned Radical #210 (210'.0), along with
Radical 67 (67.2). I propose that 210.0 be added to the existing kRSUnicode
property value of U+6589 斉:

U+6589 kRSUnicode 67.4 210.0

U+6B6F 歯, whose current radical is 77 (77.8), is the Japanese simplified
form of U+9F52 齒 whose radical is 211 (211.0). The PRC simplified form of
U+9F52 齒, U+9F7F 齿, is also assigned Radical #211 (211'.0). I propose that
211.0 be added to the existing kRSUnicode property value of U+6B6F 歯:

U+6B6F kRSUnicode 77.8 211.0

In addition, U+2B81A 𫠚 (Extension D) uses the Japanese simplified form of
U+9F52 齒 as a component, not the PRC simplified form, U+9F7F 齿, so its
kRSUnicode value (211'.5) should not include the single quote that indicates a
PRC simplified form of the radical. I propose that the single quote be
removed from the kRSUnicode property value of U+2B81A 𫠚:

U+2B81A kRSUnicode 211.5

U+7ADC 竜, whose current radical is 117 (117.5), is the Japanese simplified
form of U+9F8D 龍 whose radical is 212 (212.0). The PRC simplified form of
U+9F8D 龍, U+9F99 龙, is also assigned Radical #212 (212'.0). I propose that
212.0 be added to the existing kRSUnicode property value of U+7ADC 竜:

U+7ADC kRSUnicode 117.5 212.0

In addition, the following ideographs use U+7ADC 竜 as a component, and I
propose that Radical #212, along with the appropriate number of residual
strokes, be added to their existing kRSUnicode property values (the
characters are shown):

U+21676 𡙶 kRSUnicode 37.11 212.4
U+23BE1 𣯡 kRSUnicode 82.10 212.4
U+2412F 𤄯 kRSUnicode 85.18 212.11
U+25269 𥉩 kRSUnicode 109.10 212.5
U+25A9D 𥪝 kRSUnicode 117.9 212.4
U+25A9E 𥪞 kRSUnicode 117.9 212.4
U+2A95B 𪥛 kRSUnicode 37.10 212.3
U+2AC6F 𪱯 kRSUnicode 74.17 212.11
U+2ADF9 𪷹 kRSUnicode 85.15 212.8
U+2AF5E 𪽞 kRSUnicode 102.10 212.5
U+2AFC1 𪿁 kRSUnicode 109.14 212.9
U+2B3FD 𫏽 kRSUnicode 159.10 212.7
U+2C099 𬂙 kRSUnicode 74.17 212.11
U+2C514 𬔔 kRSUnicode 116.13 212.8
U+2E13F 𮄿 kRSUnicode 117.25 212.20

U+203A4 𠎤, whose current radical is 9 (9.12), is a variant form of U+9FA0 龠
whose radical is 214 (214.0). I propose that 214.-3 be added to the existing
kRSUnicode property value of U+203A4 𠎤:

U+203A4 kRSUnicode 9.12 214.-3

U+2B809 𫠉 and U+2B813 𫠓 (Extension D) are variant forms of U+99AC 馬 and
U+9CE5 鳥, respectively, which have three fewer strokes. I propose that the
residual number of strokes as specified in their kRSUnicode property values
be changed from 0 to -3, and that their kTotalStrokes property values be
corrected to reflect the actual number of strokes, which is three fewer than
their existing kTotalStrokes property values of 10 and 11, respectively:

U+2B809 kRSUnicode 187.-3
U+2B809 kTotalStrokes 7
U+2B813 kRSUnicode 196.-3
U+2B813 kTotalStrokes 8

U+2CF04 𬼄 (Extension F), whose current radical is 4 (4.3), is related to
U+2CF01 𬼁 (also Extension F), whose radical is also 4 (4.1), but both
ideographs share the same kTotalStrokes property value (2), which is not
possible when considering their stroke composition. In addition, U+2CF04 𬼄
is composed of the following three strokes: U+31D1 ㇑, U+31D6 ㇖, and U+31E1
㇡. This suggests two (2) residual strokes, not three (3). I propose that the
kRSUnicode property value of U+2CF04 𬼄 be changed from 4.3 to 4.2 to match
the number of actual residual strokes, and that its kTotalStrokes property
value be changed from 2 to 3, to match the number of strokes in the radical
(1) plus residual strokes (2):

U+2CF04 kRSUnicode 4.2
U+2CF04 kTotalStrokes 3

U+2CF09 𬼉 (Extension F), whose current radical is 4 (4.5), is a variant form
of U+7F36 缶 whose radical is 121 (121.0), and seems to be missing the first
stroke. I propose that 121.-1 be added to the existing kRSUnicode property
value of U+2CF09 𬼉:

U+2CF09 kRSUnicode 4.5 121.-1
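The kRSUnicode values proposed above all follow the radical(')[.]residual-strokes syntax. A minimal sketch (not official Unihan tooling) of parsing such fields, including the apostrophe that marks a simplified radical form and the negative residual-stroke counts proposed in this report:

```python
import re

# Matches fields like "67.4", "212'.0", or the proposed "214.-3":
# radical number, optional apostrophe (simplified radical form), dot,
# and a possibly negative residual stroke count.
KRS_FIELD = re.compile(r"^(\d{1,3})(')?\.(-?\d{1,2})$")

def parse_krsunicode(value):
    """Split a space-separated kRSUnicode value into
    (radical, simplified?, residual-strokes) triples."""
    entries = []
    for field in value.split():
        m = KRS_FIELD.match(field)
        if m is None:
            raise ValueError(f"malformed kRSUnicode field: {field!r}")
        radical, prime, residual = m.groups()
        entries.append((int(radical), prime is not None, int(residual)))
    return entries

print(parse_krsunicode("67.4 210.0"))  # [(67, False, 4), (210, False, 0)]
```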

That is all.

Date/Time: Mon Jun 15 20:30:55 CDT 2020
Name: Jim Breen
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Proposed Unihan Database additions

I would like to propose the following additions to the Unihan Database for
U+7A3D (稽) and U+25874 (𥡴). The purpose of the additions is to establish the
relationship between them, and to provide Japanese-oriented information for
U+25874 which is currently missing from the Database. I have appended some
notes relating to the proposed additions.

U+7A3D  kZVariant       U+25874<kMorohashi:T
U+7A3D  kSemanticVariant        U+25874<kMorohashi:T
U+25874 kIRGDaiKanwaZiten       25240
U+25874 kMorohashi      25240
U+25874 kNelson 3304
U+25874 kJapaneseKun    TODOMERU KANGAERU
U+25874 kJapaneseOn     KEI
U+25874 kZVariant       U+7A3D<kMorohashi:TZ
U+25874 kSemanticVariant        U+7A3D<kMorohashi:TZ

When I was first studying Japanese in the 1980s about the only kanji
dictionary available to us was the venerable Nelson "Japanese-English
Character Dictionary". One of the kanji in Nelson which has been raised with
me recently is 稽 (no. 3304), which is now one of the 常用漢字 (common use kanji)
taught in Japanese schools. Nelson did not use that glyph for the kanji in
his dictionary; he used the closely-related 𥡴 glyph. This presented a slight
problem when we began to develop electronic versions in the early 1990s, as
𥡴 was not in the main JIS standard (JIS X 0208-1983/1990) [See Note 1
below]. The solution was to use 稽 instead; after all, it is the "correct"
kanji. (Morohashi's 大漢和辞典 has a full entry for 稽 (no. 25218) and an
abbreviated entry for 𥡴 (no. 25240) pointing out it is a variant of 稽.) When
the New Nelson was published in 1997 the editor, John Haig, kept the 𥡴 as
the "correct" glyph (no. 4174) and included 稽 (no. 4163) as "nonstandard for
𥡴". Spahn and Hadamitzky in their 1996 "The Kanji Dictionary" similarly base
their entry on the 𥡴 glyph (index 5d11.3) and list 稽 as an alternative. The
𥡴 form is not currently in any JIS kanji standard, but it is in Unicode
(U+25874). The Unihan data indicates it has been based on Taiwanese sources.
There is currently no reference to Morohashi or any other Japanese source,
and no mention of its association with 稽 (U+7A3D), the usual Japanese
readings (ケイ, かんがえる, とどめる) or the meanings usually associated with it in
Japan.

Note 1. The predecessor to JIS X 0208, JIS C 6226, which was published in
1978, had the 16-stroke 𥡴 glyph in the code-point now occupied by 稽. This
was changed when it was replaced by JIS X 0208-1983.

Feedback routed to Script ad hoc for evaluation

Date/Time: Thu Apr 23 13:30:44 CDT 2020
Name: Markus W Scherer
Report Type: Error Report
Opt Subject: uppercase of U+0587 ARMENIAN SMALL LIGATURE ECH YIWN

Maybe for the Script Ad Hoc?

We have received a bug report claiming that the uppercase form of U+0587 և
is wrong.

SpecialCasing.txt has
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN

This means that the ligature small ech-yiwn uppercases to ԵՒ=capital ech+yiwn=0535+0552.

The report says that it should uppercase to ԵՎ=capital ech+vew=0535+054E.

I have asked for an authoritative reference and will report when I receive something.

In the meantime, I found this:

https://en.wikipedia.org/wiki/Armenian_alphabet#endnote_h

“The ligature և has no majuscule form; when capitalized it is written as two
letters Եւ (classical) or Եվ (reformed).”

Can someone confirm this?

If true, should we change SpecialCasing.txt to use the "reformed" uppercasing?
Should implementers (e.g., ICU) offer both versions? Under what conditions?

Please advise.
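For reference, implementations that follow the current SpecialCasing.txt data produce the classical two-letter form; for example, Python (which implements the Unicode full case mappings) gives:

```python
# Current behavior per SpecialCasing.txt: U+0587 ARMENIAN SMALL LIGATURE
# ECH YIWN uppercases to U+0535 U+0552 (capital ech + capital yiwn),
# the "classical" form, not U+0535 U+054E (capital ech + capital vew).
print("\u0587".upper())  # ԵՒ
assert "\u0587".upper() == "\u0535\u0552"
```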

Date/Time: Tue Jun 16 04:38:17 CDT 2020
Name: Sandra Lippert
Report Type: Feedback on an Encoding Proposal
Opt Subject: capital H with line below etc.

Dear Sirs and Madams,

I hope I chose the correct category for this - I did not find "proposing an encoding". 

I am an Egyptologist, and while I am very glad that in recent years
almost all of the special glyphs we need for transliterating ancient
Egyptian have been added to Unicode, I am very much puzzled why there is
still no capital H with a line below in Unicode, even though the
corresponding lowercase letter (U+1E96) exists. This was clearly an
oversight, but why has it not been fixed since? It cannot be that no one ever
pointed it out: in my search for answers, I came upon a discussion thread
from 18 years ago
( https://unicode.unicode.narkive.com/8rfiWRgg/capital-letter-h-with-line-below )
where this problem was already mentioned, but nothing seems to have been
done about it since. There, it was suggested that one combine U+0048 and
U+0331, but this works only in a very limited number of fonts, because the
combining macron below is sometimes too large or too narrow for capital H
and is often shifted to one side instead of being centered correctly.

And while we are at it: the glyphs for capital and lower case h with ^
underneath (necessary for transliterating demotic texts) are also absent
from Unicode, and again, adding a combining circumflex below (U+032D) does
not work in a lot of fonts because it is not centered correctly. Sometimes
it works in the regular font but "slips off" to one side as soon as one
switches to italics, which is standard for Egyptological transliteration.
This is not a very fancy letter either, and its "cousin", Ṱ/ṱ (U+1E70 /
U+1E71), also used in transliterating demotic, is already present, so it
would be very helpful if it were finally encoded as well.

Thank you in advance for considering my request. I am looking forward to
hearing from you,

kind regards,

Sandra Lippert
Directrice de recherche
CNRS, Paris (UMR 8546-AOrOc)

Date/Time: Thu Jul 2 15:46:27 CDT 2020
Name: Kent Karlsson
Report Type: Error Report
Opt Subject: KHMER CONSONANT SIGN COENG DA should look like KHMER LETTER DA, not like KHMER LETTER TA

Regarding:
http://www.unicode.org/versions/Unicode13.0.0/ch16.pdf 
Table 16-8. Khmer Subscript Consonant Signs

This table gives for
 17D2 178A khmer consonant sign coeng da
a glyph that is identical to that of 
 17D2 178F khmer consonant sign coeng ta

Actually, COENG DA did have, and should still have, a (range of) glyph
derived from the (range of) glyph for KHMER CONSONANT DA.

The current "recommendation" (if that is what that table is) means that
neither the author nor the reader of a text can know which of the two
(COENG DA or COENG TA) is used in a text, as both look like COENG TA.
Further, one cannot represent (with that "recommendation") texts that
really do have a COENG DA that looks similar to a DA. COENG DA really did
have its own glyph based on the glyph for DA. Having a separate (preferably
DA-shape based) glyph for COENG DA will both make it possible for authors
and readers to see (without checking the character code) whether a COENG DA
or a COENG TA is used, and also make historical as well as modern spelling
using COENG DA possible.

(Introducing a "KHMER ARCHAIC COENG DA" or similar, which has been floated
as a possibility, is not a good idea. It does not solve the first problem,
and would be a strange and unnecessary "solution" to the second problem.)

I got two references from Richard Wordingham, both showing a "DA-shaped" COENG DA:

* http://aefek.free.fr/iso_album/antelme_bis.pdf (pp25 and 26)
* http://www.khmerfonts.info/fontinfo.php?font=1507  

So the use of a "COENG TA" glyph where one used to use "COENG DA" should be
seen as a spelling change, not a "glyph merger" or the like.

Changing (correcting) fonts to use a "DA"-like glyph for "COENG DA" may
reveal some (in the modern view) spelling errors, but that is as it should be.

Conclusion: in table 16-8, change the glyph in the line for
 17D2 178A khmer consonant sign coeng da
to a subscript glyph based on the glyph for KHMER LETTER DA.

Feedback routed to UCD and Algorithms ad hoc for evaluation

Date/Time: Fri Apr 24 17:59:22 CDT 2020
Contact: fantasai@inkedblade.net
Name: Elika J. Etemad
Report Type: Error Report
Opt Subject: UTR50 orientation of Bopomofo tone marks

Hello UTC,

I'm writing regarding the tone marks used in bopomofo:

 ‎02C9 MODIFIER LETTER MACRON
 ‎02CA MODIFIER LETTER ACUTE ACCENT
 ‎02C7 CARON
 ‎02CB MODIFIER LETTER GRAVE ACCENT
 ‎02D9 DOT ABOVE

These are currently registered as R in UTR50, but they should probably 
be adjusted to U, consistent with the rest of the Bopomofo letters. 
(They're a bit more widely used than just within Bopomofo, but UTR50 
is primarily targeted at CJK context, and within this context these 
modifier letters are much more likely to be used as Bopomofo tone 
marks than otherwise.)

See discussion thread at  https://lists.w3.org/Archives/Public/www-style/2015Aug/0315.html  
for more context.

Thanks~
~fantasai

Date/Time: Tue May 12 20:46:39 CDT 2020
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: IdentifierType of Ainu Katakana characters

In IdentifierStatus.txt:

31F0..31FF    ; Technical      # 3.2   [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO

These are from the Katakana Phonetic Extensions block; which exists for writing the Ainu language. 
Ainu is apparently both written using the Latin and Katakana scripts, using these extensions.

According to UTS 39 Table 1[1], "Technical" is "Specialized usage: technical, liturgical, etc.", 
which doesn't seem to fit with code points that are actively used in a primary script for a language.

Should we be changing this to Recommended?

 [1]: https://www.unicode.org/reports/tr39/#Identifier_Status_and_Type 

Date/Time: Wed May 20 01:31:04 CDT 2020
Name: Trevor
Report Type: Error Report
Opt Subject: IDNA test case error

Hello,

I believe I have found 2 tests in
https://www.unicode.org/Public/idna/13.0.0/IdnaTestV2.txt whose expected
results are not possible to represent when using the ToASCII operation with
Transitional_Processing = true, CheckJoiners = false, and VerifyDnsLength =
false.

This relates to tests whose source string is U+200C or U+200D. The U+200C
and U+200D get mapped to an empty string due to the use of Transitional
Processing and, as a result, the expected output is an empty string. However,
it is not possible to represent an empty string as the expected output for
toAsciiT because an empty string means that toAsciiT "adopts" toAsciiN's
value, which in this case is either 'xn--1ug' or 'xn--0ug'.

Tests in question (source string escaped for readability):
\u200D; ; [C2]; xn--1ug; ; ; [A4_2] #
\u200C; ; [C1]; xn--0ug; ; ; [A4_2] #
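The column-inheritance rule that causes the problem can be sketched as follows (a hypothetical helper, not the official conformance harness): an empty toAsciiT column adopts the toAsciiN value, so an empty string cannot be expressed as an expected toAsciiT result.

```python
def parse_idna_test(line):
    # IdnaTestV2.txt columns: source; toUnicode; toUnicodeStatus;
    # toAsciiN; toAsciiNStatus; toAsciiT; toAsciiTStatus
    fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
    source, to_unicode, _, to_ascii_n, _, to_ascii_t, _ = fields
    to_unicode = to_unicode or source      # empty column adopts source
    to_ascii_n = to_ascii_n or to_unicode  # empty column adopts toUnicode
    to_ascii_t = to_ascii_t or to_ascii_n  # empty column adopts toAsciiN
    return source, to_ascii_n, to_ascii_t

# For the U+200C test line, toAsciiT is empty and so adopts "xn--0ug",
# even though Transitional_Processing actually yields the empty string.
src, tan, tat = parse_idna_test("\u200C; ; [C1]; xn--0ug; ; ; [A4_2] #")
print(tat)  # xn--0ug
```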

Date/Time: Tue Jun 2 06:23:29 CDT 2020
Name: Bahman Eslami
Report Type: Error Report
Opt Subject: ARABIC DATE SEPARATOR class error

Hello,

The error is that the character ARABIC DATE SEPARATOR is classified as 
Bidi Category "AL", which implies strong right-to-left direction. 
This makes it impossible to apply kerning between Arabic script numbers 
and the ADS. Please take a look at the following issue on GitHub:

https://github.com/googlefonts/ufo2ft/issues/384 

I think the bidirectional class of the ADS should be LTR or Neutral.

Thanks,
Bahman
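The reported classification can be confirmed with, for example, Python's unicodedata module (the ADS is U+060D):

```python
import unicodedata

# U+060D ARABIC DATE SEPARATOR currently has Bidi_Class AL
# (Arabic Letter), as the report states.
print(unicodedata.name("\u060D"))           # ARABIC DATE SEPARATOR
print(unicodedata.bidirectional("\u060D"))  # AL
```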

Date/Time: Sat May 30 19:34:01 CDT 2020
Name: Elika J. Etemad
Report Type: Error Report
Opt Subject: Vertical Text in UAX9 Mostly Irrelevant

The rules in UAX9 6.2 Vertical Text http://unicode.org/reports/tr9/#Vertical_Text 
are presented as if this is what implementations are expected to do, but actually, 
most of them don't. RTL text is rendered bottom-to-top instead. The section should 
be removed, or rewritten to be an example of something that *could* be done with 
UAX9's algorithms (but isn't necessarily).

Date/Time: Fri May 29 16:49:03 CDT 2020
Name: Trevor
Report Type: Error Report
Opt Subject: UTS#46 tests and URL delimiters

Hello,

There are a number of tests[1] that contain labels that have a U+003F "?"
question mark code point where the test expects the label containing the
U+003F "?" question mark to remain in its Unicode form when performing the
toASCII[2] operation on the domain. As far as I can tell, there is nothing
in the UTS#46 specification that prevents the label from being converted
into an ASCII label. The toASCII[2] operation converts all labels to ASCII
unless punycode returns an error. Going through the Punycode spec,
Punycode's encode[3] algorithm does not reject U+003F "?" question marks and
as a result labels containing U+003F "?" question marks get converted to
ASCII contrary to the test expectations.

I presume that the tests are trying to say that any label containing common
URL delimiters such as ":/@.?#[]" shouldn't be converted to an ASCII label,
but I'm not really sure what the expected results are supposed to be. I
suppose you could add a check for such common URL delimiters and skip
punycode encoding labels that contain one assuming the test expectations are
correct.

[1] https://www.unicode.org/Public/idna/13.0.0/IdnaTestV2.txt 
[2] https://www.unicode.org/reports/tr46/#ToASCII 
[3] https://tools.ietf.org/html/rfc3492#section-6.3 

- Trevor
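The behavior the report describes can be seen with any RFC 3492 implementation; for example, Python's punycode codec accepts a label containing "?", since all ASCII code points count as "basic" code points:

```python
# Punycode treats every ASCII code point, including "?", as a basic
# code point, so encoding a label containing "?" succeeds (the basic
# code points are copied through, followed by the "-" delimiter)
# rather than raising an error.
encoded = "a?b".encode("punycode")
print(encoded)  # b'a?b-'
```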

Date/Time: Mon Jun 22 16:19:50 CDT 2020
Name: fantasai
Report Type: Error Report
Opt Subject: UAX14 quotation marks vs ID

The UAX14 rules concerning QU are too strict, and don't work for Chinese by 
default, because they rely on spaces to be a reasonable default. This can 
probably be solved by allowing breaks between ID + Pi and between Pf + ID. 
See https://github.com/w3c/clreq/issues/245 for more info.

Date/Time: Fri Jun 26 19:06:23 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing Indic shaping properties for U+0300 and U+0301

The Unicode Standard 13.0, page 466, recommends the characters U+0300
combining grave accent and U+0301 combining acute accent for use with the
Devanagari script. However, these characters do not have Indic syllabic
categories defined for them, so it’s not clear how they would be used and
where they would fit into Devanagari syllables.

Date/Time: Tue Jul 7 17:43:56 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing Indic shaping properties for Devanagari and Vedic characters

A number of Devanagari and Vedic characters are missing Indic syllabic or
positional category definitions in the Unicode 13.0 data:

– 0950, 0971, A8F4..A8F7, A8FB, A8FD don’t have an Indic syllabic category 
(as letters they don’t need a positional category).

– 1CE2..1CE8, 1CED don’t have syllabic categories (they do have positional 
categories).

– 1CF8..1CF9 don’t have a positional category (they do have a syllabic category).

For the first set, I can imagine that some of the characters don’t
participate in forming Devanagari syllables, and therefore the default value
Other for the syllabic category is actually correct. If that’s the case,
however, I think it would be preferable to explicitly provide the value,
both to make clear that it’s intentional and to remind users of the data
that this value can occur with Brahmic scripts (the specification of the
Universal Shaping Engine currently does not handle this case).

Feedback routed to Emoji SC for evaluation

Date/Time: Wed May 20 13:49:29 CDT 2020
Name: Yitz Gale
Report Type: Error Report
Opt Subject: Emoji - multiple skin tones for handshake

In the current EMOJI standard, version 13.0, section 2.6 Multi-Person
Groupings explicitly mentions U+1F91D HANDSHAKE as an emoji that depicts
more than one person interacting and could be implemented with a choice of
skin tones. However, in section 2.6.2 Multi-Person Skin Tones, there is no
mention of how to specify two different skin tones for U+1F91D HANDSHAKE. It
is not clear at all how to do that. As a result, vendors have not
implemented this in their Emoji sets.

In my opinion, this particular combination - multiple skin tones in a
handshake - is especially important to be included, because it would enable
people to express naturally, in the course of conversations, feelings of
inclusiveness and peace among diverse groups.

Below are a few suggestions of how we might specify multiple skin tones in a
handshake. I don't find any of them particularly satisfying. You might pick
one of these, or perhaps do something else. But please, do standardize a way
to represent this, mention it explicitly in the standard, and encourage
vendors to include it in their Emoji sets. Thanks!

1F91D 1F3FB 200D 1F91D 1F3FD 
HANDSHAKE, LIGHT SKIN TONE, ZWJ, HANDSHAKE, MEDIUM SKIN TONE

270B 1F3FB 200D 1F91D 200D 270B 1F3FD 
HAND, LIGHT SKIN TONE, ZWJ, HANDSHAKE, ZWJ, HAND, MEDIUM SKIN TONE

270B 1F3FB 200D 270B 1F3FD 
HAND, LIGHT SKIN TONE, ZWJ, HAND, MEDIUM SKIN TONE
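For concreteness, the three suggested sequences above can be written out as code point strings (these are the submitter's proposals, not currently valid RGI emoji sequences):

```python
# Proposal 1: HANDSHAKE + tone, ZWJ, HANDSHAKE + tone
proposal_1 = "\U0001F91D\U0001F3FB\u200D\U0001F91D\U0001F3FD"
# Proposal 2: HAND + tone, ZWJ, HANDSHAKE, ZWJ, HAND + tone
proposal_2 = "\u270B\U0001F3FB\u200D\U0001F91D\u200D\u270B\U0001F3FD"
# Proposal 3: HAND + tone, ZWJ, HAND + tone
proposal_3 = "\u270B\U0001F3FB\u200D\u270B\U0001F3FD"

for p in (proposal_1, proposal_2, proposal_3):
    print(" ".join(f"{ord(c):04X}" for c in p))
```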

Date/Time: Wed May 20 14:18:10 CDT 2020
Name: Brian Hendery
Report Type: Other Question, Problem, or Feedback
Opt Subject: Handshake emoji

Hey there,

In section 2.6 Multi-Person Groupings, it mentions that HANDSHAKE depicts
multiple persons and so should allow multiple skin tones. But in section
2.6.2 Multi-Person Skin Tones, there are no instructions how to do that for
HANDSHAKE. Hoping you can take a look at this!

Cheers,
Brian

Feedback routed to Editorial Committee for evaluation

Date/Time: Tue May 5 07:45:51 CDT 2020
Name: David Corbett
Report Type: Error Report
Opt Subject: Leading zeros in code point labels

Section 4.8 of TUS says “code point labels are constructed by using a 
lowercase prefix derived from the code point type, followed by a hyphen-minus 
and then a 4- to 6-digit hexadecimal representation of the code point.” 
The convention is obviously to use as few leading zeros as possible, but 
is that required by definition? For example, could control-0009 be referred 
to as control-000009? It is important to clarify this because code point 
labels are part of the character name namespace.
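A sketch of the conventional construction the report describes, assuming (as it asks to confirm) that the minimal number of digits is used, padded to at least four:

```python
def code_point_label(prefix, cp):
    # Conventional form: lowercase prefix, hyphen-minus, and the code
    # point in uppercase hex, zero-padded to at least 4 digits
    # (4 to 6 digits over the Unicode code space).
    return f"{prefix}-{cp:04X}"

print(code_point_label("control", 0x0009))       # control-0009
print(code_point_label("noncharacter", 0xFDD0))  # noncharacter-FDD0
```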

Date/Time: Fri May 29 20:42:13 CDT 2020
Name: Yoshidumi
Report Type: Error Report
Opt Subject: Simple Typo in UTR-25 Document

Hello. I found a simple typo in the UTR #25 document <https://www.unicode.org/reports/tr25/>.
In the MathML example on page 36, the <mover> element's
closing angle bracket is written as ")", but it should be ">".
Sorry for bothering you with such a detailed point...

Date/Time: Thu Jun 25 22:04:45 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing definitions for nukta, bindu, svara

Section 12.1 Devanagari of the Unicode Standard 13.0, page 460, refers to
the character types nukta, bindus, and svaras. None of these terms is
defined on this page, on any previous page in the section, on the pages
referenced for them in the Standard’s index, or in the Unicode Glossary.

“Nukta” here presumably means the one Devanagari character whose name
includes “nukta”.

“Bindu” is later, on page 466, explained as “One class of these marks, known
as bindus, is represented by U+0901 devanagari sign candrabindu and U+0902
devanagari sign anusvara.” That seems to be an incomplete definition, as
the Unicode data file IndicSyllabicCategory.txt identifies three additional
Devanagari bindu characters.

“Svara” is mentioned in the Unicode Glossary as a synonym for “vowel”, and
in IndicSyllabicCategory.txt in the context of cantillation marks. The
mention in the glossary doesn’t fit the usage on page 460, and it’s not
clear whether the cantillation marks are meant, and whether there are other
svara characters.

Date/Time: Wed Jul 8 13:55:33 CDT 2020
Name: Dirkjan Ochtman
Report Type: Error Report
Opt Subject: UTS #46 (rev 25): incorrect TLD in example

Hi there,

In UTS #46, rev 25, section 4.5, the third row appears as if "xn--blo-7ka.de" 
should be converted to "bloß.com". I guess the latter value should read as "bloß.de" 
instead (or the encoded value should be changed).

Kind regards,

Dirkjan Ochtman

Date/Time: Wed Jul 8 17:29:48 CDT 2020
Name: Stanislaw Goldstein
Report Type: Error Report
Opt Subject: Sections 2.8 and 2.9 not upgraded

The fact that a few thousand characters were added to plane 3 (CJK Extension
G) was overlooked in sections on Unicode allocation (sections 2.8 and 2.9):

1) Tertiary Ideographic Plane should be mentioned on page 44 of the
Standard, just before the paragraph on Supplementary Special-purpose Plane.

2) Figure 2.13 should be changed to show a part of plane 3 in dark grey.

3) Plane 3 should be added on page 51, below Plane 2; otherwise, the
statement "All other planes are reserved; there are no characters assigned
in them." at the beginning of the last paragraph on page 51 is wrong.

I have not checked the rest of the Standard on the information regarding
TIP, there may be other errors of this type there.

Date/Time: Fri Jul 10 11:01:15 CDT 2020
Name: Paul Hardy
Report Type: Error Report
Opt Subject: 0x28 and 0x29 Flipped in APL-ISO-IR-68.TXT

Note: This report has been fully resolved by the Editorial Committee, and an updated data file has been posted.

Greetings,

I suspect that the ISO-IR-68 code points 0x28 and 0x29 have names that are
flipped in the file
https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/APL-ISO-IR-68.TXT. 
That file names 0x28 as LOGICAL AND and 0x29 as LOGICAL OR.

The ISO-IR-68 standard file (see
https://www.itscj.ipsj.or.jp/iso-ir/068.pdf) names 0x28 as DOWN CARET and
0x29 as UP CARET, which correspond to logical or and logical and,
respectively.

As correlation, see the alternate APL character set for the DECwriter II and
DECwriter LA120 described below.

The DECwriter LA120
(http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/terminal/la120/EK-LA120-UG-003_LA120_Users_Guide_Jun79.pdf,
p. 83) shows the DOWN CARET at octal code point 050 (which is hexadecimal
0x28), and the UP CARET at octal code point 051 (which is hexadecimal 0x29).

Likewise, the DECwriter II manual shows the same ordering (see
http://www.bitsavers.org/www.computer.museum.uq.edu.au/pdf/EK-LA3635-OP-002%20LA35%20&%2036%20DECwriter%20II%20User's%20Manual.pdf,
p. 1-16, Table b).

Further correlation appears in Kenneth Iverson's _A Programming Language_,
John Wiley & Sons, New York: 1962, Library of Congress Catalog Card
Number 62-15180.  Section 1.4 (p. 11) describes the DOWN CARET as "or" and
the UP CARET as "and".  This interpretation is repeated in the "Summary of
Notation" appendix in section S.4 "Elementary Operations", p. 267.  I can
send pictures of those pages if you would like.

This is still further correlated by the Unicode code points U+2227, LOGICAL AND
(showing an UP CARET-type glyph), and U+2228, LOGICAL OR (showing a DOWN
CARET-type glyph).
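That Unicode correlation can be checked directly with Python's unicodedata module:

```python
import unicodedata

# U+2227 (up-caret-shaped glyph) is LOGICAL AND; U+2228
# (down-caret-shaped glyph) is LOGICAL OR. This is consistent with
# ISO-IR-68's DOWN CARET at 0x28 meaning "or" and UP CARET at 0x29
# meaning "and".
print(unicodedata.name("\u2227"))  # LOGICAL AND
print(unicodedata.name("\u2228"))  # LOGICAL OR
```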

The error from APL-ISO-IR-68.TXT appears to have also propagated to
Wikipedia; see the table in the "Character set" section at
https://en.wikipedia.org/wiki/ISO-IR-68.

Thank you,

Paul Hardy


Other Reports

(None at this time.)