Comments on Public Review Issues
(August 3, 2016 - November 7, 2016)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of November 7, 2016, since the previous cumulative document was issued prior to UTC #148 (August 2016). Grayed-out items in the Table of Contents do not have feedback here.


The links below go directly to open PRIs and to feedback documents for them, as of November 7, 2016.

Issue | Name | Feedback Link
336 | Proposed Update UAX #41, Common References for Unicode Standard Annexes | (feedback) no feedback to date
335 | Proposed Update UAX #14, Unicode Line Breaking Algorithm | (feedback)
334 | Proposed Update UTS #39, Unicode Security Mechanisms | (feedback)
333 | Proposed Update UAX #31, Unicode Identifier and Pattern Syntax | (feedback)
332 | Proposed Update UTS #10, Unicode Collation Algorithm | (feedback)
331 | Proposed Update UTR #50, Unicode Vertical Text Layout | (feedback) no feedback to date
330 | Proposed Update UTR #51, Unicode Emoji | (feedback)
329 | Proposed Update UAX #44, Unicode Character Database | (feedback)

The links below go to locations in this document for feedback.

Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports

Note: The section of Feedback on Encoding Proposals this time includes:
L2/16-226  L2/16-234  L2/16-273  L2/16-275  L2/16-280  L2/16-282  L2/16-294  L2/16-295  L2/16-308  L2/16-316  L2/16-318  L2/16-320  L2/16-357 


Feedback to UTC / Encoding Proposals

From: "Tana and Patrick McMullen"
Subject: Feedback on Encoding proposal L2/16-280  L2/16-282
Opt Subject: New emoji for Breastfeeding
Date: Fri, 21 Oct 2016 14:09:11 +1100


I have seen the new emoji being proposed for breastfeeding. I agree that it is
a wonderful idea, as currently the emojis relating to babies and feeding/milk
are related to bottle feeding, which is not the dominant method of feeding
across the world and does nothing to normalise breastfeeding in western

However, two people I have shown the proposed emoji to have not been able to
immediately work out what they were seeing (and needed to have it explained to
them) - possibly due to the mother's head not being present.  As breastfeeding
is more to do with the relationship between the mother and the baby, it seems
inappropriate to omit the mother's head - plus it makes it difficult to
rapidly recognise the meaning of the symbol.  I feel that the emoji needs

The logo of the Australian Breastfeeding Association is immediately
recognisable as a mother breastfeeding a baby, as is the logo of La Leche
League.  These logos are under copyright, I imagine, but maybe they could be
altered somewhat?

Kind regards, 

Tana McMullen,

Date/Time: Fri Oct 21 18:49:19 CDT 2016
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-226 Reactivate UTS 52 mechanism in reduced form

http://www.unicode.org/L2/L2016/16226-reactivate-uts52.pdf presents a modified 
approach for character sequences for flags of subdivisions:

<flag> Tag-F <Tag-lowercase subdivision code> "❓"

I propose dropping the Tag-F character. It is redundant with the base flag 
character. In other words:

<flag> <Tag-lowercase subdivision code> "❓"

There should not be any need for flags to have further attributes (key-value pairs) 
such as hair style. If the syntax does need to be extended, then further characters 
could be added between the subdivision code and the terminator.

The fuller list-of-key-value approach could be used with other base characters 
as appropriate.
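The reduced syntax suggested above can be sketched in Python. This is an illustration under stated assumptions, not the mechanism defined in L2/16-226: it assumes U+1F3F4 WAVING BLACK FLAG as the base `<flag>`, maps each character of the lowercase subdivision code to the corresponding TAG character (U+E0000 plus its ASCII value), and assumes U+E007F CANCEL TAG as the terminator shown as "❓" above. The helper names `to_tag` and `subdivision_flag` are hypothetical.

```python
# Sketch of the reduced subdivision-flag syntax:
#   <flag> <Tag-lowercase subdivision code> <terminator>
# Assumptions (not taken from L2/16-226): base = U+1F3F4 WAVING BLACK FLAG,
# terminator = U+E007F CANCEL TAG.
BASE_FLAG = "\U0001F3F4"
TERMINATOR = "\U000E007F"


def to_tag(ch):
    """Map an ASCII lowercase letter or digit to its TAG character (U+E0000 + code)."""
    if not (ch.isascii() and (ch.islower() or ch.isdigit())):
        raise ValueError(f"not a lowercase ASCII letter or digit: {ch!r}")
    return chr(0xE0000 + ord(ch))


def subdivision_flag(code):
    """Build <flag> <tag characters> <terminator> for a code such as 'gbsct'."""
    return BASE_FLAG + "".join(to_tag(c) for c in code) + TERMINATOR


seq = subdivision_flag("gbsct")
print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Without Tag-F, the base flag alone marks the sequence as a subdivision flag, so each sequence is one character shorter than in the original syntax.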

Date/Time: Tue Nov 1 06:58:20 CDT 2016
Name: William Overington
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/16-318 Proposal to encode ten color swatch emoji characters

I like the L2/16-318 Proposal to encode ten color swatch emoji characters.

As part of my research on communication through the language barrier I have
encoded localizable sentences that express colour. Each has its own glyph.

Recently I used the glyphs for fifteen pre-set colours in the following


The set of fifteen colours includes all of the ten colours mentioned in the
L2/16-318 document though the name magenta is used where the L2/16-318
document has purple.

The five other pre-set colours that I use are cyan, pink, dark grey, light
grey and sky blue.

The fifteen glyphs are much wider than emoji glyphs. The glyphs for
localizable sentences each have a precisely defined meaning.

Yet the designs of these fifteen glyphs each have the same shape in the left
side two-thirds of the glyph, so the part that specifies which particular
colour is being used could be used as the design for an abstract emoji
character that represents a colour used as in the L2/16-318 document.

Now it may be that, if encoded, the items proposed in L2/16-318 would have no
glyphs, or they could all be one shape and the shape be in colour. Yet maybe
there could be abstract glyphs that could be displayable either in monochrome
or in colour. The glyph that represents a particular colour would not be
displayed when a ZWJ sequence is acted upon and the colour is thereby used
within a displayed glyph, yet glyphs could be very useful for a graceful
fallback display in colour or monochrome if the ZWJ sequence is not acted upon
in a particular system, and glyphs could be very useful in composing messages
and in analysing character sequences.
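The graceful fallback described above can be illustrated with an existing sequence, since the colour swatch characters of L2/16-318 have no code points. A minimal Python sketch, assuming that a system which does not support a ZWJ sequence simply displays its components side by side with the invisible ZWJ dropped; the rainbow flag sequence (U+1F3F3 U+FE0F U+200D U+1F308) stands in for a colour sequence, and `fallback_components` is a hypothetical helper.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER (no visible glyph)
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation)


def fallback_components(seq):
    """Split an emoji ZWJ sequence into the pieces shown when it is not supported.

    Each piece keeps any variation selector attached to it; the ZWJ itself is
    dropped because it has no visible glyph.
    """
    return [part for part in seq.split(ZWJ) if part]


rainbow_flag = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"
print(fallback_components(rainbow_flag))
```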

Recently I designed and published some designs for abstract emoji.


Whether abstract emoji should be allowed, or even encouraged, is a wider
discussion than the discussion about using abstract glyphs for representing

Using line designs to give an indication of an intended colour within a glyph
is a long-established technique. So using such line designs on their own
without a picture of a physical object, although an innovative step, is not as
big an innovative step as would be the allowing of abstract glyphs in general
where new original designs of shapes are used and the explanation of the
meaning of a glyph has no prior usage. Yet maybe such abstract glyphs would be
desirable for expressing abstract concepts through the language barrier.

As well as localizable sentences for the fifteen pre-set colours I have also
encoded localizable sentences that can be used followed by numbers for
expressing colours precisely using RGB, RGBA and CMYK colour models as

The glyphs for all of those sentences are displayed on pages 3 and 4 of the
following document.


William Overington

Tuesday 1 November 2016

Date/Time: Thu Nov 3 11:49:59 CDT 2016
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-308 More Hand Gestures by Peter Edberg, ESC

In parts of Europe we bend the pinky, not the thumb, for 4, because it comes
naturally after the gesture for 3 that is documented in the proposal (and
taken from L2/16-071). If hand signs were encoded just or primarily because of
their use as numbers, it should be evaluated whether setting the Numeric Type
and Value properties appropriately would make sense. I know 0 to be shown as
either something similar to Okay 👌 (also Hole, Anus) or as a closed fist ✊,
but there’re certainly other conventions as well.
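Python's standard `unicodedata` module exposes Numeric_Value through `numeric()`. Numeric_Type is not available directly, although `decimal()` only returns values for Numeric_Type=Decimal. The sketch below contrasts U+2164 ROMAN NUMERAL FIVE, which has a numeric value, with existing hand emoji, which currently have none; it says nothing about how proposed characters would be assigned.

```python
import unicodedata

samples = {
    "\u0035": "DIGIT FIVE",
    "\u2164": "ROMAN NUMERAL FIVE",
    "\u270a": "RAISED FIST",
    "\U0001F44C": "OK HAND SIGN",
}

for ch, label in samples.items():
    # numeric() returns Numeric_Value as a float, or the default when unset
    value = unicodedata.numeric(ch, None)
    print(f"U+{ord(ch):04X} {label}: {value}")
```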

Thumbs Up 👍 (or Thumbs Down 👎) rotated by a quarter turn would also be used
for neutral or pending judgement and as the hitchhiker symbol. It may make
sense to provide systematic means to explicitly select the left or right hand
as seen by the viewer.

The references to 🇺🇸ASL are inappropriate and document an overall cultural
bias in emojis. Its manual alphabet is almost equal to the international one.
The ESC or UTC may want to seek out more information from someone more
experienced, e.g. the designer behind <http://fingeralphabet.org>.

The use of a hand gesture in a manual spelling alphabet *alone* probably
doesn’t constitute enough of a reason to encode a new character, but if an
emoji happens to be appropriate for a letter, that should be well documented.
Currently, for instance, the Raised Fist ✊ could be drawn as any of the
frequent letters A, E and S at the font designer’s discretion.

If the proposed Hand with Thumb and Index Finger Extended were turned by a
quarter turn, it would resemble the popular finger gun. I’m not sure whether it
should be encoded *separately*. Likewise, the proposed Hand Sign Fingers All
Together if rotated could be used in certain variants of the game Roshambo as
a lizard or bird. The third component of the ILY ligature, raised pinky, is
not being proposed, but would resemble a slug to complete the set needed for a
Japanese hand game, together with frog 👍 and snake ☝️/👆. (The German city of
Aachen would also welcome this one, because a raised pinky is known as the
“Klenkes” greeting there.) The basic game has been adapted and extended in
many ways, see e.g. , which shows some original
inventions but also some other popular gestures.

Other random gestures that are missing from Unicode and this proposal in its
current state are …

– Giving/Receiving Hand, i.e. palm facing up (also used for shared folders on some OS),
– Blessing Hand with the palm facing down, although ✋ could also be used for this,
– well-established sexual gestures Pleaser (✌️ with index and middle finger held
closely together, U in ASL) and Shocker (same but with pinky raised additionally),
– (Military) Salute, which would probably be better encoded as a Person emoji,
– Fist Pressing Thumb, a gesture of wishing someone good luck in parts of Europe,
also similar to ASL letters T, N, M and A (not E or S), but usually with the fist
oriented as in 👍.

All of these seem more relevant, to me subjectively, than the West-W and East-E hand signs.

The proposed Islamic Prayer Hands as pictured look an awful lot like 🙌 or 👐 as usually implemented (plus 📿).

Date/Time: Fri Nov 4 08:24:02 CDT 2016
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-295 Animals Proposal by Craig Cummings / ESC

The ESC on behalf of the UTC needs to come up with clear rules for the
encoding of further animal emojis, answering the following questions:

– When should an animal glyph show the head (`Face`) instead of the full 
body and when shall there be separate code points assigned for both (e.g. 🐶/🐕)?

– Should dimorphic animals (by sex or age) be encoded as separate characters or
should they also use emoji ZWJ sequences, like most human emojis do? Some
already are encoded separately (🐂/🐄, 🐓/🐔/🐤/🐥/🐣/🥚, 🦋/🐛 Bug ~
caterpillar), others have gender-specific glyphs (mostly male) in many
implementations (🦃/🦆/🦌/🦍/🦁/🐗), yet others have a generic and only one
specific character (e.g. 🐏/🐑, no ewe or lamb).

It may be beneficial to use character sequences for toggling face-only and 
full-body glyphs, which may require at least one new (combining) character.

Date/Time: Fri Nov 4 13:07:45 CDT 2016
Name: William Overington
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/16-320 Process for Emoji ZWJ Sequence Proposals

1. In L2/16-320 and in various earlier documents on Emoji ZWJ Sequences by
other authors, there has been use of the word vendor.

The word vendor implies selling. I opine that it would be better to use the
word publisher rather than vendor in any official procedures that the Unicode
Technical Committee produces. The word publisher could refer to a business
that is selling a font yet it could also refer to an individual who is making
a font available to the public at no charge by publishing the font on a web
page. Also to a university that is publishing fonts.

2. There has been mention that encoding a new emoji using an Emoji ZWJ
Sequence can help bring forth new emoji quickly.

Would that process mean that synchronization between Unicode and ISO/IEC 10646
would no longer exist?

Consider please what happens if, at a future date when there may be many Emoji
ZWJ Sequences encoded, an end user views a glyph on a printed document and
seeks to find out the encoding of that glyph by looking in ISO/IEC 10646.

3. Is there a limit on the number of characters in an Emoji ZWJ Sequence?

The existing examples seem to have just two emoji linked with a ZWJ character.

I opine that if Emoji ZWJ Sequences are to be encoded, then longer sequences
should be allowed so as to allow creativity to be expressed.
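For reference, sequences with more than two components already exist in the emoji ZWJ sequence data, for example the family sequence MAN, WOMAN, GIRL, BOY (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466). A short Python check, with a hypothetical helper name, assuming the sequence has no leading or trailing ZWJ:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def zwj_component_count(seq):
    """Count the emoji joined by ZWJ (one more than the number of ZWJs)."""
    return seq.count(ZWJ) + 1


family = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"])
print(zwj_component_count(family))  # 4
```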

For example, could one have an Emoji ZWJ Sequence for the following, bringing
in the colour orange from L2/16-318.


4. There seems to be uncertainty as to whether an Emoji ZWJ Sequence is
expected to be in use before being considered by the Unicode Technical
Committee. This seems to be different from the encoding practice used for
characters where use is only after encoding has already taken place.

5. Is the matter of whether a font is available to the public assessed on
whether the font is available as such rather than only being available when
bundled in with the purchase of some other product such as an operating system
or a desktop publishing package or a mobile telephone?

6. Could there be sequences using U+FFFC OBJECT REPLACEMENT CHARACTER with one
or more pairs of ZWJ and a digit please so as to provide anchor points for a
number of images attached to a plain text message so that a document with
images could be constructed?

7. Could an abstract symbol be used in an Emoji ZWJ Sequence?

William Overington

Friday 4 November 2016

Date/Time: Sat Nov 5 14:21:17 CDT 2016
Name: Manvir Singh
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/16-294 (Gurmukhi)

The author of L2/16-294 wants 0A75 YAKASH to be changed as shown in his
proposal. The author of L2/16-302 explains why replacing VIRAMA + YA with YAKASH
shouldn't be done, but agrees that the changing of the glyph seems

However, the following points must be considered:
- The Gurmukhi script was established mainly for use in Sikh Scriptures.
- For Sikhs, the preservation of their scripture in their original form is essential.
- In the original form of these scriptures, the YAKASH is represented much better by its 
current form in Unicode than by the proposal author's version, which doesn't resemble 
the YAKASH in the original texts at all (pictures of various examples can be provided upon 
request).
- Changing the YAKASH to the author's proposed version will push Sikhs away from using 
Unicode for their scriptures. In fact, they are already constrained to using ASCII fonts 
for their scriptures due to some other issues with Unicode, changes for which will be 
addressed in the near future.

Because of this, the proposal to change YAKASH may not be the best idea.

The author of L2/16-302 suggested not replacing VIRAMA + YA with YAKASH and he does have 
a good reason, given that a "huge amount of data that has probably already been generated 
by using VIRAMA + YA for the post-base YA". However, being someone who is pretty invested 
in the use of Unicode for Gurmukhi, I have not seen that much use so far for post-base YA. 
In fact, I have actually seen more people complain that VIRAMA + YA doesn't give YAKASH 
and that post-base YA should be a separate character.
Please refer to the original YAKASH proposal: http://www.unicode.org/L2/L2006/06037-yakash.pdf
Firstly, the author makes clear the use of YAKASH in Sikh Scriptures. This is the main 
reason why YAKASH should not be changed.

Also, at the end, the author mentions that "GURMUKHI SIGN YAKASH should be treated as a 
subjoined form of /ya/"

I agree with this, because in the context of subjoined characters, YAKASH works better as a 
subjoined YA than a half-YA does (it also makes more sense visually). The original proposal 
for YAKASH even calls it a "Pairin Yayya" which means subjoined Yayya (YA).

However, though it makes more sense that VIRAMA + YA = YAKASH, I'm not too sure 
whether changing it would be ideal. More discussion on this is needed. I would also like to hear 
more thoughts of the author of L2/16-302 as well.

The main concern of this feedback is not VIRAMA + YA equaling YAKASH (even though I have 
provided some thoughts on that as well). The main concern here is that the character for 
YAKASH should not be changed as proposed in L2/16-294, as it is counterproductive to how 
Unicode "enables people around the world to use computers in any language". 
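The data concern raised above can be made concrete with Python's standard `unicodedata` module. In the sketch below, KA + VIRAMA + YA and KA + YAKASH are different code point sequences, and no Unicode normalization form maps one to the other, so text already encoded with VIRAMA + YA would not match text using YAKASH without an explicit conversion. U+0A15 GURMUKHI LETTER KA is chosen as the base consonant only for illustration.

```python
import unicodedata

KA, VIRAMA, YA, YAKASH = "\u0a15", "\u0a4d", "\u0a2f", "\u0a75"

virama_ya = KA + VIRAMA + YA  # post-base YA spelled with VIRAMA + YA
yakash = KA + YAKASH          # the same syllable spelled with U+0A75 YAKASH

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, virama_ya) == unicodedata.normalize(form, yakash)
    print(form, "equal" if same else "different")

# YAKASH has no decomposition mapping, so normalization cannot unify the two
print(repr(unicodedata.decomposition(YAKASH)))
```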

Date/Time: Mon Nov 7 11:10:17 CST 2016
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-318 Ten Color Swatch Emojis by Paul D. Hunt

I just want to remind everyone that there are already some alternative characters 
to encode colors. The obvious hearts are already mentioned in the proposal. The 
heraldic patterns are probably the easiest to overlook.

Hearts

Color  | Unicode | Sample
White  | U+2661  | ♡️
Black  | U+2665/1F5A4 | ♥️/🖤
Red    | U+2764  | ❤️
Yellow | U+1F49B | 💛
Green  | U+1F49A | 💚
Blue   | U+1F499 | 💙
Purple | U+1F49C | 💜
Orange | L2/16-124 | N/A
Pink   | U+1F49F/D/8/3/7  | 💟/💝/💘/💓/💗
Broken | U+1F494 | 💔
Sparks | U+1F496 | 💖

Circles and Diamonds

Color  | Unicode | Sample
White  | U+26AA  | ⚪️
Black  | U+26AB  | ⚫️
Red    | U+1F534 | 🔴
Blue   | U+1F535 | 🔵
Blue   | U+1F537 | 🔷
Orange | U+1F536 | 🔶

Books

Color      | Unicode | Sample
Red/Pink   | U+1F4D5 | 📕
Green      | U+1F4D7 | 📗
Blue       | U+1F4D8 | 📘
Orange     | U+1F4D9 | 📙
Yellow     | U+1F4D2 | 📒
Monochrome | U+1F4D3 | 📓
Decorative | U+1F4D4 | 📔
Multiple   | U+1F4DA | 📚

Heraldic Color Patterns “Tincture”

Color            | Unicode | Sample   | Hatching
Black            | U+25A0  | ■ | Solid
White (argent)   | U+25A1  | □ | None, blank
Blue (azure)     | U+25A4  | ▤ | Horizontal stripes
Red (gules)      | U+25A5  | ▥ | Vertical stripes
Black (sable)    | U+25A6  | ▦ | Square pattern
Green (vert)     | U+25A7  | ▧ | Forward diagonals
Purple (purpure) | U+25A8  | ▨ | Backward diagonals
Maroon (sanguine)| U+25A9  | ▩ | Diamond pattern
Brown (tenné)    | U+?     | ?        | Vertical-forward crosshatch
Light Shade      | U+2591  | ░ | 
Medium Shade (or)| U+2592  | ▒ | Dotted
Dark Shade       | U+2593  | ▓ | 
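Several of the code points in these tables can be checked against the character names in the Unicode Character Database. A minimal Python sketch using the standard `unicodedata` module; the table is a subset chosen for illustration, and the names reported depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# A subset of the code points listed in the tables above
table = {
    "White (heart)": 0x2661,
    "Black (heart)": 0x2665,
    "Yellow (heart)": 0x1F49B,
    "Orange (diamond)": 0x1F536,
    "Blue (book)": 0x1F4D8,
    "Blue (azure)": 0x25A4,
    "Red (gules)": 0x25A5,
}

for colour, cp in table.items():
    # name() returns the default when the code point has no name in this build
    print(f"{colour:18} U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```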

Date/Time: Mon Nov 7 12:38:54 CST 2016
Name: William Overington
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/16-226 Reactivate UTS 52 mechanism in reduced form (v2) (revised)

Feedback on L2/16-226 Reactivate UTS 52 mechanism in reduced form (v2) (revised)

This morning I realized that L2/16-226 had been revised one week ago, the
revised version totally changing the position regarding Private Use Emoji Tag
Sequences taken in the original version.

I write to ask that the Unicode Technical Committee please allow Private Use
Emoji Tag Sequences. I had read the original version of L2/16-226 when it was
added to the Unicode Technical Committee Document Register and I had noted
that it stated as follows.

quote
Private Use - no changes

end quote

I was pleased with that situation.

The revised version has the following.

quote
Note to committee: We had had Private Use defined in tr51 as follows. Adding
it would allow some level of experimentation, such as in L2/16-105. However, a
possible danger is a profusion of forms, with some that people feel obliged to
support because of their frequency; to avoid that it might be better to not
add private use emoji tag sequences.

end quote

Yes, adding the facility would allow some level of experimentation.

In fact, I have already experimented using the facility on Saturday 6 February 2016.


I used U+1F58B LOWER LEFT FOUNTAIN PEN as the base character in the Private
Use Emoji Tag Sequence.

The advantage of allowing Private Use Emoji Tag Sequences is that an
experimental implementation could be converted to a regular Unicode
implementation by changing one character, keeping most of any system that had
been developed unchanged.

Allowing Private Use Emoji Tag Sequences will allow people to experiment while
being in complete conformance with the Unicode Standard.

Please allow Private Use Emoji Tag Sequences.

William Overington

Monday 7 November 2016

Date/Time: Mon Nov 7 15:07:34 CST 2016
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-357 Baseball Cap Emoji Proposal

While I favor adding this emoji, I do *not* favor making it red, as that would 
associate the emoji very openly with partisan politics.  I propose using grey 
instead, which happens to be (if we must have a reference to current events) 
the color of the present World Series champions.
See http://www.mlbshop.com/Chicago_Cubs_Caps (there are other colors shown 
there, but grey is at the top).

Date/Time: Mon Nov 7 15:48:48 CST 2016
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-234 TERIS by Mark Davis

>> The fallback behavior on older systems will be to the appearance of a single RI character.

I wonder whether it would be preferable to have the fallback be the full
parent RI sequence, but not the flag associated with it. With uppercase Latin
letters representing U+1F1E6–FF Regional Indicator Symbol Letters A–Z and
lowercase Latin letters (or international decimal digits) representing U+E00xy
Tag Characters, all of `GsctB`, `Gsct✦B` and `GsctB✦` (but not `GBsct✦`) would
yield a representation different from `GB`, because whereas the latter would
be rendered as an emoji flag 🇬🇧, the former would often show as two emoji
letters 🇬​🇧, assuming the Tag characters will be suppressed only visually.
Only `GsctB✦` would inhibit unwanted combinations in chains, e.g. `UtxSGsctB`
could be rendered as the flag of Singapore surrounded by letters U and B, 🇺
🇸🇬​🇧, whereas `UtxS✦GsctB✦` should not.
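The shorthand used above can be expanded mechanically. This Python sketch converts it to code points following the mapping given in the text (uppercase A to Z to U+1F1E6..U+1F1FF, lowercase letters and digits to TAG characters U+E00xy). The text does not define `✦`; here it is assumed, for illustration only, to stand for a terminator modelled as U+E007F CANCEL TAG. The function name `expand` is hypothetical.

```python
def expand(notation):
    """Expand shorthand such as 'GsctB✦' into a list of code points."""
    out = []
    for ch in notation:
        if "A" <= ch <= "Z":
            out.append(0x1F1E6 + ord(ch) - ord("A"))  # REGIONAL INDICATOR SYMBOL LETTER
        elif "a" <= ch <= "z" or "0" <= ch <= "9":
            out.append(0xE0000 + ord(ch))  # TAG character
        elif ch == "\u2726":  # ✦ BLACK FOUR POINTED STAR: assumed terminator
            out.append(0xE007F)  # assumption: CANCEL TAG
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return out


for s in ("GB", "GsctB", "GsctB\u2726"):
    print(s, " ".join(f"U+{cp:04X}" for cp in expand(s)))
```

Printing the expansions side by side makes it easy to see where the regional indicator pairs fall relative to the tag characters.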

Also, since the Waving White Flag 🏳 U+1F3F3 is associated with surrender, it
may be preferable to use the Waving Black Flag 🏴 U+1F3F4 or Black Flag ⚑
U+2691 (which have some unwanted semantics of their own).

By the way: Like any proposal based upon ISO 3166, this can only satisfy a
subset of the most frequent and popular flag requests. In particular, this
would be much welcomed by local patriots in the countries of the UK and the
states of the US, whereas the coded subdivisions in other regions don’t match
cultural identity. It also works for some sub-national regions striving for
(re)gaining independence, e.g. Catalonia in Spain and Tibet in China, but not
so much for those spanning multiple countries, e.g. the Kurdish, Assyrian or
Basque areas, or multiple subregions, e.g. Confederate States of America or
South Vietnam.

Date/Time: Mon Nov 7 16:05:09 CST 2016
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-280  L2/16-282 Breastfeeding Emoji

Unfortunately, I did not get it finished before this year’s deadline, but am 
still working on a proposal for additional body part emojis, which shall 
include (generic) Breast. Such an emoji (in combination with 👶, 🍼, 👄, 🚼) 
may or may not satisfy the needs documented in these two closely related proposals.

Date/Time: Mon Nov 7 17:04:11 CST 2016
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/16-273  L2/16-275 L2/16-316 Food Emoji

An emoji use case that I’ve already encountered, and expect to see more of, is
footnote markers. Food menus often bear footnotes for ingredients or
components that some people will like to avoid for medical, religious, ethical
or other dietary reasons. Some indications are required by local law, others
are voluntary. An emoji is a more helpful mnemonic aid than a random letter,
number or abstract symbol. Unicode already includes emojis for many
ingredients that often act as allergens, e.g. Peanuts, and for most kinds of
animals (as well as generic Meat 🍖) that are avoided by some consumers.

It should be considered a strong reason to encode if a proposed character will
close one of the remaining gaps. The coconut or almond emoji, for instance,
could stand in for all other nut fruits (although existing 🌰 Chestnut could
graphically do so as well). Cereal could double as a gluten marker. There’s
nothing for celery, (soy) beans, mustard, sesame or lupins yet and some
additives (e.g. preservatives or flavor enhancers) could only be symbolized by

Feedback on UTRs / UAXes

Date/Time: Sun Oct 23 10:27:57 CDT 2016
Name: Eric Muller
Report Type: Error Report (UCD)
Opt Subject: DerivedBidiClass.txt

The file DerivedBidiClass.txt lists the ranges of code points that default to
AL and R, and lists the blocks that are covered by those ranges. The lists of
blocks can give the impression of being exhaustive, but they are not. I would
suggest either making the lists exhaustive or removing them all.
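One way to make the block lists exhaustive is to derive them from the data rather than maintain them by hand. A minimal Python sketch, using a small excerpt of Blocks.txt and a single range, 0600..07BF, assumed here to be one of the ranges that default to AL; a real tool would read both the ranges and the blocks from the UCD files.

```python
# Excerpt of Blocks.txt as (start, end, name); illustrative, not the full file
BLOCKS = [
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0700, 0x074F, "Syriac"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0780, 0x07BF, "Thaana"),
    (0x07C0, 0x07FF, "NKo"),
]


def blocks_overlapping(start, end, blocks=BLOCKS):
    """Return the names of all blocks that intersect the range [start, end]."""
    return [name for lo, hi, name in blocks if lo <= end and start <= hi]


print(blocks_overlapping(0x0600, 0x07BF))
```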

Error Reports

Date/Time: Wed Aug 10 08:41:15 CDT 2016
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect Indic positional category for Javanese consonant sign cakra

The Unicode 9.0 data file IndicPositionalCategory.txt [1] gives the character
JAVANESE CONSONANT SIGN CAKRA the positional category Right.

In reality, the cakra is rendered either below its base consonant (in
different shapes, many of which wrap around the left side of the base
consonant) or, less commonly, isolated to the left of its base consonant. In
some styles, a cakra glyph may initially start out to the right from the
bottom right corner of its base consonant, but then it always turns down and
to the left.

Using the positional category Right also causes problems with the OpenType
Universal Shaping Engine [2]. There’s another Javanese medial consonant with
the same positional category, JAVANESE CONSONANT SIGN PENGKAL, and the USE
allows only one medial consonant of each positional category per cluster. As
Javanese expert Aditya Bayu Perdana informed me, the “cakra+pengkal
combination is actually fairly common in Sanskrit and Kawi literature and are
well attested.” Examples he provided from a Bharatayuddha epic printed in 1903
are available on request. The current combination of Unicode data and USE
specification does not allow the cakra+pengkal combination.

The positional category for JAVANESE CONSONANT SIGN CAKRA should therefore be
changed to Bottom, both to better represent the actual positioning of cakra
and to enable the cakra+pengkal combination in the USE.

[1] http://www.unicode.org/Public/9.0.0/ucd/IndicPositionalCategory.txt
[2] http://www.microsoft.com/typography/OpenTypeDev/USE/intro.htm#clustervalidation
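The conflict described above can be modelled with a small check. The Python sketch below assumes, as the report states, that the USE accepts at most one medial consonant per positional category in a cluster; it does not model the ordering rules of the actual USE cluster grammar. The code points are U+A9BF JAVANESE CONSONANT SIGN CAKRA and U+A9BE JAVANESE CONSONANT SIGN PENGKAL, and `medials_ok` is a hypothetical helper.

```python
from collections import Counter


def medials_ok(medials, positions):
    """True if no positional category is used by more than one medial consonant."""
    counts = Counter(positions[m] for m in medials)
    return all(n <= 1 for n in counts.values())


CAKRA, PENGKAL = 0xA9BF, 0xA9BE

unicode_9 = {CAKRA: "Right", PENGKAL: "Right"}  # values as stated in the report
proposed = {CAKRA: "Bottom", PENGKAL: "Right"}  # change requested in the report

print(medials_ok([CAKRA, PENGKAL], unicode_9))  # False
print(medials_ok([CAKRA, PENGKAL], proposed))   # True
```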

Date/Time: Wed Aug 10 09:04:00 CDT 2016
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Positional categories missing for several characters in Brahmic scripts

The Unicode 9.0 data file IndicPositionalCategory.txt [1] provides no
positional category for at least some characters for which the data file
IndicSyllabicCategory.txt [2] provides the syllabic category


This can cause problems for implementations of the OpenType Universal Shaping
Engine specification [3], as that specification assumes that every character
of syllabic category Consonant_Medial has a well-defined positional category.

[1] http://www.unicode.org/Public/9.0.0/ucd/IndicPositionalCategory.txt
[2] http://www.unicode.org/Public/9.0.0/ucd/IndicSyllabicCategory.txt
[3] http://www.microsoft.com/typography/OpenTypeDev/USE/intro.htm
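Such gaps can be found mechanically by comparing the two files. A minimal Python sketch, assuming the usual UCD data-line format (`code point or range ; value # comment`); the two sample strings use made-up private-use code points and are not excerpts from the 9.0 files.

```python
def parse_ucd(text):
    """Map each code point to its property value from UCD-style data lines."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        field, value = (part.strip() for part in line.split(";", 1))
        lo, _, hi = field.partition("..")
        for cp in range(int(lo, 16), int(hi or lo, 16) + 1):
            values[cp] = value
    return values


# Made-up sample data in the style of the two files
syllabic = parse_ucd("E000..E002 ; Consonant_Medial # sample")
positional = parse_ucd("E001..E002 ; Right # sample")

missing = sorted(cp for cp, v in syllabic.items()
                 if v == "Consonant_Medial" and cp not in positional)
print([f"U+{cp:04X}" for cp in missing])
```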

Date/Time: Sun Aug 14 21:31:23 CDT 2016
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: Code Charts Typo


Iʼm sorry not to have checked that there is a little typo in the comment to
the mathematical double-struck letter subhead of the Mathematical Alphanumeric
Symbols block. “And” should be ‘are’. Reading further, I believe that the
“right” in the comment to the first character, U+1D538 MATHEMATICAL DOUBLE-
STRUCK CAPITAL A, should read ‘left’. However I wouldnʼt mention this weird
glyph variant at all.

While the typo is a matter to report by Contact Form, the double-struck A
could perhaps be discussed on the Mailing List. Though Iʼd rather not because
it isnʼt about encoding but font design.

I became aware of this while looking up these alphabets in the NamesList, as Iʼm currently
trying to implement them in the keyboard layout in order to complete its Latin
script support.
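For a keyboard layout, note that the double-struck capitals are not contiguous: seven of them (C, H, N, P, Q, R, Z) were encoded earlier in the Letterlike Symbols block, and the corresponding positions in the Mathematical Alphanumeric Symbols block are reserved. A minimal Python sketch of an ASCII-to-double-struck mapping; `to_double_struck` is a hypothetical helper.

```python
import unicodedata

# Double-struck capitals encoded in the Letterlike Symbols block
LETTERLIKE = {"C": "\u2102", "H": "\u210D", "N": "\u2115", "P": "\u2119",
              "Q": "\u211A", "R": "\u211D", "Z": "\u2124"}


def to_double_struck(text):
    """Map ASCII letters and digits to double-struck forms; leave others unchanged."""
    out = []
    for ch in text:
        if ch in LETTERLIKE:
            out.append(LETTERLIKE[ch])
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D538 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D552 + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(0x1D7D8 + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)


print(to_double_struck("RANZ 2016"))
```

NFKC maps all of these back to plain ASCII, which gives a simple round-trip check for a layout.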


Date/Time: Thu Aug 25 19:18:06 CDT 2016
Name: Kamal Mansour
Report Type: Error Report
Opt Subject: Error in Myanmar section 16.3

On page 620 of Unicode 9, under "Subjoined Consonants", the following is stated:

The following characters may take a subjoined form, which takes the same
shape as the base character but smaller: U+1000, U+AA61, U+1010, ...

In a similar list, UTN 11-4, page 49, shows "U+AA60" instead of "U+AA61".

Date/Time: Wed Aug 24 09:29:11 CDT 2016
Name: Mike FABIAN (Redhat)
Report Type: Error Report
Opt Subject: These characters are used in SC and TC: U+4E7E 乾, U+6770 杰, U+7CFB 系, U+8868 表, U+8986 覆, U+9762 面

The characters

U+4E7E 乾, U+6770 杰, U+7CFB 系, U+8868 表, U+8986 覆, U+9762 面 

are listed in the Unicode data as being one variant of Chinese only:


But they are apparently all used both in simplified and traditional Chinese.

Looking these characters up at http://www.zdic.net/, I find them
all listed as both simplified and traditional Chinese.

Therefore, I think that the Unihan_Variants.txt file
should be changed like this:

--- Unihan_Variants.txt
+++ Unihan_Variants.txt
@@ -484,7 +484,7 @@
 U+4E70	kTraditionalVariant	U+8CB7
 U+4E71	kSemanticVariant	U+4E82<kMatthews,kMeyerWempe
 U+4E71	kTraditionalVariant	U+4E82
-U+4E7E	kSimplifiedVariant	U+5E72
+U+4E7E	kSimplifiedVariant	U+4E7E U+5E72
 U+4E7E	kSpecializedSemanticVariant	U+4E81<kFenn
 U+4E80	kZVariant	U+9F9C
 U+4E81	kSpecializedSemanticVariant	U+4E7E<kFenn
@@ -3798,7 +3798,7 @@
 U+676F	kSemanticVariant	U+76C3<kLau,kMatthews,kMeyerWempe
 U+676F	kZVariant	U+76C3
 U+6770	kSemanticVariant	U+5091<kMatthews
-U+6770	kTraditionalVariant	U+5091
+U+6770	kTraditionalVariant	U+6770 U+5091
 U+6771	kSemanticVariant	U+4E1C<kFenn
 U+6771	kSimplifiedVariant	U+4E1C
 U+6774	kSimplifiedVariant	U+9528
@@ -6086,7 +6086,7 @@
 U+7CF9	kSimplifiedVariant	U+7E9F
 U+7CFA	kSemanticVariant	U+7CFE<kMatthews,kMeyerWempe
 U+7CFA	kZVariant	U+7CFE
-U+7CFB	kTraditionalVariant	U+4FC2 U+7E6B
+U+7CFB	kTraditionalVariant	U+7CFB U+4FC2 U+7E6B
 U+7CFB	kZVariant	U+7E6B
 U+7CFE	kSemanticVariant	U+7CFA<kMatthews,kMeyerWempe
 U+7CFE	kSimplifiedVariant	U+7EA0
@@ -7596,7 +7596,7 @@
 U+8864	kSemanticVariant	U+8863<kMatthews
 U+8864	kSpecializedSemanticVariant	U+8863<kFenn
 U+8865	kTraditionalVariant	U+88DC
-U+8868	kTraditionalVariant	U+9336
+U+8868	kTraditionalVariant	U+8868 U+9336
 U+886C	kTraditionalVariant	U+896F
 U+886E	kSemanticVariant	U+889E<kMatthews
 U+886E	kTraditionalVariant	U+889E
@@ -7713,7 +7713,7 @@
 U+897E	kZVariant	U+897F
 U+897F	kZVariant	U+8980
 U+8980	kZVariant	U+897F
-U+8986	kSimplifiedVariant	U+590D
+U+8986	kSimplifiedVariant	U+590D U+8986
 U+8986	kZVariant	U+5FA9
 U+8987	kSemanticVariant	U+9738<kMeyerWempe
 U+8987	kZVariant	U+9738
@@ -10018,7 +10018,7 @@
 U+975C	kSimplifiedVariant	U+9759
 U+975D	kSemanticVariant	U+5929<kMatthews
 U+975D	kZVariant	U+9754
-U+9762	kTraditionalVariant	U+9EB5
+U+9762	kTraditionalVariant	U+9762 U+9EB5
 U+9765	kTraditionalVariant	U+9768
 U+9766	kSemanticVariant	U+89A5<kMeyerWempe
 U+9766	kSimplifiedVariant	U+817C
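With the proposed change, a character used in both simplified and traditional Chinese lists itself among its own variants. A minimal Python sketch that finds such entries in lines in the Unihan_Variants.txt format (tab-separated code point, field and values; source tags after `<` are ignored). The sample lines are taken from the diff above, and `self_variants` is a hypothetical helper.

```python
def self_variants(lines):
    """Return code points whose kSimplifiedVariant/kTraditionalVariant includes themselves."""
    found = set()
    for line in lines:
        if not line or line.startswith("#"):
            continue
        cp, field, values = line.split("\t", 2)
        if field not in ("kSimplifiedVariant", "kTraditionalVariant"):
            continue
        targets = {v.split("<", 1)[0] for v in values.split()}
        if cp in targets:
            found.add(cp)
    return sorted(found)


sample = [
    "U+4E7E\tkSimplifiedVariant\tU+4E7E U+5E72",
    "U+4E7E\tkSpecializedSemanticVariant\tU+4E81<kFenn",
    "U+6770\tkTraditionalVariant\tU+6770 U+5091",
    "U+8986\tkSimplifiedVariant\tU+590D U+8986",
    "U+4E70\tkTraditionalVariant\tU+8CB7",
]
print(self_variants(sample))
```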

Date/Time: Sat Oct 1 01:55:39 CDT 2016
Name: Junichi Chiba
Report Type: Error Report
Opt Subject: Dates in Japanese Era Names in Unicode Standard

I'm looking at the latest Unicode Standard [1] listing the dates for Japanese
Era Names in Table 22-8. They seem to differ by one day from the dates
that are recognized publicly in Japan. I assume that the cause was a simple
chain of mistakes while drafting the Unicode document. I already posted the
details with further references to the mailing list, and one of the participants
agreed [2].

[1] http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf
[2] http://unicode.org/pipermail/unicode/2016-September/004017.html

Current values:
U+337B square era name heisei 1989-01-07 to present day
U+337C square era name syouwa 1926-12-24 to 1989-01-06
U+337D square era name taisyou 1912-07-29 to 1926-12-23
U+337E square era name meizi 1867 to 1912-07-28

Suggested correction:
U+337B square era name heisei 1989-01-08 to present day
U+337C square era name syouwa 1926-12-25 to 1989-01-07
U+337D square era name taisyou 1912-07-30 to 1926-12-24
U+337E square era name meizi 1868 to 1912-07-29
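The internal consistency of the two sets of values can be checked with Python's `datetime`. The sketch below lists each era boundary as (last day of the earlier era, first day of the next era), taken from the values above, and confirms that in both sets the eras are contiguous and that the suggested correction shifts every full-date boundary by exactly one day. This checks consistency only; it does not by itself establish which set of dates is correct.

```python
from datetime import date, timedelta

# (last day of earlier era, first day of next era), as listed in the report
current = [
    (date(1912, 7, 28), date(1912, 7, 29)),    # meizi -> taisyou
    (date(1926, 12, 23), date(1926, 12, 24)),  # taisyou -> syouwa
    (date(1989, 1, 6), date(1989, 1, 7)),      # syouwa -> heisei
]
suggested = [
    (date(1912, 7, 29), date(1912, 7, 30)),
    (date(1926, 12, 24), date(1926, 12, 25)),
    (date(1989, 1, 7), date(1989, 1, 8)),
]

one_day = timedelta(days=1)
for name, table in (("current", current), ("suggested", suggested)):
    contiguous = all(end + one_day == start for end, start in table)
    print(name, "contiguous:", contiguous)

# Difference between suggested and current start dates at each boundary
shifts = {new_start - old_start for (_, old_start), (_, new_start) in zip(current, suggested)}
print("shift:", shifts)
```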

Thank you.

Other Reports

Date/Time: Thu Oct 13 00:53:51 CDT 2016
Name: Weizhe Zheng
Report Type: Error Report
Opt Subject: Mongolian glyph problems

(Note: This report on Mongolian is superseded by other documents in the register.)

This is to report a number of glyph problems in the Unicode 9.0 code chart of
the Mongolian block.

1. 186F Mongolian letter Sibe ZA, second forms

In the Unicode 9.0 code chart, a short descender is added to the second
initial form and the second medial form of 186F. This is not desired, because
the second forms of 186F are used only in the combinations 186F+185E and
186F+1873, and the descender prevents the glyphs from being correctly joined.
The lower part of the second form of 186F should join smoothly with the second
final form of 185E/1873 or the third medial form of 185E/fourth medial form of
1873. The descender is not present in any of the previously published
documents on Mongolian variants: Unicode 8.0 or earlier, Report 170, Mengguwen
Bianma, GB/T 26226-2010, and should be removed.

2. 185E Mongolian letter Sibe I, third medial form

In the Unicode code chart, the glyph for the third medial form of 185E is
identical to the first medial form of 185E. This seems to be a mistake. The
third medial form of 185E should be identical to the fourth medial form of
1873, as correctly shown in Report 170, Mengguwen Bianma, and GB/T 26226-2010.
The third medial form of 185E is only used in the combination 186F+185E, and
the glyph should join smoothly with the second forms of 186F.

3. Issues with the new format

In the Unicode 9.0 code chart, positional (first) forms are added for
characters with standardized variants. There are however two issues with this

- Three isolate (first) forms are missing from the charts: the isolate (first)
forms of 185D, 185E, and 1873. These should be added as well.

- Several positional forms that do not exist in previously published documents
are added, such as the final forms of 1869, 1876, 188A, and the initial and
medial forms of 1887. These forms are not attested, and should be removed.
There are at least three attested final forms for 188A, but all different from
the glyph shown in the Unicode 9.0 code chart.

4. 1887 Mongolian letter Ali Gali A

The code point glyph of 1887 does not seem to be attested. The isolate form of
Mongolian Ali Gali A should be identical to the second isolate form of 1820.
In Unicode and some other documents on Mongolian variants, there seems to be
some confusion between Mongolian Ali Gali A and Manchu Ali Gali AH (the latter
transcribing འ 0F60 Tibetan letter -A). The first final form of 1887 on the
code chart is in fact the final form of Manchu Ali Gali AH, while the first
final form of Mongolian Ali Gali A should be identical to the second final
form of 1820. The fourth final form of 1887 in Unicode 8.0 and earlier is
simply the combination 1820+Manchu Ali Gali AH. The fourth final form of 1887
in Unicode 9.0 is a variant of this combination.

Should further information be needed, please do not hesitate to contact me.