Accumulated Feedback on PRI #408

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Sun Oct 27 11:17:23 CDT 2019
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #408

In “QID emoji tag sequences for flags or other symbols that represent an entity 
should use the QID for the flag or symbol itself if available, not the flag for 
the entity,” it should say “the QID of the entity” not “the flag of the entity”.

Date/Time: Tue Nov 5 10:23:02 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #408: QID Emoji Sequences

I’ve already said this on the previous PRI, but it bears repeating: QID
sequences are fundamentally unworkable because they destroy the concept of
character identity. I firmly believe the UTC is considerably underestimating
the implications of providing a mechanism that can encode exactly the same
information in several, mutually incompatible ways. Unicode was created to
get rid of shenanigans like this once and for all, but the proposed QID
mechanism is almost inviting this practice.

Once the QID mechanism is approved, every single object or concept with an
associated QID will already have a canonical Unicode representation by
default. That’s the whole point of formally defining such a mechanism.
Unicode *wants* vendors and private persons to use these tag sequences to
represent specific emoji; the standard is explicitly endorsing it. Otherwise
people could just continue using private‐use characters as usual.

New emoji characters could never be added to Unicode again because they
would duplicate the already existing encoding, thus invalidating all prior
usage. This would be akin to the UTC formally endorsing a certain PUA
assignment for a script and encouraging people to develop fonts and input
methods for it, but then just encoding it properly anyway two years later.
Unicode should not be in the business of creating new legacy data problems.
That is why comprehensive stability policies exist.

Having to tell people to stop using a perfectly valid sequence because there
is a separate character for the same purpose should be avoided at all costs.
There is some precedent for this in the Unicode standard due to historical
accidents, for example the fact that U+0321 COMBINING PALATALIZED HOOK
BELOW should not be used to compose new letters with palatalized hook.
However, these cases are few and far between and only apply under very
limited circumstances. QID sequences meanwhile could – by their very nature
– represent almost anything in the world.

Simply stating that uniqueness of representation cannot be ensured outside
of RGI is not enough, because even though QID sequences aren’t RGI, they are
still *official*. This is different from ZWJ sequences which have no
intrinsic meaning until someone decides that they do. There is no reason why
“🏴+☠️” should be the one true representation of a pirate flag compared to
any other possibility; it’s completely arbitrary. But QID sequences by
definition always have one and only one specific meaning even if no font is
ever going to support them.

If QID emoji must exist, they can only ever work if the standard very
clearly states that they should only be used for things that have no chance
of being encoded otherwise. This includes entities that are explicitly
forbidden (deities, landmarks, celebrities, brands etc.), things that are
impossible to encode (e.g. flags of regions without ISO codes), as well as
specific variants of more general concepts (exact dog breeds, different
types of sandwich, and so on). So Q4545971 (gelatin dessert) would not be a
recommended QID because there is no reason why a gelatin dessert emoji
couldn’t be encoded as an atomic character if someone submits a proposal,
but Q39058 (Shetland Sheepdog) would be recommended because there already
exist plenty of emoji to represent dogs and more aren’t needed in the core
set.

┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

As to whether QID sequences should be part of UTS #51 at all: The track
record for generalised emoji mechanisms hasn’t been great so far.

• Tag sequences for regional flags have been possible since 2017. In that 
time, the UTC has not RGI’d any new flags beyond the initial three, and 
only one vendor has ever decided to support a non‐RGI sequence: WhatsApp 
with 🏴󠁵󠁳󠁴󠁸󠁿 (Flag for Texas).

• Hair components were added in 2018. No vendor has ever supported any 
sequences incorporating these characters that weren’t RGI.

• The ability to modify emoji direction was added in 2018. Nobody has 
ever used this for anything.

• Colour modifiers were added earlier this year. So far there have only 
been three proposals making use of these: One that was accepted (🐈️‍⬛️), 
one that was rejected (🍷‍⬜️), and one that was later modified to no longer 
use the mechanism (🐻‍❄️). Of note is that none of these proposals used 
any of the new colour characters, but only those that already existed before.

• ZWJ sequences in general are very underdeveloped outside of RGI. Ignoring 
gender variants which always follow a set pattern, vendors haven’t used the 
ZWJ to invent new emoji for years; the most recent example is probably 🏴‍☠️.

The UTC spends a lot of time and effort developing emoji mechanisms that
could in theory be used to significantly expand the emoji repertoire beyond
the RGI list, but ultimately always end up becoming just another tool to
encode a handful of RGI sequences, never to be spoken of again afterwards.
They are inaccessible to the common people and unattractive to major
vendors. I see no reason why the eventual fate of QID emoji should be any
different. Emoji are used exclusively for their visual appearance. Nobody
cares what emoji mean in the abstract, only what they look like, and
unsupported emoji sequences never look good. Considering the particularly
terrible fallback display of QID sequences, I would in fact be very
surprised if anyone at all ever ends up supporting even a single one of
them.

They have no value outside of small, isolated systems because they become
utterly incomprehensible to anyone who doesn’t have the right font
installed, but such closed networks usually already have much better tools
in place to include pictographic images in running text. Custom emotes on
Twitch or Discord cannot be exchanged outside of these sites and still be
expected to show the correct glyph, but they don’t turn into featureless 🆔⁠s
like these tag sequences would; they turn into (more or less) descriptive
names.

To discover the intended meaning of a QID sequence in the wild, you would need to

① know that the 🆔 you encountered is actually meant to be another emoji 
	(which is unlikely because tag characters are invisible),
② copy the sequence (which is impossible or very cumbersome in many mobile apps),
③ paste it into a tool for analysing Unicode characters (which most people do not know about), and
④ look up the resulting QID on Wikidata (which hardly anyone is aware even exists).
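
Steps ③ and ④ above can be sketched in a few lines of Python. This is only an illustration, under the assumption that a QID sequence is a single base character (taken here to be 🆔, as described in this comment) followed by tag characters spelling the QID and ending in U+E007F CANCEL TAG. The function name `extract_qid` is made up for this example.

```python
from typing import Optional

TAG_OFFSET = 0xE0000   # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E
CANCEL_TAG = 0xE007F   # U+E007F CANCEL TAG ends a tag sequence


def extract_qid(sequence: str) -> Optional[str]:
    """Return the QID spelled by the tag characters after the base, or None."""
    letters = []
    for ch in sequence[1:]:          # assumption: the base is one code point
        cp = ord(ch)
        if cp == CANCEL_TAG:
            break
        if 0xE0020 <= cp <= 0xE007E:
            letters.append(chr(cp - TAG_OFFSET))
        else:
            return None              # a non-tag character interrupts the sequence
    else:
        return None                  # no CANCEL TAG found
    qid = "".join(letters)
    if len(qid) > 1 and qid[0] == "Q" and qid[1:].isdigit():
        return qid
    return None


# Build a sample sequence for Q39058 and recover the Wikidata page address.
sample = "\U0001F194" + "".join(chr(TAG_OFFSET + ord(c)) for c in "Q39058") + chr(CANCEL_TAG)
qid = extract_qid(sample)
if qid is not None:
    print("https://www.wikidata.org/wiki/" + qid)
```

Even with such a tool, the reader still has to notice the invisible tag characters and manage to copy them, which is the harder part of the procedure.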

In terms of interchangeability, QID emoji rank just barely above private‐use
characters. I would argue that they are even worse in some areas, because
Mozilla Firefox at least displays each tofu with a little codepoint label,
whereas all unknown QID sequences would look exactly like 🆔. The
recommendation to signify invalid or unrecognised tag sequences with a
special “error” glyph has only been implemented by a single vendor, and only
partially.

Even if we assume that knowledge of QID emoji would just spread via word of
mouth to such an extent that most people would know about their existence,
that still doesn’t mean that they could actually be used. Installing fonts
is not possible on most mobile phones without jailbreaking, and even a font
that is installed on a system isn’t guaranteed to be chosen for text display
in all contexts. 🆔 is most likely going to be shown in the system’s default
emoji font first and foremost, and then all the other fonts in the stack
that might include specific QID sequences don’t matter anymore.

It’s even worse with services like Twitter that replace emoji characters
with embedded images because then there is absolutely zero chance of
arbitrary QID sequences displaying as intended. In practice, QID emoji are
always going to be confined to a potentially small number of messaging apps
that actively took the effort to develop special logic for dealing with
them. This is in stark contrast to not just the rest of emoji, but Unicode
in general. These QID emoji are effectively just a less portable, less
versatile version of stickers.

Furthermore, creating colour fonts is not something the average person can
easily do. The tools necessary to do so are not freely available most of the
time. Never mind the fact that there exist four different formats for emoji
fonts, none of which is compatible with any other. The New Zealand Kennel
Club can’t just create a font with a glyph for Q39058 and distribute it
among dog fans; they need to create four different fonts, potentially with
completely different glyph designs unless they settle on an image that can
be represented by all four formats.

Of course, people could always create monochrome fonts instead because they
work everywhere, but ⓐ black‐and‐white glyphs for emoji are not very popular
among the general public, and ⓑ many of the things people would want to use
QID emoji for (flags, food variants, animal breeds etc.) would be almost
unrecognisable without proper colour.

Date/Time: Thu Nov 7 09:55:28 CST 2019
Name: William Overington
Report Type: Public Review Issue
Opt Subject: Public Review 408: QID Emoji

I have been thinking about this review at times over a few days. I was about
to reply a short while ago and I had a look to see if anyone else had
replied since I looked before.
 
I noticed, and have read with interest, the reply from Ms Charlotte Buff.
 
Until reading Ms Buff's comments I was going to answer question (a) as asked
simply with "Yes, please add it." though I do note from the minutes of the
recent UTC meeting that it is not now a matter of adding it, but having a
separate document.
 
However, now, realizing that Ms Buff makes some important comments I need to
think further about it all. In principle I consider that the idea of having
QID emoji is good and should be implemented. Yet maybe there needs to be
thought of how this could be done whilst taking into account Ms Buff's
comments. For example, encouraging fonts to have displayable glyphs for tag
characters, even if just for the tag version of the letter Q, the tag
versions of the ten digits and the cancel tag. I made a font that had those
when I was doing some tests and it worked extremely well in the Affinity
Publisher program: I chose characters to build up the QID sequence and the
glyphs were displayed; when the cancel tag was entered the OpenType GSUB
table in the font indicated a glyph for the complete sequence and the
desired glyph for the QID emoji was displayed. Maybe Unicode Inc. could
offer a free font with just those twelve visible glyphs, explicitly granting
free permission, with no strings attached and no so-called licence, to copy
and paste the glyphs and their related information into any font that
someone is making, offered as a free service for the public good. Maybe
several such fonts so as to suit various font
formats and various options within those formats (such as when some fonts
use font units up to 1000 and some use them up to 2048, that sort of thing).
That would help with analysis of what is going on in some cases.
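
For illustration, the twelve tag characters mentioned above (tag Q, the ten tag digits and the cancel tag) are enough to spell any QID. The following Python sketch builds such a sequence and lists its code points; it assumes 🆔 (U+1F194) as the base character, and the function name `qid_sequence` is only for this example.

```python
BASE = "\U0001F194"      # 🆔 SQUARED ID, assumed here as the base character
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG


def qid_sequence(qid: str, base: str = BASE) -> str:
    """Spell a QID such as 'Q39058' with tag characters after a base."""
    digits = qid[1:]
    if not (qid.startswith("Q") and digits.isascii() and digits.isdigit()):
        raise ValueError("expected 'Q' followed by ASCII digits: %r" % qid)
    # Each tag character sits at U+E0000 plus the ASCII value it mirrors.
    tags = "".join(chr(0xE0000 + ord(c)) for c in qid)
    return base + tags + CANCEL_TAG


sequence = qid_sequence("Q39058")
print(" ".join("U+%04X" % ord(c) for c in sequence))
# U+1F194 U+E0051 U+E0033 U+E0039 U+E0030 U+E0035 U+E0038 U+E007F
```

A font of the kind described would need visible glyphs only for the code points U+E0051, U+E0030 to U+E0039 and U+E007F.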
 
As for question (b) about changes in the specification, well I am perhaps
going a bit off-topic but nevertheless, two matters that I think are well
worth considering in relation to matters relating to the specification.
 
(1) I know that there are views that QID emoji are not characters. I know 
that they are not atomic characters, but I am concerned, as I am with other
sequences that are purportedly not characters, that Unicode is going to get
increasingly out of synchronization at a practical applicability level with
ISO/IEC 10646, notwithstanding any theoretical basis that it is not out of
synchronization at a formal level with ISO/IEC 10646. To me it seems that it
would be desirable to try to get some sort of agreement with the committee
that manages ISO/IEC 10646 as to how both systems relate to QID emoji. I
opine that an end user, at some future time, if QID emoji are implemented
and possibly widely used, who sees displayed a tag sequence of a QID emoji
for which he or she does not currently have a glyph of the intended QID
emoji, the important practical consideration will be being able to
understand what is going on. If ISO/IEC 10646 has not even a note about what
it is about in general terms, then that would not be a helpful situation for
that end user. I ask that Unicode Inc. raise the matter with the ISO/IEC
10646 committee please and ask their advice.
 
(2) What if, instead of a tag Q, another tag character were used?
This would then indicate something else other than a QID emoji. For example,
suppose that the tag character were an exclamation mark and the whole tag
sequence code were for a localizable sentence. That would open up a lot of
possibilities for communication through the language barrier. 

For example, the codes in the following linked document.

http://www.users.globalnet.co.uk/~ngo/A_List_of_Code_Numbers_and_English_Localizations_for_use_in_Research_on_Communication_through_the_Language_Barrier_using_encoded_Localizable_Sentences.pdf

Now I fully realize that that would most probably not be agreed to by UTC,
simply because, at present, they are at a research project level and also
because they are by an individual, though even if they were by a company,
large or small, the answer might well be the same.
 
Yet what if those codes, or some other codes, were an ISO standard. For the
avoidance of doubt I am not in that context meaning the ISO/IEC 10646
standard. Would UTC agree to it then? Or would a different base character be
better for such a system, different from QID emoji. Could you consider that
please?
 
William Overington 
 
Thursday 7 November 2019

Date/Time: Fri Nov 8 10:28:16 CST 2019
Name: William Overington
Report Type: Public Review Issue
Opt Subject: Public Review 408: QID Emoji

I wonder if the following might help address some of the issues raised by Ms
Charlotte Buff. This idea is put forward as a starting point: UTC and
contributors to this Public Review are welcome to alter the idea around and
improve it as desired.

At the moment there is RGI (Recommended for General Interchange).

What if there are instead five categories, say, as follows.

Recommended for General Interchange

Popular

Worthwhile

Interesting

Noted

The Recommended for General Interchange would be as now. If someone
implements a QID emoji and uses it just a little then he or she may, if he
or she chooses to do so, email Unicode Inc. and inform Unicode Inc. of that
use, perhaps with some basic information such as an image and a note as to
whether a new QID had been generated for the purpose of producing an emoji
or whether a previously existing QID entry had been used, with an option of
also including a note as to the motivation for implementing that particular
emoji. Unicode Inc. would just check that generally and, all being well,
would add it to the Noted list. That way there would be a list of what is
about in use, even if just a little. Unicode Inc. could increase the
category depending upon evidence of use and popularity. That way, there
would be publicly accessible lists and maybe that would help.

William Overington

Friday 8 November 2019

Date/Time: Wed Nov 13 14:29:11 CST 2019
Name: William Overington
Report Type: Public Review Issue
Opt Subject: Public Review 408: QID Emoji

Thinking about the proposal, it has occurred to me that if as well as tag Q
for QID emoji UTC were to have in addition the facility to use tag q instead
of tag Q, then tag Q could indicate a QID emoji of fixed width format and
tag q could indicate a glyph based on the same QID item but not of fixed
width. That way unencoded scripts could be got up and running by having a
QID item for each character of the script. This would mean that the encoding
would not depend upon use of the Private Use Area. An OpenType font could
use glyph substitution to produce a display. Not as good as an encoding in
regular Unicode, yet capable of being introduced promptly.
 
By introducing this facility, the QID emoji proposal could have uses far
beyond just emoji.
 
William Overington
 
Wednesday 13 November 2019

Date/Time: Mon Nov 18 18:49:58 CST 2019
Name: James Kass
Report Type: Public Review Issue
Opt Subject: PRI #408: QID Emoji Sequences

QID Emoji represents an interesting approach to plain-text.  The approach is
reminiscent of suggestions made in the past to the Unicode public list which
were dismissed at the time.  For example, the QID material database could be
just as simply referenced in plain-text by the following:

COMET + CIRCUMFLEX + Q + <the ID number in ASCII> + CIRCUMFLEX + COMET

As the creator of the comet circumflex method notes here:
http://www.users.globalnet.co.uk/~ngo/c_c00000.htm
... the comet-circumflex string is unlikely to occur elsewhere in plain-text.

One advantage of the comet circumflex combination as plain-text mark-up over
lengthy strings of TAG characters is that fewer bytes would be needed for
each QID emoji.  Which means that emoji users would be able to fit more
emoji into a single tweet on Twitter.
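
As a rough check of the byte counts, the following Python sketch compares the two spellings of Q39058 in UTF-8. It assumes 🆔 (U+1F194) as the base of the tag form and U+2604 COMET with U+005E CIRCUMFLEX ACCENT for the simplified mark-up shown above; both helper names are made up for this example.

```python
def tag_form(qid: str) -> str:
    """Base character, tag characters spelling the QID, then CANCEL TAG."""
    return "\U0001F194" + "".join(chr(0xE0000 + ord(c)) for c in qid) + "\U000E007F"


def comet_form(qid: str) -> str:
    """COMET + CIRCUMFLEX + QID in ASCII + CIRCUMFLEX + COMET."""
    return "\u2604^" + qid + "^\u2604"


qid = "Q39058"
print(len(tag_form(qid).encode("utf-8")))    # 32: eight code points, 4 bytes each
print(len(comet_form(qid).encode("utf-8")))  # 14: 3 + 1 + 6 + 1 + 3
```

The gap grows with the length of the ID, since every tag character costs four bytes in UTF-8 against one byte for an ASCII digit.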

Another advantage is that BMP-only legacy software would already have a
perfectly legible built-in fallback display.  Legacy software could even be
used for input in a pinch.

(When the Comet Circumflex System was envisioned in 2002, tweet length
constraints weren't much of an issue.  The author uses COMET + COMBINING
CIRCUMFLEX + other combining characters to mark start/end of the string,
along with encircled digits instead of ASCII digits.)

Date/Time: Fri Nov 22 15:19:23 CST 2019
Name: Nicholas Felker
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on proposal #408 QID Emoji

I want to provide some feedback on the proposal, as I think there are pros
and cons to the approach. On one hand, I do like the effort to scale emoji
to enable a broader set of pictorial characters. It would certainly enable
novel and unpopular emoji to be used and shared.

I think I share some of the concerns of others who have given feedback. With this
proposal, suddenly there is the potential to have thousands or millions of
emoji based on these identifiers. This would create a significant burden for
font developers, especially as at launch none of these will be supported. It
would need to be incorporated into system-level keyboards and fonts. The OS
vendors may be unlikely to support many of these in their font, which in
turn would result in a lack of keyboard support and a lack in usage.

One of the cited examples, of a small dog kennel creating their own font,
seems potentially feasible but unlikely. A font is not among the kinds of
'digital swag' a brand typically produces. Let's say it becomes a common
thing. How would people install these fonts? On desktops it's not
necessarily easy, and on mobile devices it's nearly impossible. It's even
worse as one looks to a variety of simpler embedded devices like
smartwatches and fitness bands which lack the openness to do anything. A
text message containing a specific breed of dog would make no sense on my
watch.

I think a similar concern could be around the creation of 'bad faith' fonts,
which may misuse the system to provide inappropriate or misleading emoji
(using a Pepsi emoji instead for the Coke QID). Right now I just use the
font provided by the OS, but in this proposed system one would need to
download them, perhaps without a way to verify quality or keep them up to
date.

Some of the suggested workarounds are also not ideal. Screenreaders may read
out a description of the character. For screenreaders, to get an accurate
description they would need to query the Wikidata API to get a label, at
least the first time. This prevents them from working offline entirely or in
places with poor connectivity.
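
To make the "at least the first time" point concrete, here is a minimal Python sketch of a label lookup with a local cache. It assumes the Wikidata `Special:EntityData` JSON endpoint and its `entities` / `labels` layout; the class `LabelCache` and the other names are hypothetical, and a real screenreader would also need expiry, language fallback and error handling.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional


def entity_data_url(qid: str) -> str:
    """Address of the JSON description of one Wikidata item."""
    return "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % qid


def fetch_entity(qid: str) -> dict:
    """Network lookup; this is the step that fails when offline."""
    with urllib.request.urlopen(entity_data_url(qid), timeout=10) as response:
        return json.load(response)


def label_from_entity_json(payload: dict, qid: str, lang: str = "en") -> Optional[str]:
    """Pull the label out of the parsed JSON, or None if it is missing."""
    try:
        return payload["entities"][qid]["labels"][lang]["value"]
    except KeyError:
        return None


class LabelCache:
    """Remember labels so only the first lookup of a QID needs the network."""

    def __init__(self, fetch: Callable[[str], dict] = fetch_entity) -> None:
        self._fetch = fetch
        self._labels: Dict[str, Optional[str]] = {}

    def label(self, qid: str) -> Optional[str]:
        if qid not in self._labels:
            self._labels[qid] = label_from_entity_json(self._fetch(qid), qid)
        return self._labels[qid]
```

The fetch function is passed in so that the cache can be exercised without a connection. On a device that has never seen a given QID, nothing in this sketch helps: the first call still depends on the network.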

A tag_base may serve as the start of any QID character to act as the default
character, which could result in instances where the meaning changes
dramatically (bird -> NATO flag as noted). Relying on the OS or the font
to provide appropriate substitutes also seems like an issue, as it requires
an Internet connection to do a lookup, as maintaining local copies of
everything and keeping them up to date seems highly infeasible.

> If an emoji QID sequence becomes popular, Unicode may define a
> different RGI representation using a character or sequence to save memory.

I think this sounds good, although it is later noted "We don't anticipate
having a normalization process for QID emoji." Aside from saving memory, it
suddenly becomes complicated for every vendor to do this mapping, as a QID
for a dog should map to the dog emoji.

In general, my feedback is that I like the idea, but I have concerns about
the implementation and how it may actually work in practice. There are many
ways for things to fail, creating confusion and
incomplete text. I understand that Unicode may not be able to have much
control over the implementation by vendors, but providing reference material
would be good for verifying the feasibility of this system at scale in real
usage.

Date/Time: Mon Nov 25 10:32:13 CST 2019
Name: David Lewis
Report Type: Public Review Issue
Opt Subject: PRI #408: QID Emoji Sequences

I have to agree with others who have posted on this subject.  With QID it
appears that the Unicode Consortium is for some reason attempting to defeat
the entire purpose of the Unicode Consortium.

The entire point of Unicode is for one body to decide for all of computing
what character a particular sequence of binary digits represents across all
implementations around the entire world.  It does slow the process of adding
new symbols considerably, but in exchange a host of issues are bypassed.  If
everyone implements Unicode according to the standard, there will never be
any more conversion errors again.  The character you expect to display is
the character you WILL display, if your font supports it.

Private use is one thing.  That is a part of the standard that is
intentionally left non-standard.  Those who choose to use private use
section understand that they're entering uncharted waters.  It has limited
use, but those uses are fairly well limited to those things the designers
INTEND to be limited.  Use of Klingon script on a fan-website is not going
to cause problems of a serious nature with other sites who might
misinterpret the Klingon script they receive; they're not going to try to
receive any, and if they did they wouldn't likely try to interpret it unless
they knew what it was and had a Klingon font.

QID seems to be taking it to another level, inviting a host of developers to
create their own suites of characters in a disorganized, haphazard fashion
that's bound to cause the same kinds of overlaps, gaps, and conversion
mistakes that required the Unicode consortium to have to be created in the
first place.  People are probably going to be attempting to communicate
using one QID vendor with another person using a different vendor, and all
the QIDs will probably wind up not what the person intended to communicate
to the other at all.  

I don't trust a Wiki as a governing body.  I don't think we should wait a
couple of years for QID to get so messed up that a body of individuals have
to create a QID Consortium to bring the world to one singular global QID
standard.  We already have a body that brings the world to one singular
global standard for Emojis.  The price of standardizing QID to the point
that it's rendered usable is higher still than just standardizing Emojis
like you already do.  If anything, you could simply alter the process to
enable a larger number of Emojis to be added every Unicode release.  That's
much easier, in my opinion, and solves the problem far more gracefully.

Date/Time: Tue Nov 26 07:50:00 CST 2019
Name: David Lewis
Report Type: Public Review Issue
Opt Subject: PRI #408: QID Emoji Sequences

It seems to me that rather than supporting an entirely new mechanism for
Unicode to support unsupported emojis, it would be far easier, more
sustainable, more effective, and less burdensome on the Unicode Consortium,
the public, and vendors for Unicode to just have fewer unsupported emojis.

An example would be particular breeds of dog.  Why do we need to completely
change everything about how emojis work just to get that?  We already have
something like 18 characters for superhero; 6 skin tones each for male,
female, and gender neutral.  Why not let all vendors choose whatever breed
of medium size dog they like for a default, then use ZWJ and color mechanism
(if the vendor even wants to support it) for a black breed of dog, a brown
one, a white one, a golden one (golden retriever obvious choice), maybe if a
vendor chooses dog + orange can represent a dingo.  Another ZWJ modifier for
big or small could be included.  Dog + small + white could be a Maltese; dog
+ big + white could be a sheep dog.  Dog + black + white could be a
dalmatian.  If your font doesn't support it, oh well.  If your font doesn't
support it yet, oh well.  Only the most popular breeds need be included
first.  If the first initial round of extremely popular breed ZWJ sequences
get usage, you can consider including more.

If we're thinking about creating an entirely new mechanism that may go
completely unused, why are we so adamant about avoiding a single character
or even a single ZWJ sequence that may go unused?  Perhaps the Unicode
Consortium could be a little more lenient in terms of potential usage
estimates, and a little more lenient in terms of items that are already
representable.  

Yes, trash can + fire does convey the same general idea as "dumpster fire",
but not to the level of scale intended.  "Garbage fire" is a term for a
problem that hasn't reached full "dumpster fire" status.  Already
representable yes, but not ideally yet in many cases of rejected emojis I've
seen in the past.  Yes, baby face followed by drop might convey crying baby,
but is it really that hard to make a glyph of a baby that IS crying to
convey the idea more precisely and succinctly?  If not worthy of its own
code point, could it have a ZWJ sequence?

Perhaps, as a new standard operating procedure, if a proposal for a glyph
does not meet requirements to be approved for its own code point, it should
enter as a contender for being approved as a new ZWJ sequence, even if the
author of the proposal didn't initially think of it.  It does create more
work for the Consortium, but as much additional work as keeping QID from
blowing up?

Date/Time: Mon Mar 2 11:35:30 CST 2020
Name: William Overington
Report Type: Feedback on an Encoding Proposal
Opt Subject: Public Review 408: QID Emoji


Having looked at this issue from time to time I write to suggest a
compromise possibility that I hope that the Unicode Technical Committee will
consider please.
 
The general idea of QID emoji is, in my opinion, good.
 
However there are disadvantages too.
 
How about not using Q but using another capital letter and use a new wiki
hosted by Unicode Inc. specifically for the purpose? Then you could have
many of the benefits of QID emoji, yet also have very light moderation by an
Officer of Unicode Inc. as well. This would also get rid of the issue of
whether Unicode Inc. allows that particular QID to be an emoji or not. With
your own wiki and your own code space and the light moderation, people can
know for certain that if it has been there for more than a few days then it
is a permitted emoji. You could also lock a page so that once there it could
only be changed, and then only for a minor error or something like that,
with the agreement of the moderator: this would provide long term
stability.
 
Also, someone could ask the moderator for allocation of a block of code
numbers if the proposed emoji would be part of a coherent set.
 
So all of the benefits of a QID emoji system but eliminating the negatives.
 
This system could have much of the freedom that the Private Use Areas
provide, yet also have unique encoding for each encoded item and also
interoperability from computer to computer across various platforms.
 
William Overington
 
Monday 2 March 2020

Date/Time: Wed Apr 15 05:09:57 CDT 2020
Name: Jonathan Kew
Report Type: Public Review Issue
Opt Subject: Mozilla Feedback on PRI #408 “QID Emoji”


Mozilla urges the Unicode Consortium not to adopt the QID Emoji proposal.

The proposal provides for a mechanism for minting emoji that bypasses the
normal Unicode Consortium processes. We believe this would lead to
problematic effects. While the foreseeable problems could be argued to have
precedent in the sense that similar problems already exist in Unicode, the
precedent should be viewed as problems that shouldn't be made worse and
should not be viewed as a license to let the problems proliferate.

(This is a summary paragraph only; full writeup submitted to UTC via email.)

Date/Time: Mon Apr 20 13:06:34 CDT 2020
Name: William Overington
Report Type: Public Review Issue
Opt Subject: Public Review 408: QID Emoji


The document to which Mr Kew refers is linked below.
 			
https://www.unicode.org/L2/L2020/20110-qid-emoji.pdf
 
In that document Mr Sivonen writes as follows.
 
> Mozilla urges the Unicode Consortium not to adopt the QID 
> Emoji proposal.

Mr Sivonen later writes as follows.
 
> There doesn’t appear to be a good reason to believe that if 
> QID emoji was implemented, the mechanism would stay scoped to
> emoji and wouldn’t be used for encoding genuinely textual characters 
> in a way that would circumvent Unicode processes.
 
If the choices available for the Unicode Technical Committee were to say
either 'yes' or 'no' to the proposal, then I suggest that 'no' would be the
better decision. Yet the Unicode Technical Committee is not restricted to
those two choices.
 
For example, the discussion could be scheduled for the first day and the
proposer could be informed that the proposal is not accepted in its proposed
form, yet, if a revised proposal is submitted for consideration later in the
week, after some discussions in ad hoc meetings, then the revised proposal
will be considered as if it is a fresh proposal and a decision reached.
 
So, if the idea of the QID wiki being used is dropped and a database under
the control of Unicode Inc. is used instead, where there is usually very
light moderation, but firm moderation is always possible, and maybe a few
other changes are made if the Unicode Technical Committee opines that they
are necessary or desirable, then the main intent of the proposal can become
implemented, yet in a way that avoids the good name of Unicode Inc. being
potentially dragged through the gutter by something that is put in a wiki
over which Unicode Inc. has no control.
 
I opine that the use of a base character and a sequence of tag characters is
sound, and could be extended to other items beyond emoji, and could be a
catalyst for a renaissance of creativity using information technology in an
interoperable manner, yet not filling up the Unicode character map. The big
problem with the original proposal is linking it to an external wiki. The
Unicode Technical Committee has the opportunity to allow progress to
flourish into the future by using the good parts of the original proposal in
a rigorously controlled manner.
 
William Overington
 
Monday 20 April 2020

Date/Time: Mon Apr 20 14:31:58 CDT 2020
Name: Denny Vrandecic
Report Type: Public Review Issue
Opt Subject: PRI 408 QID Emoji - Feedback

This is formal feedback to PRI issue #408 regarding the proposal of QID
Emoji.

Wikidata is a Wikimedia project and follows the principles of open knowledge
creation and curation that have led Wikipedia to be the project it is today.
Wikidata’s goal is to allow everyone to share in an open knowledge graph
that anyone can edit and use.

Wikidata has more than 25,000 monthly contributors, and has seen more than
1.1 billion edits, creating more than 80 million Items. Each of these Items
is identified by what we call a QID (short for Q-Identifier, as the
identifiers start with the letter Q followed by a number). These
QIDs are meant to be quite stable: a QID can get discontinued when an Item
is deleted, but the QID then never gets reused, thus not leading to
ambiguity. A QID can also be forwarded to another QID when two Items are
merged, but in this case the QIDs and their relation are recorded. Deletions
happen rarely, and by definition only for Items that are not notable. The
QIDs for almost all Items of wider interest have remained stable since their
creation. Wikidata provides a service to resolve QIDs and get back human-
and machine-readable names and descriptions of the Items of interest.
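
The identifier scheme described above can be illustrated with a minimal
sketch. It assumes that a QID is the letter Q followed by a positive integer
without leading zeros, and that the Special:EntityData URL form on
wikidata.org is one way to reach the resolution service; the function names
is_valid_qid and entity_data_url are hypothetical and are used here only for
illustration.

```python
import re

# Assumption: a QID is "Q" followed by a positive integer, no leading zero.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(text):
    """Return True if text looks like a well-formed QID such as "Q42"."""
    return QID_PATTERN.fullmatch(text) is not None


def entity_data_url(qid):
    """Build a URL for the machine-readable (JSON) description of an Item.

    This only constructs the URL; it performs no network request.
    """
    if not is_valid_qid(qid):
        raise ValueError(f"not a well-formed QID: {qid!r}")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


if __name__ == "__main__":
    print(entity_data_url("Q42"))
```

The check is purely syntactic: a well-formed QID may still refer to a deleted
or merged Item, which only the service itself can report.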

Wikidata has become a major authority hub for identity. Not because of
complex processes and selective contribution requirements, but on the
contrary, because of the ease of contributing and its adherence to
Wikipedia’s principles of openness and inclusion. Wikidata links together
several thousand databases and authority files, allowing users to swiftly join
data indexed with ICD identifiers and Dewey Decimal Classification codes.
This has led to Wikidata being described as a crystallization point of
identifiers, as an authority file of authority files, or as a modern Rosetta
Stone. Even more importantly, although Wikidata only launched a few
years ago, it is already being used by a growing number of institutions as
an important authority file.

These institutions include, but are not limited to:
The US Library of Congress
The German National Library
Virtual International Authority File VIAF
The New York Times
Google
Museum of Modern Art
iNaturalist
Carnegie Hall
MusicBrainz
OpenStreetMap
Schema.org
Quora
OCLC WorldCat
And many more.

Given that these and other authorities are already relying on and trusting
Wikidata and its open processes to curate a comprehensive and current
catalogue of identifiers, we are humbled and pleased to learn about the
proposal to the Unicode Consortium to consider using Wikidata QIDs as an
additional approach to identify the meaning of an emoji. We understand that
this would allow stakeholders to expediently introduce new emojis, be able
to measure their real-world adoption, and provide unambiguous and stable
emoji tag sequences. We think that this is a great application of Wikidata
as an identifier catalogue, and we fully support this proposal.

Lydia Pintscher, Wikimedia Deutschland, Product Manager Wikidata
Denny Vrandečić, Founder Wikidata
Joint statement

P.S.: if of interest, the Wikidata community already records a few thousand
Unicode characters as being identified with a given QID. We think that
this kind of mapping could be useful to stakeholders, for example to do some
form of normalization or fallback. As of the time of writing, there are
9,913 such mappings using the Property P487 (see https://w.wiki/NRB for a
current list).
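
A mapping of this kind could be retrieved roughly as sketched below. The
contents of the short link above are not reproduced here; the query text, the
use of the public endpoint at query.wikidata.org, and the function names
build_p487_query and build_request_url are assumptions made for illustration
only.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata SPARQL endpoint is used.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_p487_query(limit=10):
    """Return a SPARQL query listing Items with a P487 (Unicode character) value."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT ?item ?character WHERE {\n"
        "  ?item wdt:P487 ?character .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(limit=10):
    """Build a GET URL for the query; no network request is made here."""
    params = {"query": build_p487_query(limit), "format": "json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_p487_query(5))
```

Each result row would pair an Item with one Unicode character, which is the
raw material for the normalization or fallback tables mentioned above.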

Date/Time: Thu Jun 4 15:04:55 CDT 2020
Name: William Overington
Report Type: Public Review Issue
Opt Subject: Public Review 408: QID Emoji

I opine that when considering a new idea it is important to be prepared to
suspend disbelief and consider if any parts of the idea are good, rather
than just the total idea.
 
I opine that the QID Emoji proposal has some very good aspects but is
somewhat unstable as a whole.
 
So, if those in favour of the proposal and those against are each willing to
be like the strongest trees and sway in the breeze then the good parts of
the proposal could become available in a stable manner.
 
For example, maybe registration in a Unicode Inc. database, with the option
of a cross-reference link to QID, would mean that only those QID where
someone wants an emoji for that QID would be in the Unicode Inc. database,
and a gentle moderation policy could be used to stop ambiguity and
duplication. This might also allow shorter codes.
 
What if U+FFF0 is defined, mutatis mutandis, as effectively what would be a
ligature of the ID emoji and tag Q in the original proposal, U+FFF8 is
defined as the corresponding CANCEL, and circled digits are used? These are all
in the Basic Multilingual Plane, so each such character takes fewer bytes, and a
graceful indicative fallback facility is built in.
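
The byte saving can be estimated with a rough sketch. It assumes that the
original proposal uses U+1F194 as the base, followed by tag characters
(U+E0000 plus the ASCII value) and U+E007F CANCEL TAG, and that the suggestion
above uses U+FFF0, circled digits (U+24EA for zero, U+2460 to U+2468 for one
to nine) and U+FFF8. U+FFF0 and U+FFF8 are unassigned at present, and the QID
Q1234 and all function names are hypothetical.

```python
TAG_OFFSET = 0xE0000   # tag characters mirror ASCII at this offset
CANCEL_TAG = 0xE007F   # CANCEL TAG
ID_BASE = 0x1F194      # assumption: base emoji of the original proposal

BMP_START = 0xFFF0     # suggested start character (currently unassigned)
BMP_CANCEL = 0xFFF8    # suggested CANCEL character (currently unassigned)
CIRCLED_ZERO = 0x24EA  # CIRCLED DIGIT ZERO
CIRCLED_ONE = 0x2460   # CIRCLED DIGIT ONE; two to nine follow in order


def tag_sequence(qid):
    """Encode e.g. "Q1234" as base + tag Q + tag digits + CANCEL TAG."""
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in qid)
    return chr(ID_BASE) + tags + chr(CANCEL_TAG)


def circled_digit(digit):
    """Map an ASCII digit to the corresponding circled digit."""
    n = int(digit)
    return chr(CIRCLED_ZERO) if n == 0 else chr(CIRCLED_ONE + n - 1)


def bmp_sequence(qid):
    """Encode e.g. "Q1234" as U+FFF0 + circled digits + U+FFF8."""
    digits = "".join(circled_digit(d) for d in qid[1:])
    return chr(BMP_START) + digits + chr(BMP_CANCEL)


def utf8_length(text):
    """Number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    qid = "Q1234"
    print(utf8_length(tag_sequence(qid)))  # 28
    print(utf8_length(bmp_sequence(qid)))  # 18
```

Under these assumptions every character of the tag sequence takes four bytes
in UTF-8 and every character of the basic plane form takes three, and the
latter also needs one character fewer because the start character absorbs the
Q.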
 
I realize that the original proposal can be implemented with existing
technology, and that the changes I suggest would require changes to The
Unicode Standard and also possibly software packages, but perhaps not
necessarily, other than the software accepting U+FFF0 and U+FFF8 as being 
valid characters, but that could be done in time if there is the will to do
so, yet whatever solution is implemented is likely to be there for a very
long time.
 
Would those two changes both go a long way towards making a solution that is
acceptable to everybody?
 
I may not have solved every objection and what I suggest does change the
original. Yet this is research for the future. So, if people agree, please
say so, if not then please say what I have missed or got wrong and what
needs fixing and then, as a group effort, maybe we can iterate in a
constructive way and achieve a good solution acceptable to everybody.
 
William Overington
 
Thursday 4 June 2020