Re: Unicode encoding policy

From: Asmus Freytag
Date: Mon, 29 Dec 2014 11:46:34 -0800

On 12/29/2014 10:32 AM, Doug Ewell wrote:
> Asmus Freytag wrote:
>> The "critical mass" of support is now assumed for currency symbols,
>> some special symbols like emoji, and should be granted to additional
>> types of symbols, punctuations and letters, whenever there is an
>> "authority" that controls normative orthography or notation.
>> Whether this is for an orthography reform in some country or addition
>> to the standard math symbols supported by AMS journals, such external
>> adoption can signify immediate "critical need" and "critical mass of
>> adoption" for the relevant characters.
> To me, it is remarkable that the "critical mass of support" argument
> that is applied, entirely appropriately, to new currency symbols
> (however misguided the motives for such might be) and math symbols and
> characters for people's names, is now also applied to BURRITO and

Does it - in principle - matter what a symbol is used for? If millions
of happy users choose to communicate by peppering their messages with
BURRITO and UNICORN FACE is that any less worthy of standardization than
if thousands (or hundreds) of linguists use some arcane letterform to
mark pronunciation differences between neighboring dialects on the
Scandinavian peninsula?

The "critical mass" argument does not (and should not) make value
judgements, but instead focuses on whether the infrastructure exists to
make a character code widely available pretty much directly after
publication, and whether there is implicit or explicit demand that would
guarantee that such code is actually widely used the minute it becomes
available.

For currency symbols, or for a new letter form demanded by a new or
revised, but standard, orthography, the demand is created by some
"authority" creating a requirement for conforming users. Because of
that, the evaluation of the "critical mass" requirement is straightforward.

Emoji lack an "authority", but they do not lack demand. For better or
for worse, they have grabbed significant mind share; the number of news
reports, blogs, social media posts, shared videos and what not that were
devoted to Emoji simply dwarfs anything reported on currency symbols in
a comparative time frame. With tracking applications devoted to them,
anyone can convince themselves, in real time, that the entire repertoire
is being used, even, as appropriate for such a collection, with a clear
differentiation by frequency.

Nevertheless, the indication is clear that any emoji that will be added
by the relevant vendors is going to be used as soon as it comes
available. Further, as no vendor has a closed ecosystem, emoji are
usable only if there is agreement on how they are coded.

The critical question, and I fully understand that this gives you pause,
is one of selection. There are hundreds, if not thousands, of potential
additions to the emoji collection; some fear the set is, in principle,
endless. Lacking an "authority", how does one come to a principled
agreement on encoding any emoji now, rather than later?

One could run an experiment, which is to say, create an alternate
environment where users can use non-standard emoji and then the
Uni-scientists in white lab coats could count the frequency of usage and
promote the cream off the top to standardized codes.

Or one could run an experiment where one defines a small number of
slots, say 40, and opens them up for public discussion, and proceeds on
that basis. Yes, that would turn the UTC into the "authority".

My personal take is that the former approach is inappropriate for
something that is in high demand and actively supported; the latter I
can accept, provisionally, as an experiment to try to deal with an
evolving system. Because of the ability to track, in real time, the use
or non-use of any of the new additions, it would be a true experiment,
the outcome of which can be accurately measured. If it should lead to
the standardization of a few dozen symbols that prove not as popular as
predicted, then we would conclude a failure of the experiment, and
retire this process. Otherwise, I'd have no problem cautiously
continuing with it.

> But then, I remember when folks used to cite the WG2 "Principles and
> Procedures" document for examples of what was and was not a good
> candidate for encoding. That seems so long ago now.

The P&P, like most by-laws and constitutions, are living documents. In
this case, they try to capture best practice, without taking from the
UTC (or WG2) the ability to deal with new or changed situations.

The degree to which emoji have captured the popular imagination is
unprecedented. It means the game has changed. Let's give the UTC the
space to work out appropriate coping mechanisms.


PS: this does not mean that, for all other types of code points, the
existing wording in the P&P can simply be disregarded. In fact, the end
result will be to see them updated with additional criteria explicitly
geared towards the kind of high-profile use case we are discussing here.
Unicode mailing list
Received on Mon Dec 29 2014 - 13:47:36 CST