Re: The rules of encoding (from Re: Missing geometric shapes)

From: Asmus Freytag <>
Date: Thu, 08 Nov 2012 18:00:20 -0800

On 11/8/2012 4:39 PM, Mark E. Shoulson wrote:
> On 11/08/2012 01:48 AM, William_J_G Overington wrote:
>> Michael Everson <> wrote:
>> < ... collect examples of these in print ...
>> Mark E. Shoulson <> wrote:
>>> We don't encode "it would be nice/useful." We encode *characters*,
>>> glyphs that people use (yes, I know I conflated glyphs and
>>> characters there.)
>> ...
>>> Unicode isn't a system for encoding ratings. It's a system for
>>> encoding what people write and print.
>> I have at various times, as research has progressed, deposited with
>> the British Library pdf documents that I have produced and published
>> and I have deposited with the British Library TrueType fonts that I
>> have produced and published and I have received email receipts for them.
>> Some of the pdf publications contain new symbols, used intermixed
>> with text in a plain text situation. I have used Private Use Area
>> encodings for the symbols.
>> Yet the publications have not been published in hardcopy form.
> I think you may be taking me too literally. A PDF document which is
> essentially a proxy for a printed page (only cheaper to copy and
> produce) would count, to me, as usage "in print." I don't make the
> rules, but I think some of the Unicoders who do would agree. The
> charge of the rules being "out of date" because they demand usage is
> not an accurate one, and pointing to printing vs electronic usage is a
> red herring.
> I have long complained about another writing system which I felt had
> trouble being encoded due to chicken-and-egg issues (Klingon), but
> even so people have been using it in the PUA; see
> (now defunct, apparently, but the site is
> still there), and the KLI's collection of Qo'noS QonoS is available in
> Latin letters or in pIqaD in PUA.
> I agree that there is something to the charge of chicken-and-egg
> issues with encoding writing systems (you can't write it until it's
> encoded, you can't encode it until it's written), but probably more
> with the amount of usage that has to be seen, not with the requirement
> that there be SOME usage.
> I stand by it: we don't encode what would be cool to have. We encode
> what people *use*.

Actually, there are certain instances where characters are encoded based
on expected usage.

Currency symbols are a well-known case of that, but there have also been
instances of phonetic characters encoded to facilitate the creation and
publication of certain specialist databases, without burdening their
users with the instant obsolescence that PUA characters would have meant.

If an important publisher of mathematical works (or publisher of
important mathematical works) made a case for adding a recently created
symbol so that they could go ahead and make it part of their standard
repertoire, I would think it churlish to require them to create
portability problems for their users by first creating documents with
PUA encoding.
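A minimal Python sketch of the portability problem: software receiving PUA text can tell that a code point is private use, but the standard assigns it no name or meaning, so the interpretation depends entirely on a private agreement (and usually a specific font). The helper names below are hypothetical; the ranges are the three Private Use Areas defined by the Unicode Standard.

```python
import unicodedata

# The three Private Use Areas defined by the Unicode Standard.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in a Private Use Area."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

def private_use_chars(text: str) -> list[str]:
    """List the PUA code points in text, formatted as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]

def describe(ch: str) -> str:
    """Standard name of ch, or a placeholder: PUA code points have no name."""
    return unicodedata.name(ch, "<no standard name>")

if __name__ == "__main__":
    sample = "x\ue000y"
    print(private_use_chars(sample))  # which code points need a private agreement
    print(describe("\ue000"))         # nothing in the standard says what this is
```

Once a character is encoded, the same lookup returns a standard name that any conforming software agrees on, which is exactly what a PUA-first document cannot offer.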

What these examples have in common is that they reflect a small number
of characters with an "instant" user community that's well defined and
understood (and appropriate to the type of character). The main reason
for the restriction to "encode what people use" is that characters
cannot be retracted if the hoped-for enthusiasm for them doesn't
materialize.

The other reason is that the Unicode Standard is a standard: what it
encodes needs to be worthy of standardization. There are exceptional
instances where "leading" standardization can be justified; they are
few and far between, but they exist. These exceptions prove the rule:
the majority of characters will continue to be cases where
standardization follows demonstrated use.

Received on Thu Nov 08 2012 - 20:01:28 CST