Re: The rules of encoding (from Re: Missing geometric shapes)

From: Philippe Verdy <>
Date: Sat, 10 Nov 2012 04:14:38 +0100

2012/11/9 Asmus Freytag <>:
> Actually, there are certain instances where characters are encoded based on
> expected usage.
> Currency symbols are a well known case for that, but there have been
> instances of phonetic characters encoded in order to facilitate creation and
> publication of certain databases for specialists, without burdening them
> with instant obsolescence (if they had used PUA characters).

But work still goes on to implement such characters and to start
using them massively, even before they are encoded. Currency symbols
are among these: their design does NOT need an initial encoding as a
character. It starts with a graphic design, made with graphic tools.
Then these tools are used to design and print banknotes and coins.

Many documents will be produced to introduce the currency and its
expected symbol. They will use graphic representations rather than
plain text. Plain text, however, is expected to become an urgent need
for currencies that are to become legal tender in an area as large as
a country or group of countries, because currency units are used
every day, many times each day, by lots of people, even if they don't
always need to create new documents with the symbol. In fact the first
use will be to name the currency; the symbol will be preprinted on
cheque forms, on banknotes and coins, or in commercial advertising
documents that are never limited to plain text (plain text is
certainly not the best medium for such announcements).

> If an important publisher of mathematical works (or publisher of important
> mathematical works) made a case for adding a recently created symbol so that
> they can go ahead and make it part of their standard repertoire, I would
> think it churlish to require them to create portability problems for their
> users by first creating documents with PUA encoding.

If the work is really important, it is because it has been the subject
of serious research for a long enough time, and of publication for peer
review. In scientific domains, most electronic publications are NOT
made in plain text, but as PDFs. For computing purposes, if there's
a need to program the symbol, scientists are used to creating specific
notations in programming languages. This is not a limitation, as such
programs don't actually need the symbol themselves, except to render
the result (but software is not limited to returning results in plain
text, so this is not a serious limitation).

In other words, there's no chicken-and-egg problem for scientific
symbols: the usage starts expanding first, and at some point the symbol
will be used by enough people that they MAY want it to be supported in
plain text (this won't always happen, notably for scientific documents
where plain text is already a very poor medium which requires specific
conventions and notations that are extremely technical and not always
very readable and usable in practice, except by machines, like
programming code in computer languages).

Computer languages are in any case not in scope of the UCS. Neither is
representation in media other than plain text (notably graphic file
formats).

In addition, the UCS is used in plain text to allow things that would
NOT be permitted by the initial definition (and actual usage) of the
symbol: transformations like changes of lettercase, sorting/collation
(which may not make sense for the notation using the symbol itself),
variability of glyphs, and even most character properties. The
classification in Unicode will make absolutely no sense in a
scientific notation that certainly does not want this flexibility,
when the actual notation has its own very precise requirements to be
meaningful in well-defined contexts.
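
To make the point about properties concrete, here is a minimal Python
sketch. It assumes only the standard unicodedata module; the helper
name describe() is hypothetical, chosen for illustration. It shows that
a letter borrowed as a scientific symbol keeps its letter properties,
case mapping included, while characters encoded as symbols have none:

```python
import unicodedata

def describe(ch):
    """Return (name, general category, uppercase form) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.upper()

# GREEK SMALL LETTER SIGMA is a letter (category "Ll"), so case mapping
# applies to it even when a text uses it as a fixed scientific symbol.
print(describe("\u03c3"))

# EURO SIGN is a currency symbol ("Sc") and N-ARY SUMMATION is a math
# symbol ("Sm"); neither has a case mapping, so upper() leaves them as is.
print(describe("\u20ac"))
print(describe("\u2211"))
```

A plain-text process that uppercases a heading would thus turn a small
sigma into a capital one, whatever the notation intended.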

Encoding the symbol in the UCS would immediately permit reuse in
contexts other than the initial one. In fact it would be useful and
acceptable to encode it ONLY if there are such derived usages, outside
of the initial scientific definition and context, by people who don't
even need to know the original meaning of the symbol to reuse it in a
"fancy" way to mean something else. If scientists see this usage
developing, with variations they do not allow, they will still
prefer to maintain their own precise definition, which won't match the
definition that will be encoded in the UCS for more general use (due
to the "unification" process prior to encoding).

In other words, for a long time the initial scientific community will
continue to use its existing definition and conventions, its own
standards, and the encoded character may create an ambiguity that does
not exist in their initial convention. They will reject the result of
the "unification" in the UCS and will still consider their symbol to be
different from the currently encoded one.

This will last at least for a long time, until the general public
starts recognizing that the unified uses are actually different, and
scientists then request that the scientific symbol be disunified
from the general-purpose but ambiguous symbol, or until font designers
start listening to and understanding the actual requirements of
scientists in their glyph design, and in character properties
and processability (this part will fall in scope of Unicode, which may
adapt some algorithms or fix some properties, even if this breaks some
older general-purpose usages where this was not correctly understood).
Here again we will be driven by establishing the actual usage before
encoding more specific variants.
Received on Fri Nov 09 2012 - 21:20:14 CST
