Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

From: Philippe Verdy <>
Date: Sat, 2 Jun 2012 10:59:01 +0200

2012/6/2 William_J_G Overington <>:
> There is a paradox in that, at present, in order for a new electronic character-based communication technology to become introduced into regular Unicode that evidence of its existing widespread use in a Private Use Area context is needed: yet producing that existing widespread use in a Private Use Area context is both unrealistic because it would be a Private Use Area implementation and also that very supposed Private Use Area implementation would damage the implementation and use of a regular Unicode solution for many years.
> The point is that such new technologies need to be introduced in a process that is managed by Unicode and ISO Committees. For Unicode, the code points could be encoded by the Unicode Technical Committee yet the individual encodings using those code points could be carried out by another Unicode Committee, which particular committee being a matter to be decided.

The paradox is apparently solved by having books printed and made
widely available (or old enough for their content to have fallen into
the public domain and be freely reusable, so that copies start
spreading in various communities).

In earlier eras, you did not need characters to be encoded: you just
drew them with a brush, cast them in metal type, or used a knife to
cut them into wood.

Now we want new characters to be encoded, but we are frozen by
copyright and political issues. When new characters are introduced,
font technologies may be used, but only for limited distribution
(widespread use requires converting plain text into rendered files
such as PDFs, or using embedded graphics in rich-text documents).

But now the UTC members are saying that these characters are not
necessary because they are graphic files. As if there were no need to
use them in printed documents (even though those documents are no
longer graphic files, but full pages where all glyphs are rendered the
same way, and applications like OCR will choke on unknown glyphs found
in those books).

So the real question is: why do we need "plain text"? It is to allow
full search indexing and transformation of the content of those
documents, and to allow further work on those documents to create
derived documents more easily (even if the page layouts are largely
transformed), or to create translations, or to integrate those
"data" elements in other contexts.

Plain text is just that: being able to extract parsable semantics
from rendered text. Initially, no books are plain text; they are
always graphic. But they are still used as strong evidence for

I don't understand the discrimination between glyphs stored as
computer graphics and printed books (or other real artistic/cultural
works on various materials such as stone, wood and ceramics) as a
valid source and evidence for encoding.

Flags are a good example, because they exist in various materialized
forms, not just in computer graphics! And they are used in a great
many very different contexts. They are perfect candidates for
encoding, except that their colors cause problems with the encoding
model (mostly for the representative glyph), as do their graphic
designs, which are protected, restricted, or even forbidden for most
uses in some countries.

But just like other characters, or like languages, flags can also be
unified across their allowed variations, while still allowing
additional variations to be encoded (in Unicode we have variation
selectors, in language codes we have additional subtags, and in ISO
3166 we have subcodes as well that can be appended to existing codes).
All these unifications and encoded variations require a specific
registry.

But unlike character variants, which remain basic glyphs, or country
codes, which remain codes composed of normal characters, flags are
distinguished by their colors and are only meaningful through their
graphic design. A solution for their unification will then require
such a registry and a convention for naming them in that registry.
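
To make the idea concrete, here is a minimal sketch (in Python) of
what such a registry could look like. Everything in it is hypothetical:
the class names, the "base-variant" tag syntax and the "XX" code are
only illustrative assumptions, not an existing standard or an actual
proposal.

```python
# Hypothetical sketch: no name, tag syntax or code below comes from an
# existing standard or registry; they only illustrate the idea.
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagEntry:
    base: str      # unified flag, keyed by a country-code-like identifier
    variant: str   # registered variation subtag ("" for the unified flag)
    name: str      # conventional name under which the registry lists it


class FlagRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, base, variant, name):
        """Add one entry; a (base, variant) pair can be registered once."""
        key = (base.upper(), variant.lower())
        if key in self._entries:
            raise ValueError(f"already registered: {key}")
        self._entries[key] = FlagEntry(key[0], key[1], name)

    def lookup(self, tag):
        """Resolve 'XX' or 'XX-v1'; unknown variants fall back to 'XX'."""
        base, _, variant = tag.partition("-")
        key = (base.upper(), variant.lower())
        if key in self._entries:
            return self._entries[key]
        return self._entries[(base.upper(), "")]


registry = FlagRegistry()
registry.register("XX", "", "Example flag")
registry.register("XX", "v1", "Example flag, variant 1")

print(registry.lookup("XX-v1").name)     # a registered variation
print(registry.lookup("XX-other").name)  # falls back to the unified flag
```

The point of the fallback is that a system which does not know a given
variation can still resolve the tag to the unified flag, instead of
failing on it.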

We can start with modest codes, but given the huge number of existing
flags, and the fact that neither the UTC nor the CLDR TC has any
competence in this domain (whereas other groups started collecting
data many years before Unicode ever started its work...), I will not
suggest that the Unicode Consortium (or WG2 at ISO) host this
registry. There has long been a large, established federation of
associations that has provided research, websites, and data (with
large parts of it freely available).

But the most frequent use of flags still involves a small number of
them. And we can already integrate them, with a model that will also
allow an easy transition to other large collections of flags.

Please be pragmatic here!

Admit that the "plain-text" need exists (even if it is still met, with
some difficulty, using embedded graphics that are not easily parsable
and not always interoperable, because their internal formats are not
supported across all platforms and because of their size, so that
they cannot always be embedded).

Don't lie to yourself by saying that nobody has ever wanted them to be
encoded. Even if they are not encoded in Unicode, they are encoded
using various private-use schemes in lots of systems, but they are not
interoperable (and this is the main reason why you don't see them
exposed in "strange" encodings, because those flags are usually not
used exclusively when creating large texts, but are spread within
larger texts or data tables).
Received on Sat Jun 02 2012 - 04:02:09 CDT
