Re: Unicode 7.0 goals and ++

From: Ken Whistler <>
Date: Mon, 11 Jul 2011 11:57:16 -0700

On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote:
> For the long term, I suggest Unicode should aim for this:
> Unicode 6.5 should claim: There will be a *Unicode dictionary*,
> limiting and reducing ambiguous semantics within Unicode
> (Background: e.g. the word "character" will have one single crisp
> definition, /or/ can be specified to & at any special point).

That kind of terminological purity isn't going to occur. The word
"character" has been used ambiguously for decades in the IT industry,
and has other general language usage as well.

The Unicode Consortium has a glossary of terms:

to help clarify technical term usage by the Unicode Standard and other
specifications, and everybody is welcome to suggest improvements or
additions to it. Specific terms of art in the Unicode Standard, such as
"code point", "code unit", "scalar value", etc., are used unambiguously.
But it is basically hopeless to try to legislate away linguistic
ambiguity in a term like "character".
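
To make those three terms of art concrete, here is a minimal Python
sketch. The helper names are made up for illustration. The definitions
assumed are the Unicode Standard's: a code point is any value in
0..10FFFF, a scalar value is any code point except the surrogates
D800..DFFF, and a code unit is the fixed-width unit an encoding form is
built from (8 bits for UTF-8, 16 for UTF-16, 32 for UTF-32).

```python
def is_code_point(value: int) -> bool:
    """True if value lies in the Unicode codespace 0..10FFFF."""
    return 0 <= value <= 0x10FFFF


def is_scalar_value(value: int) -> bool:
    """True if value is a code point outside the surrogate range."""
    return is_code_point(value) and not (0xD800 <= value <= 0xDFFF)


def code_unit_counts(text: str) -> dict:
    """Count the code units needed for text in each encoding form.

    Assumes text is well formed (no lone surrogates), so that
    len(text) equals the number of scalar values.
    """
    return {
        "scalar_values": len(text),
        "utf8_code_units": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf32_code_units": len(text.encode("utf-32-le")) // 4,
    }
```

For a string made of "a" followed by U+1F600, this reports two scalar
values but five UTF-8 code units, three UTF-16 code units, and two
UTF-32 code units, which is why the looser word "character" cannot
stand in for any of them.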

> Unicode 7.0 should claim: The Unicode definitions will be in distinct,
> *abstract layers*.
> (Background: Unicode is not layered, multiple areas of knowledge mix.
> Just think of what the 7-layer OSI model has benefited the internet
> industry: separating the frequency from the packet from the byte from
> the character. There might be needed more dimensions, like for
> detailing normative from informative).

The Unicode Standard already has what abstract layers its architecture
makes appropriate. See, for example, glyph versus character, and
character encoding versus character encoding form versus character
encoding scheme.
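
The form-versus-scheme layering can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function names are hypothetical, the input is assumed to be well formed
(no lone surrogates), and only UTF-16 is shown. The encoding form maps
scalar values to 16-bit code units; the encoding scheme then fixes a
byte order when those code units are serialized.

```python
import struct


def utf16_code_units(text: str) -> list:
    """Encoding form: map each scalar value to one or two 16-bit code units."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            # Supplementary code points become a surrogate pair.
            offset = cp - 0x10000
            units.append(0xD800 | (offset >> 10))
            units.append(0xDC00 | (offset & 0x3FF))
    return units


def serialize_utf16(units: list, big_endian: bool) -> bytes:
    """Encoding scheme: serialize code units as bytes in a fixed byte order."""
    byte_order = ">" if big_endian else "<"
    return struct.pack(byte_order + "H" * len(units), *units)
```

The same list of code units yields different byte sequences for the
UTF-16BE and UTF-16LE schemes, while the characters and the encoding
form stay unchanged.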

But the Unicode Standard is neither a software system nor a protocol stack,
so trying to apply models appropriate to other realms probably isn't going
to get too far.

> Unicode 8.0 should claim: Static information will be defined and
> published in *XML*.
> (Background: data, so think tables, lists, have one open standard
> structure).

This much is *already* available. See UAX #42, Unicode Character
Database in XML, UTS #22, Character Mapping Markup Language, and
UTS #35, Unicode Locale Data Markup Language. The entire CLDR is
expressed in XML already, as is the entire Unicode Character Database.

> Unicode 9.0 should claim: Processes will be defined and published in
> *UML* 2.0 (for lack of an open standard)
> (Background: think UAX #9 Bidi written in a universal -graphic- language).

This, on the other hand, is not going to happen. The Unicode Standard
(and the other specifications of the Unicode Consortium) is not an
object-oriented software system. Even trying to express the algorithmic
specifications such as the Unicode Normalization Algorithm, the Unicode
Bidirectional Algorithm, the Unicode Collation Algorithm, etc., in
UML 2.0, would be a major waste of effort, IMO. I don't see the UTC
going for that at all.


> I might have the numbering wrong, or even the sequence. But not the
> main line, is it?
> Ernest van den Boogaard
> 11-Jul-2011
Received on Mon Jul 11 2011 - 14:02:52 CDT
