Re: Origin of the U+nnnn notation

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Nov 08 2005 - 16:11:46 CST

Next message: Philippe Verdy: "Re: Origin of the U+nnnn notation"

Previous message: Hans Aberg: "Re: Origin of the U+nnnn notation"
Maybe in reply to: Jukka K. Korpela: "Origin of the U+nnnn notation"
Next in thread: Philippe Verdy: "U+nnnn notation and normative identifiers."
Reply: Philippe Verdy: "U+nnnn notation and normative identifiers."

On this topic...

"Dominikus Scherkl" <lyratelle@gmx.de> speculated:

> > Maybe it was thought of as an offset from the unit (character null)
> > like in ETA+5 minutes (the expected time of arrival passed five minutes
> > ago - a euphemism for being 5 minutes late).

Perhaps, but it had nothing to do with the actual origin of the "+".

And Philippe responded:

> U-nnnn already exists (or I should say, it has existed).

U+nnnn, actually. The U- notation was introduced by Amd 9 to 10646
in 1997. It was never adopted for any use with Unicode, per se.

> It was referring to
> 16-bit code units,

Code *points*, not code units. These were known as "Unicode values"
in Unicode prior to the introduction of UTF-16.
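To make the code point versus code unit distinction concrete, here is a minimal Python sketch (the helper name utf16_code_units is hypothetical, made up for illustration). In the 16-bit Unicode 1.x design every character value fit in one 16-bit unit, so the two notions coincided; with UTF-16, a code point above U+FFFF is encoded as a surrogate pair of two code units, yet it remains a single code point with a single identifier.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units that encode one code point.

    Code points up to U+FFFF take one unit; supplementary code points
    (U+10000..U+10FFFF) take a surrogate pair of two units.
    """
    # Reject values outside the code space and lone surrogate code points,
    # which are not encodable on their own.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000                      # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),           # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]         # low surrogate carries the bottom 10 bits
```

For example, U+1F600 is one code point but two UTF-16 code units, D83D and DE00.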

> not really to characters and was a fixed-width notation
> (with 4 hexadecimal digits). The "U" meant "Unicode" (1.0 and before).
>
> U+[n...n]nnnn was created to avoid the confusion with the past 16-bit only
> Unicode 1.0 standard (which was not fully compatible with ISO/IEC 10646 code
> points).

Actually, it was not to avoid confusion with Unicode 1.0. Unicode 1.1
was also 16-bit only, and it was fully compatible with 10646-1:1993.

> It is a variable-width notation that refers to ISO/IEC 10646 code
> points. The "U" means "UCS" or "Universal Character Set". At that time, the
> UCS code point range was up to 31 bits wide.
>
> The U-nnnn notation is abandoned now,

It isn't in widespread usage, but is still a normative specification
in 10646:2003.

> except for references to Unicode 1.0.

This is false. The "-" of the U- notation has nothing to do with
Unicode 1.0.

> If one uses it, it will refer to one or more 16-bit code units needed to
> encode each codepoint (possibly with surrogate pairs). It does not
> designate abstract characters or codepoints unambiguously.

This is false. The U- notation is for the 8-digit short identifiers
of 10646:2003. Those short identifiers designate code positions
(the 10646 term for code points) unambiguously. From 10646:

  "ISO/IEC 10646 defines short identifiers for each code position,
   including code positions that are reserved. A short identifier
   for any code position is distinct from a short identifier for
   any other code position. ..."

I'd say that's a pretty explicit claim that 10646 is talking about
code points *and* that the short identifiers are unambiguous.
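A minimal Python sketch of the eight-digit U- form, assuming (as stated earlier in the thread, not quoted from 10646) the historical 31-bit UCS code space of 0 to 7FFFFFFF, and assuming uppercase hex digits only. The function names are hypothetical. Formatting is fixed-width and parsing inverts it exactly, which is one way to see the distinctness property quoted above: no two code positions, reserved or not, can share an identifier.

```python
import re

# Assumption: the historical 31-bit UCS code space (0 to 7FFFFFFF), as described
# in this thread; the exact bound depends on the edition of 10646.
UCS_MAX = 0x7FFFFFFF

# Assumption: exactly eight uppercase hex digits after "U-".
_EIGHT_DIGIT = re.compile(r"U-([0-9A-F]{8})")


def to_eight_digit_id(cp: int) -> str:
    """Format a code position as an eight-digit short identifier, e.g. U-000020AC."""
    if not 0 <= cp <= UCS_MAX:
        raise ValueError(f"outside the assumed UCS range: {cp:#x}")
    return f"U-{cp:08X}"


def from_eight_digit_id(ident: str) -> int:
    """Parse an eight-digit short identifier back to its code position."""
    m = _EIGHT_DIGIT.fullmatch(ident)
    if m is None:
        raise ValueError(f"not an eight-digit short identifier: {ident!r}")
    cp = int(m.group(1), 16)
    if cp > UCS_MAX:
        raise ValueError(f"outside the assumed UCS range: {ident}")
    return cp
```

For instance, to_eight_digit_id(0x20AC) gives U-000020AC, and from_eight_digit_id("U-0001F600") gives 0x1F600.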

> Later, the variable-width U+[n...n]nnnn notation was restricted to allow
> only codepoints in the first 17 planes of the joint ISO/IEC 10646-1 and
> Unicode standards (so the only standard codepoints are between U+0000 and
> U+10FFFF, some of them being permanently assigned to non-characters).

Correct. The current form of the specification is:

  "The four-to-six-digit form of short identifier shall consist
   of the last four to six digits of the eight-digit form. It is
   not defined if the eight-digit form is greater than 0010FFFF.
   Leading zeroes beyond four digits are suppressed."
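The quoted rule translates almost word for word into code. Here is a minimal Python sketch (the function name is hypothetical, and accepting only uppercase hex digits is an assumption) that derives the U+ form from the eight digits of the U- form.

```python
# Assumption: the eight-digit form uses uppercase hex digits only.
_HEX = set("0123456789ABCDEF")


def four_to_six_digit_id(eight: str) -> str:
    """Derive the U+ short identifier from the eight digits of the U- form.

    Follows the quoted rule: keep the last four to six digits, suppress
    leading zeroes beyond four digits, and refuse values above 0010FFFF.
    """
    if len(eight) != 8 or not set(eight) <= _HEX:
        raise ValueError(f"expected eight hex digits, got {eight!r}")
    if int(eight, 16) > 0x10FFFF:
        raise ValueError("the four-to-six-digit form is not defined above 0010FFFF")
    # Strip all leading zeroes, then pad back to a minimum of four digits.
    return "U+" + eight.lstrip("0").rjust(4, "0")
```

So "000020AC" gives U+20AC, "0001F600" gives U+1F600, and "00000000" gives U+0000. For a code point already held as an integer in the range 0 to 0x10FFFF, f"U+{cp:04X}" produces the same string.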

> There are *no* negative codepoints in either standard (U-0001 does not
> designate the 32-bit code unit that you could store in a signed wide-char
> datatype, but in the past standard it designated the same codepoint as U+0001
> now). Using "+" makes the statement about signs clear: standard code points
> all have positive values.

The "+" might connote that for some users, but its origin had nothing
to do with that.

--Ken

