# RE: Curious Definitions

From: Peter Constable (petercon@microsoft.com)
Date: Thu Jan 13 2005 - 11:09:00 CST

> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf
> Of Arcane Jill

> First off, there are a couple of extremely LOGICAL definitions. For example:
> (1) An "Abstract Character Sequence" is a sequence of "Abstract Character"s.

This (D4) is a perfectly good definition (making reference to the term
"abstract character", defined in D3).

> This is exactly what I'd expect. But then there's this really ILLOGICAL one:
> (3) A "Coded Character Sequence" is a sequence of ... (wait for it) ... "Code
> Points"...

This definition (D6) could be improved. Inasmuch as there is a one-to-one
relationship between coded characters and code points, and the
character properties of a coded character are attributable to
interpreted code points, a sequence of coded characters is at least
equivalent and isomorphic to the corresponding code point sequence, even
if not identical. And the code point sequence is, indeed, a
representation of the coded character sequence. The only problem with D6
is the note saying that a "coded character sequence" is identical to a
code point sequence.
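
To make the "isomorphic, but not identical" point concrete, here is a minimal Python sketch. It assumes a coded character can be modelled as a pair of a code point and an abstract character, with the Unicode character name standing in for the abstract character. The class `CodedCharacter` and both helper functions are hypothetical names chosen for illustration; they are not terminology from the standard.

```python
import unicodedata
from typing import List, NamedTuple


class CodedCharacter(NamedTuple):
    """Hypothetical model: a code point paired with an abstract character."""
    code_point: int          # an integer in the range 0..0x10FFFF
    abstract_character: str  # assumption: the character name stands in for the abstract character


def to_coded_characters(code_points: List[int]) -> List[CodedCharacter]:
    # unicodedata.name() raises ValueError for code points without a name,
    # so this sketch only handles code points assigned to named characters.
    return [CodedCharacter(cp, unicodedata.name(chr(cp))) for cp in code_points]


def to_code_points(coded: List[CodedCharacter]) -> List[int]:
    # Going back is just a projection onto the first element of each pair.
    return [c.code_point for c in coded]


code_points = [0x41, 0x3A9]
coded = to_coded_characters(code_points)
```

The round trip `to_code_points(to_coded_characters(code_points))` returns the original list, which is the sense in which the two sequences are equivalent; yet `coded` is a list of pairs and `code_points` is a list of integers, so they are not the same object.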

> Is this important? I dunno, but a "Code Point" is defined as an INTEGER in the
> range 0 to 0x10FFFF, whereas a "Coded Character" is defined as a bidirectional
> mapping between a single "Abstract Character" and a single "Code Point". So a
> "Coded Character" may be thought of as an ordered pair containing a "Code
> Point" (an integer) and an "Abstract Character" (an atom of text), whereas a
> "Code Point" is just an integer.
>
> So, logically, the sequence ( 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 ) is a
> "Coded Character Sequence", even though it's just the start of the Fibonacci
> series?

If the numbers in that sequence are all interpreted as Unicode code
points, then it is, indeed, a coded character representation.
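
As an illustration of what "interpreted as Unicode code points" means for that sequence, the following Python sketch checks that each number lies in the code point range and looks up a character name where one exists. The helper names `is_code_point` and `describe` are hypothetical; `unicodedata.name` is the standard-library lookup, and the `<unnamed>` fallback is an assumption used here for control characters, which have no formal name.

```python
import unicodedata

# The sequence from the quoted message.
FIBONACCI = (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)


def is_code_point(n: int) -> bool:
    """True if n is an integer in the Unicode code point range 0..0x10FFFF."""
    return isinstance(n, int) and 0 <= n <= 0x10FFFF


def describe(n: int) -> str:
    """Format n as U+XXXX followed by its character name, if it has one."""
    return f"U+{n:04X} {unicodedata.name(chr(n), '<unnamed>')}"


for n in FIBONACCI:
    print(describe(n))
```

Every element passes `is_code_point`. The first eight values fall on C0 control codes, while 34, 55 and 89 are the code points of the quotation mark, the digit seven and the capital letter Y.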

> The sequence ( 0xD800, 0xFFFF ) is a Coded Character Sequence even
> though neither of its elements can be mapped to a coded character?

The definition seems to say so, doesn't it? Since it is a sequence of
code points, the status of the characters being represented doesn't
affect its validity, and such a valid sequence is something that is
useful to refer to. Whether "coded character sequence" is the best way
to refer to it is another question.
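
A short Python sketch of why both elements of that sequence are valid code points even though neither can be assigned to an abstract character: 0xD800 is a surrogate code point and 0xFFFF is a noncharacter. The function name `classify` and its three labels are hypothetical; the ranges used are the surrogate block (U+D800 to U+DFFF) and the 66 noncharacters (U+FDD0 to U+FDEF, plus the last two code points of every plane).

```python
def classify(cp: int) -> str:
    """Coarsely classify a code point as 'surrogate', 'noncharacter' or 'other'."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF and every code point ending in FFFE or FFFF.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "other"


sequence = (0xD800, 0xFFFF)
labels = [classify(cp) for cp in sequence]
```

Neither call raises, so the sequence is a well-formed sequence of code points; the labels only record that no coded character stands behind either element.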

> This curious definition of "Coded Character Sequence" seems a bit strange to
> me. Does it seem strange to anyone else? Have I misread something?

I agree you've spotted a weak point. How important is it? Well, I don't
think it's going to result in implementers doing the wrong thing, or not
knowing what to do at all; but it is good to eliminate terminological
issues where feasible. Can it be fixed? The terminology has been open to
refinement in every version so far, and I don't know of any reason why
that would change. Should it be changed? That's a decision for the
editorial committee to make, though you can certainly give them a
recommendation -- and suggested text.

Peter Constable

