RE: Curious Definitions

From: Peter Constable (petercon@microsoft.com)
Date: Thu Jan 13 2005 - 11:09:00 CST

    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of Arcane Jill

    > First off, there are a couple of extremely LOGICAL definitions. For
    > example:
    > (1) An "Abstract Character Sequence" is a sequence of "Abstract
    > Character"s.

    This (D4) is a perfectly good definition (making reference to the term
    "abstract character", defined in D3).

    > This is exactly what I'd expect. But then there's this really
    > ILLOGICAL one:
    > (3) A "Coded Character Sequence" is a sequence of ... (wait for it)
    > ... "Code Points"...

    This definition (D6) could be improved. Inasmuch as there is a one-to-one
    relationship between coded characters and code points, and the
    character properties of a coded character are attributable to
    interpreted code points, a sequence of coded characters is at least
    equivalent and isomorphic to the corresponding code point sequence, even
    if not identical. And the code point sequence is, indeed, a
    representation of the coded character sequence. The only problem with D6
    is the note saying that "coded character sequence" is identical.

    > Is this important? I dunno, but a "Code Point" is defined as an
    > INTEGER in the range 0 to 0x10FFFF, whereas a "Coded Character" is
    > defined as a bidirectional mapping between a single "Abstract
    > Character" and a single "Code Point". So a "Coded Character" may be
    > thought of as an ordered pair containing a "Code Point" (an integer)
    > and an "Abstract Character" (an atom of text), whereas a "Code Point"
    > is just an integer.
    >
    > So, logically, the sequence ( 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 )
    > is a "Coded Character Sequence", even though it's just the start of
    > the Fibonacci series?

    If the numbers in that sequence are all interpreted as Unicode code
    points, then it is, indeed, a coded character representation.
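    As a rough illustration of that interpretation step, here is a minimal
    Python sketch. The helper names (is_code_point, describe) are made up
    for this example and are not terms from the standard; the only library
    call used is Python's standard unicodedata.category.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode codespace


def is_code_point(n: int) -> bool:
    """True if n is an integer in the range 0..0x10FFFF."""
    return 0 <= n <= MAX_CODE_POINT


def describe(n: int) -> tuple[str, str]:
    """Interpret n as a code point (hypothetical helper).

    Returns the U+XXXX label and the General_Category that Python's
    bundled Unicode data assigns to that code point.
    """
    if not is_code_point(n):
        raise ValueError(f"{n} is outside the Unicode codespace")
    return (f"U+{n:04X}", unicodedata.category(chr(n)))


# The start of the Fibonacci series from the quoted message.
fibonacci = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Every element is an integer in range, so each can be read as a code point.
interpreted = [describe(n) for n in fibonacci]
```

    Under this reading, 89 becomes U+0059 (an uppercase letter) and 1
    becomes U+0001 (a control character): the integers are unchanged, only
    their interpretation differs.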

    > The sequence ( 0xD800, 0xFFFF ) is a Coded Character Sequence even
    > though neither of its elements can be mapped to a coded character?

    The definition seems to say so, doesn't it? Since it is a sequence of
    code points, the status of the characters being represented doesn't
    affect its validity, and such a valid sequence is something that is
    useful to refer to. Whether "coded character sequence" is the best way
    to refer to it is another question.
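    The distinction can be sketched in Python as follows. This is only an
    illustration under stated assumptions: the function names are
    hypothetical, and "can be a coded character" is approximated here as
    "neither a surrogate code point nor a noncharacter", ignoring whether a
    given version of the standard has actually assigned a character there.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode codespace


def is_code_point(n: int) -> bool:
    """True if n lies in the codespace 0..0x10FFFF."""
    return 0 <= n <= MAX_CODE_POINT


def is_surrogate(cp: int) -> bool:
    """True for the surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_code_point_sequence(seq) -> bool:
    """Validity in the sense discussed above: every element is a code point."""
    return all(is_code_point(n) for n in seq)


def can_be_coded_character(cp: int) -> bool:
    """Assumption for this sketch: excludes surrogates and noncharacters only."""
    return is_code_point(cp) and not is_surrogate(cp) and not is_noncharacter(cp)


sequence = [0xD800, 0xFFFF]  # the example from the quoted message

valid_as_code_points = is_code_point_sequence(sequence)
any_coded_character = any(can_be_coded_character(cp) for cp in sequence)
```

    Here the sequence passes the code point check while no element passes
    the coded character check, which is the gap the quoted question is
    pointing at.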

     
    > This curious definition of "Coded Character Sequence" seems a bit
    > strange to me. Does it seem strange to anyone else? Have I misread
    > something?

    I agree you've spotted a weak point. How important is it? Well, I don't
    think it's going to result in implementers doing the wrong thing, or not
    knowing what to do at all; but it is good to eliminate terminological
    issues where feasible. Can it be fixed? The terminology has been open to
    refinement in every version so far, and I don't know of any reason why
    that might change. Should it be changed? That's a decision for the
    editorial committee to make, though you can certainly give them a
    recommendation -- and suggested text.

    Peter Constable


