Re: Origin of the U+nnnn notation

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Nov 08 2005 - 09:44:27 CST

  • Next message: Philippe Verdy: "Re: Åland"

    From: "Hohberger, Clive" <CHohberger@zebra.com>

    > Adding to Philippe's excellent description, I think of the set {U+nnnn}
    > as a set of ordinal numbers, as they represent positions in a table. The
    > construct U-nnnn therefore is meaningless as an ordinal number.

    U+nnnn is not an ordinal number! Ordinals have no 0th element, yet U+0000
    is valid and is the *first* element. Moreover, code points are inherently
    not fully ordered for this purpose: some are invalid, some are assigned to
    noncharacters, and some are still undefined. It is better to think of an
    unordered set. Neither U-nnnn nor U+[n..n]nnnn is an ordinal. They are
    unique symbols used to designate the code point elements onto which the
    UCS is mapped, nothing more.
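[A minimal sketch in Python, not from the original post, of this labeling convention: U+nnnn is just a fixed-width hexadecimal name for a code point, not an ordinal. The helper name `ucs_label` is an illustration of mine, not standard API.]

```python
def ucs_label(cp: int) -> str:
    """Format a code point as its U+nnnn designation (4 to 6 hex digits)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Zero-padded to at least four digits, per the usual convention.
    return f"U+{cp:04X}"

print(ucs_label(0x0000))    # U+0000 -- the first element, yet labeled with 0
print(ucs_label(0x10FFFF))  # U+10FFFF -- six digits, beyond the BMP
```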

    The set {U+0000..U+10FFFF} is not the UCS either: it is partially mapped
    to noncharacters, and it contains code points that are still not assigned
    to characters in the UCS.
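[A hedged illustration, assuming CPython's bundled Unicode character database: not every code point in {U+0000..U+10FFFF} names a character. U+FDD0 is a designated noncharacter, so `unicodedata.name` has nothing to return for it. The helper `is_named` is my own illustrative name.]

```python
import unicodedata

def is_named(cp: int) -> bool:
    """True if the code point has a character name in this Python's UCD."""
    try:
        unicodedata.name(chr(cp))
        return True
    except ValueError:
        # Unassigned code points and noncharacters have no name.
        return False

print(is_named(0x0041))  # True  -- LATIN CAPITAL LETTER A
print(is_named(0xFDD0))  # False -- a noncharacter code point
```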

    But the {U+0000..U+10FFFF} set is a closed set with a known cardinality
    (unlike the UCS, which is still an open set whose cardinality increases
    between versions). So the UCS is really an open collection of closed
    character sets, each of them included in every later UCS version.

    All of these closed, versioned UCS sets are mapped injectively (not
    bijectively) into the same closed {U+0000..U+10FFFF} set of code points.
    The injection (mapping function) is not the same across all versions, but
    it is unique for each individual version of the UCS.



    This archive was generated by hypermail 2.1.5 : Tue Nov 08 2005 - 09:46:39 CST