Re: Invalid code points

From: Mark Davis (mark.edward.davis@gmail.com)
Date: Sun May 31 2009 - 12:13:48 CDT

    That section is incorrect. The relevant passages are:

    D76 Unicode scalar value: Any Unicode code point except high-surrogate and
    low-surrogate code points.
    • As a result of this definition, the set of Unicode scalar values consists
    of the ranges 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆, inclusive.

    and under D79, p. 100:

    To ensure that the mapping for a Unicode encoding form is one-to-one, all
    Unicode scalar values, including those corresponding to noncharacter code
    points and unassigned code points, must be mapped to unique code unit
    sequences. Note that this requirement does not extend to high-surrogate and
    low-surrogate code points, which are excluded by definition from the set of
    Unicode scalar values.
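
    As a rough illustration of D76 (a sketch, not text from the standard), the
    range check below treats a code point as a scalar value exactly when it lies
    in one of the two ranges quoted above. The names is_scalar_value and
    encodes_to_utf8 are made up for this example, and the second helper assumes
    CPython's built-in UTF-8 codec, which rejects surrogate code points. On that
    reading, U+0080 through U+009F are scalar values and so must have a UTF-8
    encoding.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value in the sense of D76.

    Scalar values are all code points except the surrogates,
    i.e. the ranges 0..D7FF and E000..10FFFF (hex), inclusive.
    """
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def encodes_to_utf8(cp: int) -> bool:
    """Return True if Python's UTF-8 codec will encode the code point cp.

    Assumption: CPython's codec is used here only as a convenient
    cross-check; it is not the normative definition of UTF-8.
    """
    try:
        chr(cp).encode("utf-8")
        return True
    except (UnicodeEncodeError, ValueError):
        # UnicodeEncodeError: surrogate code point.
        # ValueError: cp is outside 0..10FFFF, so chr() refuses it.
        return False


if __name__ == "__main__":
    # C1 controls, a noncharacter, both ends of the surrogate block,
    # the last code point, and one value past the code space.
    for cp in (0x80, 0x9F, 0xFFFF, 0xD800, 0xDFFF, 0x10FFFF, 0x110000):
        print(f"{cp:04X}", is_scalar_value(cp), encodes_to_utf8(cp))
```

    For example, is_scalar_value(0x80) holds and chr(0x80).encode("utf-8")
    gives the two-byte sequence C2 80, while 0xD800 fails both checks.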

    Mark

    On Sun, May 31, 2009 at 08:55, Hans Aberg <haberg@math.su.se> wrote:

    > This quote says that which code points are invalid depends on how you
    > read the standard; perhaps someone here can clarify :-):
    > http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points
    >
    > In particular, it would be great to know if the range U+0080, …, U+009F is
    > invalid.
    >
    > Hans Aberg


