Code point vs. scalar value (was: RE: Origin of Ellipsis (was: RE: Empty set))

From: Doug Ewell <doug_at_ewellic.org>
Date: Mon, 16 Sep 2013 13:41:39 -0700

Oh, for heaven's sake:

Code Point. (1) Any value in the Unicode codespace; that is, the range
of integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4,
Characters and Encoding.) Not all code points are assigned to encoded
characters. See code point type. (2) A value, or position, for a
character, in any coded character set.

Unicode Scalar Value. Any Unicode code point except high-surrogate and
low-surrogate code points. In other words, the ranges of integers 0 to
D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive. (See definition D76
in Section 3.9, Unicode Encoding Forms.)

Source: http://www.unicode.org/glossary/

The only difference between a code point and a scalar value is that
"scalar value" excludes the integer values that correspond to
surrogates. That's it.
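
For anyone who wants it spelled out, here is a minimal sketch in Python of the two glossary definitions quoted above. The function and constant names are my own, for illustration only; they do not come from the Unicode Standard or from any library. The surrogate boundaries D800 and DFFF are simply the gap between the two scalar value ranges given in the glossary entry.

```python
# Boundaries taken from the glossary definitions quoted above.
# All names here are hypothetical, chosen for illustration.
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800  # first high-surrogate code point
SURROGATE_LAST = 0xDFFF   # last low-surrogate code point


def is_code_point(n: int) -> bool:
    """True for any integer in the Unicode codespace, 0 to 10FFFF."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """True for any code point that is not a surrogate code point."""
    return is_code_point(n) and not (SURROGATE_FIRST <= n <= SURROGATE_LAST)


# Collect every integer on which the two predicates disagree.
differing = [n for n in range(MAX_CODE_POINT + 1)
             if is_code_point(n) != is_scalar_value(n)]
print(hex(differing[0]), hex(differing[-1]), len(differing))
# prints: 0xd800 0xdfff 2048
```

The two predicates disagree on the 2,048 surrogate code points and nowhere else.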

And since it is very unlikely that Twitter and others are storing and
interchanging loose surrogates, it is truly a distinction without a
difference.

This has nothing to do with UTF-Anything or Normalization Form Anything.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
Received on Mon Sep 16 2013 - 15:43:34 CDT