Re: Code point vs. scalar value from Stephan Stiller on 2013-09-18 (Unicode Mail List Archive)

From: Stephan Stiller <stephan.stiller_at_gmail.com>
Date: Wed, 18 Sep 2013 00:02:00 -0700

On 9/17/2013 10:54 PM, Asmus Freytag wrote:
> On 9/17/2013 8:40 PM, Philippe Verdy wrote:
>>
>> In what way does UTF-16 "use" surrogate code /points/? An
>> encoding form is a mapping. Let's look at this mapping:
>>
>> * One _inputs_ scalar values (not surrogate code points).
>>
>> In fact the input is one code point.
>>
>> Then only if that code point has a scalar value (this may be tested
>> or not by the application), the rest of the algorithm applies.
>
> Thanks for providing some needed clarity.
No:

1. According to the Glossary, an encoding form maps scalar values to
    sequences of code units. Therefore, such an input validity check
    isn't part of the encoding form, or at least not the encoding form
    proper.
2. That still doesn't mean surrogates are "used by UTF-16", as the
    Glossary claims. The validity check you're quoting from Philippe's
    message would (if performed) be equally relevant to all encoding
    forms; thus it wouldn't be UTF-16-specific.
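
To make both points concrete, here is a minimal sketch in Python. The
function names (is_scalar_value, utf8_units, utf16_units, utf32_units,
encode) are made up for illustration and are not taken from the
Standard or the Glossary. The assumption is that each encoding form is
a pure mapping from a scalar value to a list of code units, and that
the validity check lives outside the mapping, in one place, shared by
all three forms.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a code point outside the surrogate range D800..DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


# The three encoding forms proper: each assumes its input is already a
# scalar value and maps it to a sequence of code units.

def utf8_units(sv: int) -> list[int]:
    if sv < 0x80:
        return [sv]
    if sv < 0x800:
        return [0xC0 | (sv >> 6), 0x80 | (sv & 0x3F)]
    if sv < 0x10000:
        return [0xE0 | (sv >> 12),
                0x80 | ((sv >> 6) & 0x3F),
                0x80 | (sv & 0x3F)]
    return [0xF0 | (sv >> 18),
            0x80 | ((sv >> 12) & 0x3F),
            0x80 | ((sv >> 6) & 0x3F),
            0x80 | (sv & 0x3F)]


def utf16_units(sv: int) -> list[int]:
    if sv < 0x10000:
        return [sv]
    v = sv - 0x10000
    # The two results are 16-bit code units in D800..DFFF. They are
    # outputs of the mapping, not surrogate code points fed into it.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def utf32_units(sv: int) -> list[int]:
    return [sv]


def encode(cp: int, form) -> list[int]:
    """Optional input check (hypothetical wrapper), identical for every form."""
    if not is_scalar_value(cp):
        raise ValueError(f"U+{cp:04X} is not a scalar value")
    return form(cp)


for form in (utf8_units, utf16_units, utf32_units):
    print(form.__name__, [hex(u) for u in encode(0x1F600, form)])
```

In this sketch, passing a surrogate code point such as 0xD800 to
encode raises the same error whichever form is selected, so the check
is not specific to UTF-16. What is specific to UTF-16 is only that
some of its output code units fall in the range D800..DFFF.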

Stephan
Received on Wed Sep 18 2013 - 02:04:27 CDT
