Re: 'code unit' and 'code point' meaning check

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Thu May 15 2003 - 12:29:59 EDT

Next message: Otto Stolz: "8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)"

Previous message: Philippe Verdy: "Proposing NCC and NCD (Normalized Collation (De)Composition) forms aligned with UCA"
In reply to: Ben Dougall: "Re: 'code unit' and 'code point' meaning check"
Next in thread: Philippe Verdy: "Re: 'code unit' and 'code point' meaning check"
Reply: Philippe Verdy: "Re: 'code unit' and 'code point' meaning check"

Ben Dougall wrote:

> <guess> that area is full of surrogates. so they need another code point
> to make up a single character. on their own 0xd800-0xdfff are 1/2
> characters :) </guess>

Oh, no! Again, you are confusing code-points and code-units
(in other words: Unicode and its UTFs).
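To make the distinction concrete, here is a small sketch in Python (my choice; nothing here is specific to it). A Python str is a sequence of code points, so len() counts code points. The length of the encoded bytes, divided by the unit width, counts code units. The helper name code_unit_counts is hypothetical.

```python
def code_unit_counts(s: str) -> dict:
    """Count code points in s and the code units it occupies in each UTF."""
    return {
        "code_points": len(s),                       # str is a sequence of code points
        "utf8": len(s.encode("utf-8")),              # 8-bit code units
        # The "-le" codecs are used so that no byte order mark is prepended.
        "utf16": len(s.encode("utf-16-le")) // 2,    # 16-bit code units
        "utf32": len(s.encode("utf-32-le")) // 4,    # 32-bit code units
    }

if __name__ == "__main__":
    for text in ["A", "\u00e9", "\U0001F600"]:
        print(f"U+{ord(text):04X}", code_unit_counts(text))
```

For U+1F600 this reports one code point but four UTF-8 code units, two UTF-16 code units and one UTF-32 code unit. The character is the same; only the encoding differs.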

Code points:

- In Unicode, a surrogate code-point (U+D800..U+DFFF) is not assigned
any character. Hence, these code-points are illegal, so none of them
can be contained in actual (legal) data.
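A minimal sketch of that rule, assuming "illegal" is read as "not a Unicode scalar value" (every code point except the surrogates). The function names are hypothetical, and the encoding check relies on the behaviour of CPython 3's strict codecs, which refuse to serialize a lone surrogate even though a Python str can hold one.

```python
def is_surrogate(cp: int) -> bool:
    """True if cp is a surrogate code point (U+D800..U+DFFF)."""
    return 0xD800 <= cp <= 0xDFFF

def is_scalar_value(cp: int) -> bool:
    """True if cp may occur in well-formed text: any code point except surrogates."""
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)

def encodable(s: str, codec: str) -> bool:
    """True if the codec accepts s, False if it raises UnicodeEncodeError."""
    try:
        s.encode(codec)  # errors="strict" is the default
    except UnicodeEncodeError:
        return False
    return True

if __name__ == "__main__":
    lone = "\ud800"  # a Python str can hold a lone surrogate code point...
    for codec in ("utf-8", "utf-16", "utf-32"):
        # ...but none of the strict UTF encoders will serialize it.
        print(codec, encodable(lone, codec))
```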

Code units:

- In UTF-8, there is no such thing as a surrogate code-unit,
as the code units are only 8 bits wide.

- In UTF-16, a pair of surrogate code-units, a high surrogate
(0xD800..0xDBFF) followed by a low surrogate (0xDC00..0xDFFF), encodes
a character beyond the BMP; a single non-surrogate code-unit encodes
a character in the BMP.

- In UTF-32, a surrogate code-unit is illegal, as it would
encode an illegal surrogate code-point.
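The UTF-16 surrogate-pair arithmetic can be sketched as below. The function names are hypothetical. The main block cross-checks the hand computation against Python's own UTF-16 codec rather than trusting it blindly.

```python
from __future__ import annotations

import struct

def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point beyond the BMP into two UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need a surrogate pair")
    v = cp - 0x10000             # 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate code unit back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

if __name__ == "__main__":
    cp = 0x1F600
    pair = to_surrogate_pair(cp)
    # Read the two 16-bit code units that the codec itself produces.
    units = struct.unpack("<2H", chr(cp).encode("utf-16-le"))
    assert pair == units
    assert from_surrogate_pair(*pair) == cp
    print([hex(u) for u in pair])
```

Each surrogate code-unit carries only 10 bits of the character, which is why neither half means anything on its own: it is a piece of the encoding, not a code point of the text.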

In a nutshell: Unicode is not UTF-16.

Best wishes,
Otto Stolz

This archive was generated by hypermail 2.1.5 : Thu May 15 2003 - 13:31:25 EDT