Re: 'code unit' and 'code point' meaning check

From: Doug Ewell
Date: Fri May 16 2003 - 03:11:22 EDT

Next message: William Overington: "Re: 'code unit' and 'code point' meaning check"

Previous message: Doug Ewell: "Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)"
In reply to: Philippe Verdy: "Re: 'code unit' and 'code point' meaning check"
Next in thread: [email protected]: "Re: 'code unit' and 'code point' meaning check"

My day to pick on Philippe Verdy <verdy_p at wanadoo dot fr>:

>> In a nutshell: Unicode is not UTF-16.
>
> Or in other words, Unicode defines *code points* only, not code units
> (this is left to specific encodings used to serialize it, including
> UTF-*, and "compressed" BOCU and CESU encodings, which can all be
> computed algorithmically from Unicode code points).

Unicode defines the encoding forms, and thus the code units used by
those encoding forms. If Philippe simply means that the code units used
to represent a given code point vary depending on the chosen encoding
form, he is of course right.
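
To make that concrete, here is a minimal Python sketch of how one code
point maps to different code units in each of the three encoding forms.
The helper name code_units is made up for illustration, and the sketch
leans on Python's built-in codecs rather than spelling out the
conversion algorithms from the standard.

```python
import struct

def code_units(code_point: int, form: str) -> list:
    """Return the code units for one code point in the given encoding form.

    Hypothetical helper for illustration only. Python's codecs produce
    bytes, so the bytes are regrouped into 16-bit or 32-bit units here.
    """
    ch = chr(code_point)
    if form == "UTF-8":
        # UTF-8 code units are 8 bits wide, so each byte is one code unit.
        return list(ch.encode("utf-8"))
    if form == "UTF-16":
        # Regroup big-endian bytes into 16-bit code units.
        raw = ch.encode("utf-16-be")
        return [unit for (unit,) in struct.iter_unpack(">H", raw)]
    if form == "UTF-32":
        # Regroup big-endian bytes into 32-bit code units.
        raw = ch.encode("utf-32-be")
        return [unit for (unit,) in struct.iter_unpack(">I", raw)]
    raise ValueError("unknown encoding form: " + form)

# One code point, U+10400, in three encoding forms:
for form in ("UTF-8", "UTF-16", "UTF-32"):
    units = code_units(0x10400, form)
    print(form, [format(u, "X") for u in units])
# UTF-8  -> four 8-bit code units:  F0 90 90 80
# UTF-16 -> two 16-bit code units:  D801 DC00
# UTF-32 -> one 32-bit code unit:   10400
```

The code point is the same in all three cases; only the number and
width of the code units change.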

Note that there is a bit of confusion here between encoding forms, which
are about code units, and encoding schemes, which are about bytes. (I
had a lot of trouble separating these two, at first.) Also, replace
"CESU" with "SCSU" in this passage.
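
A rough Python sketch of the form/scheme distinction, using the UTF-16
code units D801 DC00 for U+10400 as the example. The function name
serialize_utf16 is hypothetical, and only the two byte-order-specific
UTF-16 schemes are shown; the plain "UTF-16" scheme with its optional
byte order mark is left out to keep the sketch short.

```python
import struct

def serialize_utf16(units, scheme: str) -> bytes:
    """Serialize 16-bit code units into bytes under an encoding scheme.

    Hypothetical helper for illustration. The code units (encoding form)
    are the same in both cases; only the byte order (encoding scheme)
    differs.
    """
    if scheme == "UTF-16BE":
        byte_order = ">"   # most significant byte first
    elif scheme == "UTF-16LE":
        byte_order = "<"   # least significant byte first
    else:
        raise ValueError("unsupported encoding scheme: " + scheme)
    return b"".join(struct.pack(byte_order + "H", u) for u in units)

units = [0xD801, 0xDC00]                       # U+10400 in the UTF-16 encoding form
print(serialize_utf16(units, "UTF-16BE").hex())  # d801dc00
print(serialize_utf16(units, "UTF-16LE").hex())  # 01d800dc

# Cross-check against Python's own codecs.
assert serialize_utf16(units, "UTF-16BE") == chr(0x10400).encode("utf-16-be")
assert serialize_utf16(units, "UTF-16LE") == chr(0x10400).encode("utf-16-le")
```

UTF-8 is the one case where the distinction is easy to miss, because
its code units are already bytes and there is no byte order to choose.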

> Note that some UTF-* encodings are now described by Unicode.org as
> standards, but is technically an annex to the standard, and not
> necessary to its definition.

As Michka pointed out, Unicode Standard Annexes *are* part of the
Unicode Standard. But this is moot, since all three UTFs are defined
directly in the standard itself, not in UAXes (although UTF-32 used to
be).

This has nothing to do with whether Unicode conformance requires
implementation of any particular UTF. (It does not.)

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 03:57:02 EDT