Re: UCS-4, UCS-2, UTF-16, UTF-8

From: Kenneth Whistler ([email protected])
Date: Thu Feb 17 2000 - 19:56:24 EST

Next message: Doug Ewell: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
Previous message: [email protected]: "Re: Digraphs"
Maybe in reply to: ohmson ohmson: "UCS-4, UCS-2, UTF-16, UTF-8"
Next in thread: Doug Ewell: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Frank Tang wrote:

>
> Not only that. UCS-4 does not specify byte order, but UTF-32BE and
> UTF-32LE do. I think UTF-32 itself (neither UTF-32BE nor UTF-32LE) does
> not make too much sense. But remember byte order is essential in network
> transmission.
>

In the context of the Unicode Character Encoding Model (see Unicode
Technical Report #17 http://www.unicode.org/unicode/reports/tr17)
UTF-32 is a Character Encoding Form. It is the mapping from the set
of integers used in the Unicode Standard (the scalar values) to 32-bit
code units (within a code space of 0..10FFFF). In the case of UTF-32,
the mapping is, of course, trivial: each scalar value maps to a single
32-bit code unit of the same numerical value.
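
As a rough illustration of how trivial that mapping is, here is a minimal
Python sketch. The function name to_utf32_code_units is made up for this
example and is not from any standard or library; the surrogate check reflects
the fact that surrogate code points are not scalar values.

```python
def to_utf32_code_units(text):
    """Map each Unicode scalar value in text to one 32-bit code unit.

    Hypothetical helper for illustration only. The code units are
    returned as plain Python integers, each in the range 0..0x10FFFF.
    """
    units = []
    for ch in text:
        scalar = ord(ch)
        # Surrogate code points (D800..DFFF) are not scalar values.
        if 0xD800 <= scalar <= 0xDFFF:
            raise ValueError("surrogate code point is not a scalar value")
        # The mapping itself: the code unit has the same numerical value.
        units.append(scalar)
    return units


units = to_utf32_code_units("A\u20ac\U00010000")
print([f"{u:08X}" for u in units])
```

Note that nothing here says anything about bytes or byte order yet; at this
level there are only 32-bit integers.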

UTF-32BE and UTF-32LE, on the other hand, are Character Encoding Schemes --
they map the code units into serialized byte sequences.
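
To make the distinction concrete, the following Python sketch serializes the
same list of 32-bit code units in both byte orders. The function name
serialize_code_units is an assumption made for this example; the comparison
against Python's built-in "utf-32-be" and "utf-32-le" codecs is only a sanity
check, not a statement about how any particular implementation works.

```python
def serialize_code_units(units, byteorder):
    """Serialize 32-bit code units into bytes.

    Hypothetical helper for illustration only. byteorder is "big"
    (as in UTF-32BE) or "little" (as in UTF-32LE).
    """
    return b"".join(u.to_bytes(4, byteorder) for u in units)


units = [0x00000041, 0x00010000]  # code units for "A" and U+10000

be = serialize_code_units(units, "big")
le = serialize_code_units(units, "little")

print(be.hex(" "))
print(le.hex(" "))

# Sanity check against Python's own codecs.
assert be == "A\U00010000".encode("utf-32-be")
assert le == "A\U00010000".encode("utf-32-le")
```

The code units are identical in both cases; only the serialized byte sequences
differ.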

None of these are *officially* part of the Unicode Standard yet -- they
are proposed as part of the *Draft* Unicode Technical Report #19. It
is likely, however, that they will soon become part of the Unicode
Standard.

When they do, the relationship between UTF-32, UTF-32BE, and UTF-32LE
will be completely analogous to the relationship between UTF-16,
UTF-16BE, and UTF-16LE, as already specified in the standard. That
includes use and interpretation of the BOM (U+FEFF), which, of course,
in UTF-32 is U-0000FEFF.
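
For illustration, here is a minimal Python sketch of BOM interpretation for a
UTF-32 byte stream, by analogy with UTF-16. The function name decode_utf32 is
made up for this example, and treating BOM-less input as big-endian is an
assumption carried over from the UTF-16 convention, since the draft report may
end up specifying this differently.

```python
BOM_BE = b"\x00\x00\xfe\xff"  # U-0000FEFF serialized big-endian
BOM_LE = b"\xff\xfe\x00\x00"  # U-0000FEFF serialized little-endian


def decode_utf32(data):
    """Decode a UTF-32 byte stream, using a leading BOM if present.

    Hypothetical helper for illustration only. Assumption: input with
    no BOM is read as big-endian, mirroring the UTF-16 convention.
    """
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    byteorder, start = "big", 0
    if data[:4] == BOM_BE:
        byteorder, start = "big", 4
    elif data[:4] == BOM_LE:
        byteorder, start = "little", 4
    chars = []
    for i in range(start, len(data), 4):
        unit = int.from_bytes(data[i:i + 4], byteorder)
        # chr() raises ValueError for values above 0x10FFFF.
        chars.append(chr(unit))
    return "".join(chars)


assert decode_utf32(BOM_BE + b"\x00\x00\x00A") == "A"
assert decode_utf32(BOM_LE + b"A\x00\x00\x00") == "A"
assert decode_utf32(b"\x00\x00\x00A") == "A"
```

In this sketch the BOM is consumed as a signature and does not appear in the
decoded text.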

--Ken

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT