Draft Unicode Technical Report #19


Revision 5.0
Authors Mark Davis (mark.davis@us.ibm.com)
Date 1999-11-16
This Version http://www.unicode.org/unicode/reports/tr19/tr19-5.html
Previous Version http://www.unicode.org/unicode/reports/tr19/tr19-4.html
Latest Version http://www.unicode.org/unicode/reports/tr19


This document specifies an alias that can be used to refer to the subset of UCS-4 values that are valid Unicode code points.


The preferred encoding form for Unicode text is the 16-bit form, UTF-16. There is also an 8-bit encoding form, UTF-8, that can be used to represent Unicode in environments where the 16-bit form is impractical because of compatibility constraints. In addition, some implementations may wish to use a 32-bit form, in which each Unicode code point (also known as a scalar value) corresponds to a single 32-bit unit. Even applications that do not use this form internally may want to convert to and from it for interoperability.
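As an illustration of the three forms, the following minimal sketch encodes one code point in each of them and lists the resulting code units. It relies on Python's built-in codecs; the helper name code_units and the sample character U+10400 are choices made for this example only, not part of this report.

```python
def code_units(text, encoding, unit_bytes):
    """Encode text and return its code units as integers (big-endian)."""
    data = text.encode(encoding)
    return [
        int.from_bytes(data[i:i + unit_bytes], "big")
        for i in range(0, len(data), unit_bytes)
    ]


# U+10400 lies above U+FFFF, so it needs two 16-bit units in UTF-16.
sample = "\U00010400"

utf8_units = code_units(sample, "utf-8", 1)       # four 8-bit units
utf16_units = code_units(sample, "utf-16-be", 2)  # two 16-bit units
utf32_units = code_units(sample, "utf-32-be", 4)  # one 32-bit unit

print([hex(u) for u in utf8_units])
print([hex(u) for u in utf16_units])
print([hex(u) for u in utf32_units])
```

In the 32-bit form, the single unit is numerically equal to the code point itself, which is what makes that form convenient as an interchange point between the other two.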

Such an encoding form is defined in ISO/IEC 10646, where it is called UCS-4. However, UCS-4 permits values that are outside the range of valid Unicode code points. The term UTF-32 can be used to refer to the subset of UCS-4 values that fall within the range of valid Unicode code points. The following lists the important features of this encoding form:

Because UTF-32 is simply a subset of UCS-4, it conforms to ISO/IEC 10646 as well as to the Unicode Standard.
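The subset relationship can be sketched as a range check on a 32-bit value. This section does not give the numeric limits, so the bounds below are assumptions taken from the Unicode Standard: code points run from 0 to 10FFFF (hex), and the surrogate range D800 to DFFF is excluded because those values are not scalar values. The constant and function names are hypothetical.

```python
# Assumed limits (from the Unicode Standard, not stated in this section).
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_valid_utf32(value):
    """Return True if a UCS-4 value is also a valid UTF-32 value."""
    if value < 0 or value > MAX_CODE_POINT:
        return False
    # Surrogate values are reserved for UTF-16 pairs and are not
    # scalar values on their own.
    return not (SURROGATE_FIRST <= value <= SURROGATE_LAST)


def filter_utf32(values):
    """Keep only the values that fall inside the UTF-32 subset."""
    return [v for v in values if is_valid_utf32(v)]


candidates = [0x41, 0xD800, 0x10FFFF, 0x110000, 0x7FFFFFFF]
accepted = filter_utf32(candidates)
print([hex(v) for v in accepted])
```

A converter that accepts UCS-4 input could apply a check of this kind to each unit and reject or replace anything that fails it; which of those two policies to use is left to the implementation.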

