Unicode Conference Abstracts

TR 19769: New Character Types in C
Ming Xu - SAP AG

Intended Audience: Software Engineers, Managers, Marketers, Site Coordinators, Systems Analysts, Technical Writers, Testers

Session Level: Beginner, Intermediate, Advanced

The C language has evolved over the past decades: various code pages and multi-byte libraries have been introduced, along with extended character set support. However, support for extended character data types in the C language has remained limited. The introduction and success of the Unicode standard, and its implementation in modern programming languages, created strong demand for better Unicode support in C.

ISO/IEC JTC1/SC22/WG14 has introduced two new extended character data types, char16_t and char32_t, to give Unicode better support in the C language. The Unicode standard supports three encoding forms: UTF-8, UTF-16 and UTF-32. Each encoding form has advantages and disadvantages, so the choice of encoding form should be left to the application. char is suitable for UTF-8, char16_t for UTF-16 and char32_t for UTF-32. The C standard also supports an extended character type, wchar_t. However, the size of wchar_t is implementation-defined, so it offers only a limited form of portability, whereas char16_t and char32_t offer applications portability in terms of the Unicode standard. The new data types guarantee program portability through clearly defined character widths.
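
As an illustration of the types described above, the following is a minimal sketch, not taken from the technical report itself. It assumes a C11-style compiler that provides <uchar.h> and the u"..." and U"..." literal prefixes, and that the implementation encodes those literals as UTF-16 and UTF-32 (signalled by the __STDC_UTF_16__ and __STDC_UTF_32__ macros). The helper names utf16_units, utf32_units and utf16_code_points, and the sample strings, are hypothetical and exist only for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <uchar.h>   /* char16_t and char32_t */

/* Unlike wchar_t, the new types have a guaranteed minimum width. */
_Static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
_Static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds a UTF-32 code unit");

/* The same three characters in two encoding forms:
   U+0041 (A), U+00E9 (e with acute), U+1D11E (musical G clef).
   Assumption: the compiler encodes u"" as UTF-16 and U"" as UTF-32. */
const char16_t sample16[] = u"A\u00E9\U0001D11E";
const char32_t sample32[] = U"A\u00E9\U0001D11E";

/* Hypothetical helper: number of char16_t code units before the terminator. */
size_t utf16_units(const char16_t *s) {
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: number of char32_t code units before the terminator.
   In UTF-32 this equals the number of code points. */
size_t utf32_units(const char32_t *s) {
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: code points in well-formed UTF-16.
   Every unit except a trailing (low) surrogate starts a new code point. */
size_t utf16_code_points(const char16_t *s) {
    size_t n = 0;
    for (; *s != 0; s++) {
        if (*s < 0xDC00 || *s > 0xDFFF) {
            n++;
        }
    }
    return n;
}
```

Under these assumptions, U+1D11E lies outside the Basic Multilingual Plane, so it occupies a surrogate pair (two code units) in sample16 but a single code unit in sample32. This is one example of the trade-offs between encoding forms that the abstract leaves to the application.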

We addressed this subject at IUC 18 in Hong Kong. We proposed the C language extension to the C committee, representing Germany, on behalf of the UTC. The new character types have now been officially published by ISO/IEC as TR 19769. This talk covers the details of the C language extension and the state of its implementation on the major platforms.
