Re: UTF-8 and POSIX

From: Sandra O'donnell USG (
Date: Wed Jun 23 1999 - 14:18:48 EDT

   Is there any work going on to review the POSIX.1 and POSIX.2 standards
   systematically to add proper UTF-8 support?

This assumes .1 and .2 do not have proper UTF-8 support. I know quite
a few companies that are shipping products that support UTF-8 within
the POSIX framework.
   . . .
   Also the syntax for the entire locale database mechanisms was really
   designed for small 8-bit character sets and becomes rather horrible when
   applied to UTF-8. I get the impression that wchar_t <-> UTF-8 conversion
   is supposed to be done by table lookup of UTF-8 byte sequences as
   opposed to the obvious conversion algorithm.

Yes, the syntax can be large and kind of messy when used with large
code sets. But that's part of the nature of dealing with large things.
The collation and character property tables for Unicode are also large
and kind of messy unless you happen to know the syntax very well.

Also, why do you think the wchar_t <-> UTF-8 conversion is supposed
to be done by table lookup? The implementations I know of use the
obvious conversion algorithm.
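
For readers unfamiliar with it, the "obvious conversion algorithm" is plain
bit shifting and masking, with no lookup table. The following is a minimal
sketch, not any vendor's implementation. The names utf8_encode and
utf8_decode are hypothetical, and it makes two assumptions that go beyond
the original message: it uses uint32_t rather than wchar_t (so it does not
depend on the platform's wchar_t width), and it follows the current 4-byte
definition of UTF-8 (maximum U+10FFFF, surrogates rejected) rather than the
longer forms permitted in 1999.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one code point from the first bytes of s (n bytes available).
   Returns the number of bytes consumed, or 0 if the input is truncated,
   malformed, overlong, a surrogate, or above U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest value that legitimately needs a sequence of each length. */
    static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t v;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else
        return 0;                         /* stray continuation or bad lead */
    if (n < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                     /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (v < min_value[len] || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;
    *cp = v;
    return len;
}
```

For example, utf8_encode(0x20AC, buf) stores the three bytes E2 82 AC, and
utf8_decode on those bytes yields 0x20AC again.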

   UTF-8 would certainly
   deserve some special treatment here as a recognized encoding in the
   locale system.

POSIX's design philosophy is that it is independent of encoding.
No encodings are ever mentioned, so there is no need to "recognize"
UTF-8 or anything else. You use the charmap to define how characters
are encoded, and then combine that with a locale definition source
file to build a locale.
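
To illustrate what encoding independence looks like from the application
side, here is a minimal sketch in C. The function name count_chars is
hypothetical. The code never names an encoding: it relies on whatever
locale is in effect, which would have been built from a charmap and a
locale definition source file (for instance with the localedef utility).
It assumes the caller has already selected a locale with setlocale().

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count characters (not bytes) in s under the current LC_CTYPE locale.
   Returns (size_t)-1 if s is not valid in that locale's encoding. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    size_t remaining;
    size_t count = 0;

    assert(s != NULL);
    remaining = strlen(s);
    memset(&state, 0, sizeof state);      /* start in the initial shift state */

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete sequence */
        if (used == 0)
            break;                        /* decoded a null character */
        s += used;
        remaining -= used;
        count++;
    }
    return count;
}
```

A caller would typically run setlocale(LC_CTYPE, "") once at startup so
that the user's environment picks the locale. The same binary then counts
characters correctly whether that locale is UTF-8 based or uses some other
code set, provided such a locale is installed on the system.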

I know some people prefer a code set dependent design. POSIX ain't it.

                -- Sandra
Sandra Martin O'Donnell
Compaq Computer Corporation
