> Is there any work going on to review the POSIX.1 and POSIX.2 standards
> systematically to add proper UTF-8 support?
This assumes .1 and .2 do not have proper UTF-8 support. I know quite
a few companies that are shipping products that support UTF-8 within
the POSIX framework.
. . .
> Also the syntax for the entire locale database mechanisms was really
> designed for small 8-bit character sets and becomes rather horrible when
> applied to UTF-8. I get the impression that wchar_t <-> UTF-8 conversion
> is supposed to be done by table lookup of UTF-8 byte sequences as
> opposed to the obvious conversion algorithm.
Yes, the syntax can be large and kind of messy when used with large
code sets. But that's part of the nature of dealing with large things.
The collation and character property tables for Unicode are also large
and kind of messy unless you happen to know the syntax very well.
Also, why do you think the wchar_t <-> UTF-8 conversion is supposed
to be done by table lookup? The implementations I know of use the
obvious conversion algorithm.
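
For illustration, here is a minimal sketch of what the "obvious conversion
algorithm" looks like in C. It is not taken from any particular vendor's
implementation. It assumes the wide-character value is an ISO 10646 code
point (POSIX itself does not require that of wchar_t), and it limits itself
to the four-byte range up to U+10FFFF. The function names ucs_to_utf8 and
utf8_to_ucs are hypothetical, chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 by bit shifting, with no lookup table.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies above U+10FFFF. */
size_t ucs_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one UTF-8 sequence from s (at most n bytes available).
   Returns the number of bytes consumed, or 0 if the input is truncated,
   malformed, overlong, a surrogate, or above U+10FFFF. */
size_t utf8_to_ucs(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or bad lead byte */
    }
    if (n < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}
```

A caller whose wchar_t does hold ISO 10646 values would cast it to uint32_t
before calling ucs_to_utf8. Both directions need only shifts and masks, which
is why no per-character table is required.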
> UTF-8 would certainly
> deserve some special treatment here as a recognized encoding in the
POSIX's design philosophy is that it is independent of encoding.
No encodings are ever mentioned, so there is no need to "recognize"
UTF-8 or anything else. You use the charmap to define how characters
are encoded, and then combine that with a locale definition source
file to build a locale.
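
As a rough sketch of what that looks like for UTF-8, a charmap fragment might
read as follows. The symbolic names other than <A> and the value given for
<mb_cur_max> are assumptions made for this example, not entries copied from
any shipped charmap.

```
# Hypothetical charmap fragment: each entry maps a symbolic character
# name to its encoded byte sequence in this code set.
<code_set_name> UTF-8
<mb_cur_min>    1
<mb_cur_max>    4
CHARMAP
<A>             \x41
<e-acute>       \xc3\xa9
<euro-sign>     \xe2\x82\xac
END CHARMAP
```

The locale would then be built with something like
localedef -f utf8.charmap -i example.src example_locale, where all three
file and locale names are placeholders. The locale source file refers only
to the symbolic names, so the same source can be paired with a different
charmap.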
I know some people prefer a code set dependent design. POSIX ain't it.
Sandra Martin O'Donnell
Compaq Computer Corporation
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT