Re: UTF-8 and POSIX

From: Sandra O'Donnell USG (odonnell@zk3.dec.com)
Date: Wed Jun 23 1999 - 14:18:48 EDT

Next message: Kenneth Whistler: "Re: UTF-8 and POSIX"
Previous message: Keld Jørn Simonsen: "Re: UTF-8 and POSIX"
Maybe in reply to: Markus Kuhn: "UTF-8 and POSIX"
Next in thread: Kenneth Whistler: "Re: UTF-8 and POSIX"

   Is there any work going on to review the POSIX.1 and POSIX.2 standards
   systematically to add proper UTF-8 support?

This assumes .1 and .2 do not have proper UTF-8 support. I know quite
a few companies that are shipping products that support UTF-8 within
the POSIX framework.

   . . .
   Also the syntax for the entire locale database mechanisms was really
   designed for small 8-bit character sets and becomes rather horrible when
   applied to UTF-8. I get the impression that wchar_t <-> UTF-8 conversion
   is supposed to be done by table lookup of UTF-8 byte sequences as
   opposed to the obvious conversion algorithm.

Yes, the syntax can be large and kind of messy when used with large
code sets. But that's part of the nature of dealing with large things.
The collation and character property tables for Unicode are also large
and kind of messy unless you happen to know the syntax very well.

Also, why do you think the wchar_t <-> UTF-8 conversion is supposed
to be done by table lookup? The implementations I know of use the
obvious conversion algorithm.
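
For reference, the "obvious conversion algorithm" can be sketched in a few
lines of C. This is only an illustration, not code from any implementation
mentioned in this thread: the names utf8_encode and utf8_decode are
hypothetical, and the sketch assumes a 32-bit code point value and the
four-byte range up to U+10FFFF rather than the longer sequences that
ISO 10646 also defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 (hypothetical helper).
   Writes up to 4 bytes to out and returns the number of bytes written,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one UTF-8 sequence from s, with n bytes available
   (hypothetical helper). Stores the code point in *cp and returns the
   number of bytes consumed, or 0 if the input is truncated, malformed,
   or an overlong form. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;                           /* stray continuation or invalid lead byte */

    if (n < len)
        return 0;                           /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                           /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}
```

The conversion is pure bit shifting and masking; no per-character table
is consulted in either direction.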

   UTF-8 would certainly
   deserve some special treatment here as a recognized encoding in the
   locale system.

POSIX's design philosophy is that it is independent of encoding.
No encodings are ever mentioned, so there is no need to "recognize"
UTF-8 or anything else. You use the charmap to define how characters
are encoded, and then combine that with a locale definition source
file to build a locale.
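
What encoding independence looks like from the application side can be
sketched as follows. The function name count_chars is hypothetical, and
the sketch assumes a C library that provides mbrtowc() from the ISO C
wide-character amendment. Nothing in the code names an encoding; the
locale in effect decides how bytes map to characters.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in a multibyte string under the current LC_CTYPE
   locale (hypothetical helper). Returns (size_t)-1 if the string is not
   valid in that locale's encoding. */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t count = 0;

    memset(&st, 0, sizeof st);              /* start in the initial shift state */
    while (left > 0) {
        /* pwc is NULL: only the length of the next character is needed. */
        size_t r = mbrtowc(NULL, s, left, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;              /* invalid or incomplete sequence */
        if (r == 0)
            break;                          /* reached a null character */
        s += r;
        left -= r;
        count++;
    }
    return count;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first, so that
whichever locale the user selected, built from a charmap and a locale
definition source file (for example with localedef), determines how the
bytes are interpreted.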

I know some people prefer a code set dependent design. POSIX ain't it.

-- Sandra
-----------------------
Sandra Martin O'Donnell
Compaq Computer Corporation
odonnell@zk3.dec.com
sandra.odonnell@compaq.com

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT