Re: UTF-8, ISO C Am.1, and POSIX

From: odonnell@zk3.dec.com
Date: Tue Aug 12 1997 - 15:27:24 EDT

Next message: John Cowan: "SGML DESCSET for XML, HTML (was: XML and ISO 10646 ...)"
Previous message: : "Re: XML and ISO 10646 planes beyond the BMP"
Maybe in reply to: Markus G. Kuhn: "UTF-8, ISO C Am.1, and POSIX"
Next in thread: Keld Jørn Simonsen: "Re: UTF-8, ISO C Am.1, and POSIX"

> We have in the ISO POSIX WG been through all POSIX standards to see
> what changes we should make to the standards to accommodate UCS.

   Markus Kuhn wrote:
   I guess pretty much the only thing required in the POSIX standard for UTF-8
   is a standardized way to tell the locale mechanism that the character encoding
   used is UTF-8. UTF-8 is a little bit more than yet another character
   table, so there should be some locale flag or something like this that
   allows me to tell libc that UTF-8 is the encoding in use.

The original question was what changes, if any, are needed in
POSIX to accommodate UCS. There aren't any that I can think of,
if we assume an implementation is using UTF-8 as the multibyte
external code and UCS as an internal wide character format.
Given that, there's no reason POSIX needs a flag or anything
else to make it aware it's using UTF-8. POSIX is designed to
be code set independent.
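
As a rough illustration of what code set independence means in practice,
here is a minimal sketch that counts the characters in a multibyte string
with mbrtowc(), one of the ISO C Amendment 1 functions. The function name
count_characters is made up for this example. Nothing in it names UTF-8 or
any other encoding; the conversion is governed entirely by the LC_CTYPE
category of the current locale. It also does not depend on wchar_t holding
UCS values, which is an implementation choice.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Count the characters (not bytes) in a multibyte string, using whatever
 * encoding the current LC_CTYPE locale specifies. Returns (size_t)-1 if the
 * string holds an invalid or incomplete sequence for that encoding.
 * The name count_characters is hypothetical, chosen for this sketch. */
size_t count_characters(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);    /* start in the initial shift state */

    size_t remaining = strlen(s);
    size_t count = 0;

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (used == 0)
            break;                      /* decoded a null character; stop */

        s += used;                      /* advance past the bytes consumed */
        remaining -= used;
        count++;
    }
    return count;
}
```

In the default "C" locale this counts an ASCII string byte for byte. After
a successful setlocale() call that selects a UTF-8-based locale, the same
unchanged function would count multibyte UTF-8 sequences as single
characters.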

   . . .
   What's the state of the standardization with regard to specifying in a
   locale that we use UTF-8? What does enUS.UTF-8 look like?

Different from what most other implementations are using. Using
the values in your example, most would write this as en_US.UTF-8.
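
For illustration only, a program might select such a locale by name and
ask which code set it got, along the lines of the sketch below. The helper
name select_locale_codeset is made up for this example. Whether a locale
called en_US.UTF-8 is actually installed, and the exact string that
nl_langinfo(CODESET) reports for it, depend on the implementation, so the
sketch returns NULL rather than assuming the call succeeds.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Try to switch the whole program to the named locale. On success, return
 * the name of the code set that locale uses, as reported by nl_langinfo();
 * on failure, return NULL and leave the current locale unchanged.
 * The name select_locale_codeset is hypothetical, chosen for this sketch. */
const char *select_locale_codeset(const char *locale_name)
{
    assert(locale_name != NULL);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;                /* locale not available on this system */

    return nl_langinfo(CODESET);    /* implementation-defined code set name */
}
```

A caller would pass "en_US.UTF-8" (or "" to honor the user's environment)
and fall back to the "C" locale if the result is NULL.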

   It might also be useful if POSIX would clarify how all the new
   ISO C Am. 1 functions for wide streams and multi-byte strings work in
   detail if we have selected the UTF-8 encoding in the locale. . .

POSIX doesn't include information about any specific encoding,
UTF-8 or otherwise. It is designed to work with a variety of
encodings, so it doesn't make sense for it to include specific
details of how it might work with a UTF-8-based locale any more
than it would make sense for it to include details of how it
might work with an ISO 8859-1-based locale or a Japanese EUC-based
locale. Yes, yes, I know UTF-8 and Unicode/UCS are universal
encodings, but from POSIX's point of view, that's irrelevant.
They're just encodings.

-----------------------
Sandra Martin O'Donnell
odonnell@zk3.dec.com


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT