Re: UTF-8, ISO C Am.1, and POSIX

From: odonnell@zk3.dec.com
Date: Wed Aug 13 1997 - 11:30:39 EDT

Next message: Keld Jørn Simonsen: "Re: UTF-8, ISO C Am.1, and POSIX"
Previous message: Keld Jørn Simonsen: "Re: UTF-8, ISO C Am.1, and POSIX"
Maybe in reply to: Markus G. Kuhn: "UTF-8, ISO C Am.1, and POSIX"
Next in thread: Keld Jørn Simonsen: "Re: UTF-8, ISO C Am.1, and POSIX"

>POSIX doesn't include information about any specific encoding,
>UTF-8 or otherwise. . .
> Yes, yes, I know UTF-8 and Unicode/UCS are universal
>encodings, but from POSIX's point of view, that's irrelevant.

   That's just what's wrong with POSIX from the perspective of an implementer
   of the Unicode Standard.

If you want to write code set DEpendent software, POSIX
definitely won't give you any help. Its design philosophy is
completely different from that of Unicode-specific
software. There are pros and cons to each.

   Unicode has well defined character semantics that
   are considered a property of the character itself and therefore not locale
   dependent. A shorthand notation to kick the standard library into supporting
   these is indeed called for. . .

I like the fact that Unicode defines character semantics and
that it considers such semantics to be properties of the
character regardless of locale. IMO, there isn't that much
advantage to POSIX's ability to make character semantics
locale-dependent.

However, POSIX's defined behavior has been in place for a
long time, and is based at least in part on what users and
companies thought was the correct behavior. I remember arguing...
I mean, debating :-)...with non-i18n engineers five or six
years ago about what should be defined as "alpha" characters
in the en_US locale. The locale was built with ISO 8859-1, and
I thought "alpha" should include some or all of the characters
with diacritics in the Latin-1 repertoire. The reaction was
mass consternation and hysteria. No, everyone "knew" the
American locale only included English A-Z and a-z; code
depended on that behavior.
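
For readers who have not run into this, here is a minimal sketch of how
the locale-dependent classification looks from C. The helper name
alpha_in_locale and its return convention are made up for illustration;
only setlocale() and isalpha() come from the standards. Locale names such
as "en_US.ISO8859-1" vary from system to system, so treat any name you
pass in as an assumption about your own installation.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: classify one byte with isalpha() under the
 * named LC_CTYPE locale.
 * Returns 1 if the byte is alphabetic in that locale, 0 if it is not,
 * and -1 if setlocale() rejects the locale name.
 */
int alpha_in_locale(const char *locale_name, unsigned char c)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    int result;

    /* Copy the current setting; a later setlocale() call may
       overwrite the string that "current" points to. */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* isalpha() consults the LC_CTYPE category just selected. */
    result = isalpha(c) != 0;

    /* Restore the caller's locale before returning. */
    setlocale(LC_CTYPE, saved);
    return result;
}
```

In the "C" locale, alpha_in_locale("C", 0xE9) reports that byte 0xE9
(e-acute in ISO 8859-1) is not alphabetic. What a Latin-1 en_US locale
reports for the same byte depends on how that locale was built, which is
exactly what the debate was about.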

Standards are asked to support different behavior and philosophies.
You want something that makes Unicode pre-eminent. This being the
Unicode mailing list, there probably are lots of others who agree.
But there are others out there who need/want to support other
encodings, and a code set independent design like POSIX meets
their needs.

-----------------------
Sandra Martin O'Donnell
odonnell@zk3.dec.com


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT