Re: Re[2]: Should I laugh or cry?

From: Ienup Sung (ienup.sung@Eng.Sun.COM)
Date: Wed Dec 18 1996 - 21:02:18 EST


] From unicode@Unicode.ORG Wed Dec 18 16:12:43 1996
] Reply-To: Christopher.Vance@adfa.oz.au
]
] | The ANSI C has not suggested to use wchar_t. For Unicode UCS-2, we should use
] | a user-defined type. IBM ULS uses the unichar as the 16-bit unsigned short
] | for UCS-2. That should be the approach.
]
] I'll have to take your word on whether ANSI has been adopting changes
] made to the real C standard as they're made, but I do believe that
] ANSI abandoned its separate C standard in favour of the ISO edition,
] when it was adopted. ISO C most definitely does include wchar_t, and
] has done for a while, even if some national standards haven't caught
] up.
]
] If you're using non-standard or obsolete compilers, I can't help you.
] Standard headers include one for wchar_t (I think it's <wchar.h>,
] but my copy is buried somewhere).
]
] Then again, there's no guarantee which wide character set is used for
] wchar_t. Perhaps this is a locale issue?
]
] | wchar_t is not intended to be a published data type as char or byte.
]
] Excuse me? Says who? Since when? Citation, please.
]
] -- Christopher
]

Hello,

Unfortunately, no standard defines or explicitly specifies an API or data
type for Unicode/UCS-2. I think this is one big issue (and a reason why
multi-platform software development is not an easy task) that system
vendors should try to solve by coming up with a single specification,
as IBM once proposed with their ULS.

The wchar_t is an opaque data type (some say semi-opaque) whose
representation you shouldn't make assumptions about. You can find various
UNIX specifications and standards saying that it is an opaque data type, for
instance XPG4/4.2/5, which subsumes POSIX.1, ANSI C/ISO C, SVID3 and System V
ABI, ... The only thing you can be sure of with the wchar_t type (if the OS
you are dealing with is XPG4-compliant) is that the null character (0) and
the PCS (Portable Character Set) characters exist in wchar_t, with the same
code values that you can assign.

I know this is not going to help but... for your information, in Solaris 2.6
we are also going to provide two UTF-8 locales, en_US.UTF-8 and ko.UTF-8,
and since sizeof(wchar_t) == 4 in SunOS, we chose to support UCS-4 in Sun's
wchar_t. However, again, this is a (semi-)opaque data type whose internal
representation vendors can choose or change...

Ienup


