RE: Multibyte definition

From: Marco.Cimarosti@icl.com
Date: Fri Mar 17 2000 - 05:13:42 EST


> You can, of course, put whatever you want into a wchar_t but,
> by convention, it tends to be restricted to UCS-2/UTF-16. If
> some application is using these types for something else, I'd
> be very suspicious indeed.

I see this as a gratuitous assumption.

The C type 'wchar_t' is not for Unicode specifically. I don't remember
seeing the term "Unicode" in any of the ANSI C documentation I have read, and
I would be surprised if C++ were any different.

In C terms:

- "Byte": the unit of measure for memory, as returned by the 'sizeof'
operator. Nothing more is implied, although 8 bits is a common size.

- "Type 'char'": an integer type whose size is one "byte" (in C terms). Among
other things, it is guaranteed that its size is <= the size of type
'wchar_t' ('sizeof(char) <= sizeof(wchar_t)' is always true; 'sizeof(char) <
sizeof(wchar_t)' is *not* always true).

- "Multibyte character": a multibyte string containing only one character
(in i18n terms), consisting of one or more bytes.

- "Multibyte string": an array of type 'char' (e.g. 'char mbstr [10] =
"Ciao!"'). Nothing else is implied; the term "multibyte" is only a reminder
for the fact that array elements and characters don't necessarily have a
one-to-one correspondence.

- "Type 'wchar_t'": a type defined (among other places) in header "wchar.h".
Notice the difference from C++, where 'wchar_t' is a built-in type, not
defined in any header. Type 'wchar_t' is guaranteed not to be smaller than
type 'char'; no other assumptions are made about its size (although 16 and 32
bits are very common sizes).

- "Wide character": a value of type 'wchar_t' (e.g. 'wchar_t wchr = L'C'').

- "Wide string": an array of type 'wchar_t' (e.g. 'wchar_t wstr [10] =
L"Ciao!"'). Nothing else is implied.
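
As a rough illustration of the definitions above, here is a minimal sketch
in C. The helper names 'check_size_guarantees' and 'mb_char_count' are made
up for this example and are not part of any standard library; the character
count depends on the current locale, and the sketch assumes the default "C"
locale with plain ASCII input.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: checks the only size relations discussed above.
   Nothing here says anything about Unicode or about an exact bit width. */
void check_size_guarantees(void)
{
    assert(sizeof(char) == 1);               /* 'char' is the C "byte" */
    assert(CHAR_BIT >= 8);                   /* 8 bits is common, not required */
    assert(sizeof(char) <= sizeof(wchar_t)); /* '<=', not necessarily '<' */
}

/* Hypothetical helper: counts the characters (in i18n terms) held in a
   multibyte string under the current locale. The result may be smaller
   than strlen(s), because one character can take several 'char' elements.
   Returns (size_t)-1 if an invalid sequence is found. */
size_t mb_char_count(const char *s)
{
    size_t count = 0;
    int len;

    mblen(NULL, 0); /* reset any shift state kept by mblen */
    while ((len = mblen(s, MB_CUR_MAX)) > 0) {
        s += len;   /* skip the bytes of one multibyte character */
        count++;
    }
    return len < 0 ? (size_t)-1 : count;
}
```

With 'char mbstr [10] = "Ciao!"' and 'wchar_t wstr [10] = L"Ciao!"', one
would expect 'mb_char_count(mbstr)' and 'wcslen(wstr)' to agree in the "C"
locale, while 'sizeof(mbstr)' and 'sizeof(wstr)' may well differ.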

_ Marco


