Re: UNICODE version of _T(x) macro

From: Kenneth Whistler (
Date: Mon Nov 22 2010 - 13:34:29 CST

  • Next message: Asmus Freytag: "Re: UNICODE version of _T(x) macro"

    Somya asked:

    > I have a Unicode C application. I am using the following macro
    > to define my strings as 2-byte-wide characters:
    > #ifdef UNICODE
    > #define _T(x) L##x
    > But I see that the GCC compiler maps 'L' to wchar_t, which is 4 bytes
    > on Linux. I have used the -fshort-wchar option on Linux, but I want my
    > application to be portable to AIX as well, which does not have this
    > option. I am not able to find the best way to define the UNICODE
    > version of _T(x) so that it always produces 2-byte-wide characters.

    > Given this, what is the best way to define the UNICODE version of the
    > _T(x) macro, so that my strings will always be 2-byte-wide characters?

    Well, some may disagree with me, but my first advice would be
    to avoid macros like that altogether. And second, to absolutely
    avoid any use of wchar_t in the context of processing Unicode
    characters and strings.

    If you are working with C compilers that support the C99 standard,
    you can instead make use of the stdint.h exact-width integer
    types. And then you should *typedef* Unicode code unit types
    to those exact-width integer types.

    uint8_t <-- typedef your UTF-8 code unit type to this

    uint16_t <-- typedef your UTF-16 code unit type to this

    uint32_t <-- typedef your UTF-32 code unit type to this
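    For example, the three typedefs might look like this (the names
    UTF8, UTF16, and UTF32 are illustrative; pick whatever names fit
    your codebase):

    ```c
    #include <stdint.h>
    #include <assert.h>

    /* C99 exact-width code unit types: the widths are guaranteed,
       unlike wchar_t, whose width varies across compilers and platforms. */
    typedef uint8_t  UTF8;   /* one UTF-8 code unit  */
    typedef uint16_t UTF16;  /* one UTF-16 code unit */
    typedef uint32_t UTF32;  /* one UTF-32 code unit */

    int main(void) {
        assert(sizeof(UTF8)  == 1);
        assert(sizeof(UTF16) == 2);
        assert(sizeof(UTF32) == 4);
        return 0;
    }
    ```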


    If you need to cross-compile on platforms that don't support
    the C99 types, then you can probably get away with:

    unsigned char

    unsigned short

    unsigned int

    which resolve to 8-bit, 16-bit, and 32-bit types, respectively,
    on most common platforms. (The C standard only guarantees minimum
    widths for these types, so verify the sizes on each target.)
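    Since those widths are not guaranteed by the standard, a sketch of
    a pre-C99 fallback would pair the typedefs with a compile-time size
    check (the negative-array-size trick), so a build on a platform with
    different widths fails immediately rather than silently misbehaving:

    ```c
    #include <assert.h>

    /* Pre-C99 fallback typedefs; names are illustrative. */
    typedef unsigned char  UTF8;
    typedef unsigned short UTF16;
    typedef unsigned int   UTF32;

    /* These declarations fail to compile (array of negative size)
       if the width assumptions are wrong on the target platform. */
    typedef char assert_utf8_width [sizeof(UTF8)  == 1 ? 1 : -1];
    typedef char assert_utf16_width[sizeof(UTF16) == 2 ? 1 : -1];
    typedef char assert_utf32_width[sizeof(UTF32) == 4 ? 1 : -1];

    int main(void) {
        assert(sizeof(UTF16) == 2);
        assert(sizeof(UTF32) == 4);
        return 0;
    }
    ```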

    Once you have your 3 fixed-width code unit typedefs in hand,
    do all of your Unicode character and string processing using
    those types.
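    As a concrete illustration of working in those types (UTF16 here is
    a hypothetical typedef for the 16-bit code unit, as described above),
    a 2-byte string can be written without _T or wchar_t at all, by
    spelling out the code units directly:

    ```c
    #include <stdint.h>
    #include <stddef.h>
    #include <assert.h>

    typedef uint16_t UTF16;  /* hypothetical UTF-16 code unit typedef */

    /* "Hi" as explicit UTF-16 code units, NUL-terminated; each element
       is exactly 2 bytes regardless of the platform's wchar_t width. */
    static const UTF16 hello[] = { 0x0048, 0x0069, 0x0000 };

    /* Length in code units, analogous to strlen(). */
    static size_t utf16_len(const UTF16 *s) {
        size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }

    int main(void) {
        assert(sizeof(hello[0]) == 2);
        assert(utf16_len(hello) == 2);
        return 0;
    }
    ```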

    When you are making use of other Unicode libraries, the libraries
    often have these typedefs already defined for you. Thus, for
    example, ICU has typedefs UChar (an unsigned 16-bit integer)
    and UChar32 (a signed 32-bit integer). [The choice between
    a signed and an unsigned 32-bit integer is a library
    design decision, but in all cases the valid 32-bit values
    for Unicode characters are in the positive range 0..0x10FFFF.]


    Once you have your code set up to use typedefs like this for
    your Unicode characters and strings, read, understand, and
    follow the rules for the UTF-8, UTF-16, and UTF-32 encoding
    forms, as documented in Section 3.9, Unicode Encoding Forms,
    of the Unicode Standard,

    and your Unicode string handling should then be correct
    and conformant.
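
    To give a flavor of those rules, here is a sketch of the UTF-16
    encoding form from Section 3.9: code points below U+10000 are a
    single code unit, and everything above is a surrogate pair (the
    function name and signature are illustrative, not from any library):

    ```c
    #include <stdint.h>
    #include <assert.h>

    /* Encode one code point (0..0x10FFFF, excluding the surrogate
       range D800..DFFF) as UTF-16. Returns the number of code units
       written (1 or 2), or 0 for an invalid code point. */
    static int utf16_encode(uint32_t cp, uint16_t out[2]) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                       /* not a Unicode scalar value */
        if (cp < 0x10000) {
            out[0] = (uint16_t)cp;          /* BMP: one code unit */
            return 1;
        }
        cp -= 0x10000;                      /* supplementary: surrogate pair */
        out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate  */
        return 2;
    }

    int main(void) {
        uint16_t u[2];
        assert(utf16_encode(0x0041, u) == 1 && u[0] == 0x0041);
        assert(utf16_encode(0x1F600, u) == 2
               && u[0] == 0xD83D && u[1] == 0xDE00);
        assert(utf16_encode(0xD800, u) == 0);  /* lone surrogate rejected */
        return 0;
    }
    ```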


    This archive was generated by hypermail 2.1.5 : Mon Nov 22 2010 - 13:37:05 CST