RE: UNICODE version of _T(x) macro

From: Phillips, Addison (addison@lab126.com)
Date: Mon Nov 22 2010 - 12:18:02 CST

Next message: Asmus Freytag: "Re: Are Latin and Cyrillic essentially the same script?"

Previous message: Doug Ewell: "RE: UNICODE version of _T(x) macro"
In reply to: Doug Ewell: "RE: UNICODE version of _T(x) macro"
Next in thread: Asmus Freytag: "Re: UNICODE version of _T(x) macro"
Reply: Asmus Freytag: "Re: UNICODE version of _T(x) macro"

> sowmya satyanarayana <sowmya underscore satyanarayana at yahoo dot com> wrote:
>
> > Taking this, what is the best way to define _T(x) macro of UNICODE version, so
> > that my strings will always be 2 byte wide character?
>
> Unicode characters aren't always 2 bytes wide. Characters with values
> of U+10000 and greater take two UTF-16 code units, and are thus 4 bytes
> wide in UTF-16.

Not exactly. The code units for UTF-16 are always 16 bits wide. Supplementary characters (those with code points >= U+10000) are encoded as a surrogate pair, which is two 16-bit code units. Most processing and string traversal is done in terms of the 16-bit code units, with a special case for surrogate pairs.
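
To make the surrogate-pair point concrete, here is a minimal C++11 sketch. The helper names append_utf16 and count_code_points are hypothetical (they are not from any library), and the encoder assumes its input is a valid Unicode scalar value. It shows one code point becoming either one or two 16-bit code units, and a traversal that walks code units while special-casing surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append the UTF-16 encoding of one code point.
// Assumes cp is a valid scalar value (<= 0x10FFFF and not itself a surrogate).
inline void append_utf16(std::u16string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        // BMP code point: exactly one 16-bit code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split the 20-bit offset into a surrogate pair.
        char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
}

// Hypothetical helper: count code points by walking 16-bit code units.
// A well-formed surrogate pair counts once; an unpaired surrogate counts as one.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i; // skip the low surrogate that completes the pair
        }
        ++n;
    }
    return n;
}
```

For example, appending U+0041 and then U+1F600 yields three code units (0x0041, 0xD83D, 0xDE00) but only two code points.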

When discussing Unicode character encoding forms, it is very useful to distinguish between characters ("code points") and their in-memory representation ("code units"), rather than using non-specific terminology such as "character".

If you want to use UTF-32, which uses 32-bit code units, one per code point, you can use a 32-bit data type instead. Those are always 4 bytes wide.
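
As a rough illustration of that difference in C++11 terms, the sketch below compares the same two-code-point string as char16_t and char32_t literals. The macro names T16 and T32 are hypothetical stand-ins written in the style of _T(x); they are not part of any Windows or standard header. The 4-byte check is an assumption that holds on mainstream platforms with 8-bit bytes, not a guarantee of the language standard.

```cpp
#include <cstddef>
#include <string>

// Hypothetical _T-style macros: paste a literal prefix onto the string.
#define T16(x) u##x  // char16_t string literal (UTF-16 code units)
#define T32(x) U##x  // char32_t string literal (UTF-32 code units)

// Assumption: char32_t occupies 4 bytes (true on common platforms).
static_assert(sizeof(char32_t) == 4, "expected a 4-byte char32_t");

// "a" followed by U+1D11E MUSICAL SYMBOL G CLEF, a supplementary character.
inline std::size_t utf16_units() {
    return std::u16string(T16("a\U0001D11E")).size(); // 1 unit + surrogate pair
}

inline std::size_t utf32_units() {
    return std::u32string(T32("a\U0001D11E")).size(); // one unit per code point
}
```

Under these assumptions, utf16_units() returns 3 while utf32_units() returns 2, so only the UTF-32 count matches the number of code points.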

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N, IETF IRI WGs)

Internationalization is not a feature.
It is an architecture.


This archive was generated by hypermail 2.1.5 : Mon Nov 22 2010 - 12:20:01 CST