C string literals with 16-bit Unicode

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Apr 30 2003 - 18:43:20 EDT

    Hi all, I am wondering how developers get 16-bit string *literals* into C source code. Do you use a
    mechanism other than the following?

    In the following, I use UChar as an example typedef name for the type of 16-bit Unicode strings
    (usually the same as unsigned short).
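
    A minimal sketch of such a typedef, assuming unsigned short is exactly 16 bits wide on the
    target compiler. The size check and the u16_length helper are hypothetical names added for
    illustration only:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: unsigned short is exactly 16 bits on this platform. */
typedef unsigned short UChar;

/* Compile-time check in a C89-compatible style: the array size becomes -1,
   and compilation fails, if UChar is not exactly 16 bits wide. */
typedef char uchar_must_be_16_bits[(sizeof(UChar) * CHAR_BIT == 16) ? 1 : -1];

/* Hypothetical helper: count the 16-bit code units before the terminating 0. */
static size_t u16_length(const UChar *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

    Note that u16_length only works on 0-terminated strings, so an array initializer passed to
    it needs an explicit trailing 0.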

    Escapes for non-ASCII characters would be ok. UTF-8/16 for the source code would be nicer. Whatever
    the mechanism, it has to work on a non-ASCII platform, too.

    I am aware that there is an effort under way to add 16-bit Unicode string literals to the C
    standard; I am looking for what can be done today.

    I know of

    a) array of numeric constants
         const UChar string[]={ 0x61, 0x62, 0x20ac };

    b) array of numeric constants expressed as named constants
         enum { u_a=0x61, u_b, u_c, ..., u_Euro=0x20ac, ... };
         const UChar string[]={ u_a, u_b, u_Euro };

    c) on some lucky platforms with 16-bit-Unicode wchar_t, simply
         const UChar *string=L"ab\x20ac";
       or even
         const UChar *string=L"ab€";

       -> but this is not portable

    d) using a preprocessor which takes source code like
         const UChar *string=U16LITERAL("ab\u20ac");
       or
         const UChar *string=U16LITERAL("ab€");
       and generates output C source code like a) or c) as appropriate

       -> Are there such preprocessors available?
          I guess Perl could do this...

    e) using a tool as in d), but run by the developer on one string at a time:
        one types "ab€" and the tool generates output
        text like in a) to copy and paste into the .c file,
        possibly with a comment containing the original string
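
    The core of a tool like d) or e) could be sketched as follows. This is only an
    illustration under stated assumptions: the tool's input is UTF-8, the output is an
    initializer in the style of a), and a terminating 0 is appended (a) above has none).
    The names utf8_decode, append_unit and u16_initializer are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp.
   Returns the number of bytes consumed, or 0 on malformed input. */
static int utf8_decode(const unsigned char *s, unsigned long *cp)
{
    int len, i;
    unsigned long c, min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;

    for (i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* also stops at a premature NUL */
        c = (c << 6) | (unsigned long)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return len;
}

/* Append one code unit as "0x...., " to out; returns -1 if it does not fit. */
static int append_unit(char *out, size_t cap, size_t *used, unsigned long unit)
{
    int n = snprintf(out + *used, cap - *used, "0x%lx, ", unit);
    if (n < 0 || (size_t)n >= cap - *used) return -1;
    *used += (size_t)n;
    return 0;
}

/* Write an initializer such as "{ 0x61, 0x62, 0x20ac, 0 }" for a UTF-8 string.
   Returns 0 on success, -1 on malformed input or insufficient space. */
static int u16_initializer(const char *utf8, char *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t used;
    int n;

    assert(utf8 != NULL && out != NULL);
    if (cap < 3) return -1;
    memcpy(out, "{ ", 3);
    used = 2;

    while (*s != 0) {
        unsigned long cp;
        int len = utf8_decode(s, &cp);
        if (len == 0) return -1;
        if (cp >= 0x10000) {
            /* Supplementary code point: emit a UTF-16 surrogate pair. */
            cp -= 0x10000;
            if (append_unit(out, cap, &used, 0xD800 + (cp >> 10)) != 0) return -1;
            cp = 0xDC00 + (cp & 0x3FF);
        }
        if (append_unit(out, cap, &used, cp) != 0) return -1;
        s += len;
    }
    n = snprintf(out + used, cap - used, "0 }");
    if (n < 0 || (size_t)n >= cap - used) return -1;
    return 0;
}
```

    Because the generated text consists only of numeric constants, it should not depend on
    the character set of the platform that later compiles it. A real tool would still need
    to find the literals in the source file and could keep the original string in a comment.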

    I am *not* looking for ways to get strings via higher-level mechanisms and runtime functions like

    z1) not using string literals but resource bundles/message catalogs etc.

    z2) using an unescape function
         const UChar *string=unescape("ab\\u20ac");

    etc.
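
    For comparison, a runtime unescape function as in z2) might look like the sketch below.
    It is not the signature shown above: this hypothetical u16_unescape writes into a
    caller-supplied buffer instead of returning a pointer, and it copies unescaped
    characters by their char value, which is only correct for ASCII characters on an
    ASCII-based platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short UChar;   /* assumption: 16 bits wide */

/* Map a hex digit to its value without relying on ASCII ordering. */
static int hex_value(char c)
{
    static const char digits[] = "0123456789abcdefABCDEF";
    const char *p = (c != 0) ? strchr(digits, c) : NULL;
    int i;

    if (p == NULL) return -1;
    i = (int)(p - digits);
    return (i < 16) ? i : i - 6;
}

/* Convert src, which may contain \uhhhh escapes, into dest.
   cap is the capacity of dest in code units, including the terminating 0.
   Returns the number of units written (without the terminator), or -1 on error. */
static long u16_unescape(const char *src, UChar *dest, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && dest != NULL);
    while (*src != 0) {
        unsigned long unit;
        if (src[0] == '\\' && src[1] == 'u') {
            int i;
            unit = 0;
            for (i = 2; i < 6; ++i) {
                int h = hex_value(src[i]);
                if (h < 0) return -1;       /* too few or invalid hex digits */
                unit = (unit << 4) | (unsigned long)h;
            }
            src += 6;
        } else {
            /* Assumption: the char value equals the Unicode code point (ASCII only). */
            unit = (unsigned char)*src++;
        }
        if (n + 1 >= cap) return -1;        /* keep room for the terminator */
        dest[n++] = (UChar)unit;
    }
    if (n >= cap) return -1;
    dest[n] = 0;
    return (long)n;
}
```

    The conversion happens on every call at runtime and needs writable storage, which is
    the cost that real literals would avoid.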

    Tips are greatly appreciated.

    markus


