Re: Fw: Unicode & space in programming & l10n

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Sep 22 2006 - 19:08:25 CDT

    Kenneth Whistler wrote on Friday, September 22, 2006 11:09 PM

    >> Unsigned int is only guaranteed a range of 0 to 0xffff and
    >> therefore it can't normalise the string <U+FAD5> - the normalised form is
    >> <U+25249> in all four normalisations.
    >
    > It *can*, if you abstract your type definitions correctly.

    >> Of course, unsigned int is good
    >> enough to hold UTF-16 code *units*, which might just be what Mike meant.
    >> (I.e., the type supports UTF-16, but not UTF-32.)

    > It is perfectly fine for UTF-32, if you do this correctly.

    That is, avoid compilers where plain int is only 16 bits wide. Such
    compilers are certainly valid under the 1990 C standard.
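
    (As a sketch, one could even reject such compilers at compile time
    with the old negative-array-size trick; the typedef name here is
    purely illustrative.)

    #include <limits.h>

    /* Fails to compile where unsigned int cannot represent every
       code point up to U+10FFFF, e.g. where int is 16 bits. */
    typedef char int_wide_enough_for_utf32[(UINT_MAX >= 0x10FFFF) ? 1 : -1];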

    > ...

    > At that point, you can safely port your entire code to *any*
    > platform, with at most one compiler-specific #ifdef in your
    > fundamental header file.

    That is true, but you would no longer necessarily be using 'unsigned int'
    for UTF-32.

    You could use something like:

    #include <limits.h>

    /* unsigned int must be able to hold every code point
       up to U+10FFFF to be usable as a UTF-32 code unit */
    #if UINT_MAX >= 0x10FFFF
        typedef unsigned int utf32char;
    #else
        typedef unsigned long int utf32char;
    #endif

    Or did you count this as compiler-specific?

    > And if you need to use arbitrary
    > buffers of Unicode character data, including embedded NULLs
    > and noncharacters, then you are better off using separate tracking
    > of buffer length, anyway.

    And you need to be able to include embedded nulls to pass some of the
    Unicode conformance tests. I know, because that tripped me up in the past.
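
    (For illustration, reusing the utf32char typedef above: separate
    length tracking need be no more than a counted buffer, so that
    U+0000 can occur anywhere in the data. The struct name is just a
    sketch.)

    #include <stddef.h>

    struct utf32buf {
        utf32char *data;   /* not NUL-terminated */
        size_t length;     /* code point count, so embedded
                              U+0000 is unremarkable */
    };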

    Steve Summit wrote on Saturday, September 23, 2006 12:00 AM

    > Ken Whistler wrote:

    >> It is perfectly fine for UTF-32, if you do this correctly.
    >> For example:
    >>
    >> typedef unsigned short UShort16;
    >> typedef unsigned int UInt32;
    >>
    >> typedef UShort16 utf16char;
    >> typedef UInt32 utf32char;
    >
    > Please don't do this! Please do
    >
    > #include <stdint.h>
    >
    > typedef uint16_t utf16char;
    > typedef uint32_t utf32char;
    >
    > instead.
    >
    >> At that point, you can safely port your entire code to *any*
    >> platform, with at most one compiler-specific #ifdef in your
    >> fundamental header file.

    That will only work if your compiler supports the C99 standard,
    which is where <stdint.h> was introduced. The compilers I use don't
    claim to comply, and the one I use at home would simply fail to
    compile the above.
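
    (A sketch of one compromise, not tested against any particular
    compiler: key the choice off __STDC_VERSION__, so that C99
    compilers get the exact-width types and older ones fall back to
    the <limits.h> test above.)

    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #include <stdint.h>
    typedef uint16_t utf16char;
    typedef uint32_t utf32char;
    #else
    #include <limits.h>
    /* unsigned short is guaranteed to be at least 16 bits */
    typedef unsigned short utf16char;
    #if UINT_MAX >= 0x10FFFF
    typedef unsigned int utf32char;
    #else
    typedef unsigned long int utf32char;
    #endif
    #endif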

    Richard.


