Re: Fw: Unicode & space in programming & l10n

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Sep 22 2006 - 15:52:13 CDT

    Steve Summit wrote on Friday, September 22, 2006 8:02 PM

    > Other than the (granted) incompatibility with string constants,
    > the rest of Philippe's assertions about unsigned int's alleged
    > unsuitability for manipulating Unicode characters in the BMP are
    > incorrect, but I'll spare the list a point-by-point rebuttal.

    Not quite. In C, unsigned int is only guaranteed a range of 0 to 0xffff,
    so it cannot portably normalise even a string in the BMP such as
    <U+FAD5>: the normalised form is <U+25249>, outside the BMP, in all four
    normalisation forms (NFC, NFD, NFKC and NFKD). Of course, unsigned int is
    good enough to hold UTF-16 code *units*, which might just be what Mike
    meant. (I.e., the type supports UTF-16, but not UTF-32.)
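
    To make the code-unit point concrete, here is a rough C sketch. The
    helper name utf16_encode is made up for illustration and is not taken
    from any library; it assumes <stdint.h> is available for the
    uint_least16_t and uint_least32_t types. It splits a code point into one
    or two 16-bit code units, each of which fits in a minimal unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from any library): encode one Unicode code
 * point as UTF-16 code units. Returns the number of units written to
 * out[], which is 1 for a BMP code point and 2 for a supplementary one. */
int utf16_encode(uint_least32_t cp, uint_least16_t out[2])
{
    /* Only Unicode scalar values are accepted: no surrogates, max 10FFFF. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        out[0] = (uint_least16_t)cp;    /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint_least16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint_least16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

    With this sketch, U+FAD5 comes out as the single unit 0xFAD5, while its
    normalised form U+25249 needs the pair 0xD854 0xDE49.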

    Of course, you may be able to create Unicode string constants - it all
    depends on what data structure is used. FFFF-terminated arrays would
    work (U+FFFF is a noncharacter, so it cannot clash with real text), e.g.

    static const unsigned int remark[] = {
             LATIN_L, LATIN_o, LATIN_o, LATIN_k, EXCLAMATION_MARK, 0xffff};
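
    As a sketch of how such a constant might be used: the enum values below
    are my assumption about what the LATIN_* names stand for (the obvious
    code points), and USTR_END and ustr_len are hypothetical names invented
    for this example. The length routine simply walks the array until it
    meets the 0xffff sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definitions: the names come from the example above, the values
 * are the corresponding Unicode code points. USTR_END is hypothetical. */
enum {
    LATIN_L          = 0x004C,
    LATIN_o          = 0x006F,
    LATIN_k          = 0x006B,
    EXCLAMATION_MARK = 0x0021,
    USTR_END         = 0xFFFF   /* noncharacter used as the terminator */
};

static const unsigned int remark[] = {
    LATIN_L, LATIN_o, LATIN_o, LATIN_k, EXCLAMATION_MARK, USTR_END
};

/* Hypothetical helper: count the code units before the FFFF terminator. */
size_t ustr_len(const unsigned int *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != USTR_END)
        n++;
    return n;
}
```

    Here ustr_len(remark) gives 5, the terminator not being counted, just as
    strlen does not count the trailing NUL.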

    Richard.


