Re: Fw: Unicode & space in programming & l10n

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Sep 22 2006 - 15:52:13 CDT

Next message: Henrik Theiling: "Re: Fw: Unicode & space in programming & l10n"

Previous message: Jukka K. Korpela: "Re: Problem with SSI and BOM"
In reply to: Steve Summit: "Re: Fw: Unicode & space in programming & l10n"
Next in thread: Mike: "Re: Fw: Unicode & space in programming & l10n"

Steve Summit wrote on Friday, September 22, 2006 8:02 PM

> Other than the (granted) incompatibility with string constants,
> the rest of Philippe's assertions about unsigned int's alleged
> unsuitability for manipulating Unicode characters in the BMP are
> incorrect, but I'll spare the list a point-by-point rebuttal.

Not quite. Unsigned int is only guaranteed a range of 0 to 0xffff, so an
array of unsigned int holding one character per element can't hold the
normalised form of the string <U+FAD5>: in all four normalisation forms
(NFC, NFD, NFKC and NFKD) it becomes <U+25249>, which lies outside that
range. Of course, unsigned int is good enough to hold UTF-16 code *units*,
which might just be what Mike meant. (I.e., the type supports UTF-16, but
not UTF-32.)

Of course, you may still be able to create Unicode string constants; it all
depends on what data structure is used. FFFF-terminated arrays would work,
since U+FFFF is a noncharacter and so should not occur in ordinary text,
e.g.

/* 0xffff terminates the array; LATIN_L etc. stand for the code points */
static const unsigned int remark[] = {
    LATIN_L, LATIN_o, LATIN_o, LATIN_k, EXCLAMATION_MARK, 0xffff
};
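A minimal sketch of how such FFFF-terminated arrays might be handled,
assuming LATIN_L, LATIN_o, LATIN_k and EXCLAMATION_MARK are macros for the
corresponding code points. Their definitions here, along with USTR_END,
ustrlen() and ustr_to_ascii(), are hypothetical. The terminator plays the
role that '\0' plays for ordinary char strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definitions of the names used in the example. */
#define LATIN_L          0x004Cu
#define LATIN_o          0x006Fu
#define LATIN_k          0x006Bu
#define EXCLAMATION_MARK 0x0021u
#define USTR_END         0xFFFFu   /* U+FFFF is a noncharacter */

static const unsigned int remark[] = {
    LATIN_L, LATIN_o, LATIN_o, LATIN_k, EXCLAMATION_MARK, USTR_END
};

/* Count the elements before the terminator, in the manner of strlen(). */
static size_t ustrlen(const unsigned int *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != USTR_END)
        n++;
    return n;
}

/* Copy into a NUL-terminated char buffer. Returns 1 on success, or 0 if
 * an element is outside ASCII or the buffer is too small (buf may then
 * hold a partial copy). */
static int ustr_to_ascii(const unsigned int *s, char *buf, size_t size)
{
    size_t i;
    for (i = 0; s[i] != USTR_END; i++) {
        if (s[i] > 0x7F || i + 1 >= size)
            return 0;
        buf[i] = (char)s[i];
    }
    if (i >= size)
        return 0;               /* no room for the terminating '\0' */
    buf[i] = '\0';
    return 1;
}
```

Using 0xffff rather than 0 as the terminator leaves U+0000 available as an
ordinary element, at the cost of the usual <string.h> functions not
applying.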

Richard.

This archive was generated by hypermail 2.1.5 : Fri Sep 22 2006 - 15:58:55 CDT