Re: Another Querry

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Nov 23 2004 - 23:49:13 CST

    Harshal Trivedi <harshal dot trivedi at gmail dot com> wrote:

    > How can I determine the end of a UCS-2/UCS-4 string while encoding it
    > in a C program?
    > A normal C string ends with '\0' (ASCII NUL) as its terminating
    > character. What symbol, pattern, or character in UCS-2 or UCS-4
    > substitutes for that ASCII NUL as the termination symbol?

    You wouldn't normally use the ordinary C string type (an array of
    char) to hold a UTF-16 (not UCS-2, please) or UCS-4 string. It isn't
    meant for that, for exactly the reason your question implies:
    incidental zero bytes will cause premature termination of the string,
    because the standard string functions treat the first zero byte as the
    terminator and almost all C implementations use an 8-bit char.
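
    To make the problem concrete, here is a minimal sketch. The array
    utf16le_hi and the helper bytes_seen_by_strlen are names made up for
    this illustration, and the bytes assume little-endian UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" in UTF-16, little-endian byte order (an assumption for this
   sketch), followed by a 16-bit zero. Each ASCII-range character
   contributes one zero byte of its own. */
const char utf16le_hi[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Hypothetical helper: reports how many bytes a byte-oriented C string
   function sees before it decides the string has ended. */
size_t bytes_seen_by_strlen(const char *bytes)
{
    assert(bytes != NULL);
    return strlen(bytes);   /* stops at the first zero byte */
}
```

    Called on utf16le_hi, the helper reports a single byte ('H'), even
    though the array holds two characters plus a terminator in six bytes.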

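    The solution is either to use UTF-8, or to use "wide character" strings
    based on 16-bit (or, less likely, 32-bit) "character" units. Either way
    the terminator is still a zero: a single zero byte in UTF-8, or one
    zero-valued code unit (U+0000) in a wide string.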
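
    As a rough sketch of the wide-unit approach, assuming the string is
    held as an array of uint16_t code units: the end of the string is the
    first code unit whose value is zero, not the first zero byte. The
    names utf16_len and sample are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the 16-bit code units that precede the
   terminating zero-valued code unit. */
size_t utf16_len(const uint16_t *s)
{
    assert(s != NULL);
    const uint16_t *p = s;
    while (*p != 0)          /* compare whole code units, not bytes */
        ++p;
    return (size_t)(p - s);
}

/* "Hi", then U+1D11E (MUSICAL SYMBOL G CLEF) encoded as the surrogate
   pair D834 DD1E, then the terminating zero code unit. */
const uint16_t sample[] = { 0x0048, 0x0069, 0xD834, 0xDD1E, 0x0000 };
```

    For sample this counts four code units (three characters, since the
    surrogate pair is one character). If the platform's wchar_t suits
    your data, wcslen() from <wchar.h> does the same job for wchar_t
    strings.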

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/



    This archive was generated by hypermail 2.1.5 : Tue Nov 23 2004 - 23:51:58 CST