RE: My Querry

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Tue Nov 23 2004 - 12:35:35 CST


    > From: unicode-bounce@unicode.org
    > [mailto:unicode-bounce@unicode.org] On Behalf Of Harshal Trivedi
    > Sent: Tuesday, November 23, 2004 3:42 AM

    > How can I make sure that a UTF-8 string has terminated
    > while encoding it, compared to a C string, which ends
    > with the '\0' (NUL) character?
    >
    > -> Is there any special symbol or procedure to determine the end of a UTF-8
    > string, or is the ASCII NUL '\0' used as-is to indicate that?

            You can use the method used by C (often called "C strings" or "null-
    terminated strings"), in which a byte with value 0 signals the end of the
    string. This works for UTF-8 because the byte 0x00 never occurs inside the
    encoding of any other character: every byte of a multi-byte sequence is
    0x80 or higher. However, as recently (and vigorously) discussed here, this
    convention has the potentially problematic property of prohibiting use of
    the character at code point 0, NUL (U+0000). This does not tend to be a
    problem for most uses, but one should still be aware of it.

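            Another method is length-counted (length-prefixed) strings, as used
    by Java and MFC's CString class, where the length of the string data is
    stored and presented first (for a UTF-8 buffer, normally as a byte count),
    and the bytes are handled opaquely. Since no byte value is reserved as a
    terminator, U+0000 can be stored like any other character.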

            Either method will do, as UTF-8 itself prescribes no TES (transfer
    encoding syntax) or data structure for delimiting strings. Use the one best
    suited to your application. You may want to read UTR 17, "Character
    Encoding Model", at http://www.unicode.org/reports/tr17/.

            HTH,

    /|/|ike
