Re: UTF-16 inside UTF-8

From: [email protected]
Date: Tue Nov 04 2003 - 12:48:06 EST


    In a message dated 11/4/2003 8:48:08 AM Pacific Standard Time,
    [email protected] writes:
    Serious answer: It should recognize that the text is *ill-formed* UTF-8
    (definition D30) and should probably decline to process the two code
    points.
    Agreed.
    If it wants to be more charitable than conformant, it MAY
    choose to reassemble them to create U+10300, but it is under no
    obligation to do so.
    Hmm... where does that part come from? I thought Unicode 3.2 and 4.0 are
    clear that such a sequence is illegal. I don't think "it MAY choose to
    reassemble them to create U+10300" is in the Unicode Standard. Actually, if
    you do so, it may create a security hole in the application.
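
    To make the disagreement concrete, here is a minimal Python sketch, not
    anything taken from the Unicode Standard. The helper names decode_strict
    and decode_charitable are hypothetical; only the standard codecs and the
    "surrogatepass" error handler are real. It assumes the ill-formed input
    is the surrogate pair D800 DF00 with each half written as a three-byte
    sequence, and contrasts that with the well-formed four-byte encoding of
    U+10300.

```python
# Well-formed UTF-8 for U+10300: one four-byte sequence (F0 90 8C 80).
WELL_FORMED = "\U00010300".encode("utf-8")

# Ill-formed: surrogates D800 and DF00, each written as a three-byte sequence.
SURROGATES_IN_UTF8 = bytes([0xED, 0xA0, 0x80, 0xED, 0xBC, 0x80])


def decode_strict(data: bytes):
    """Hypothetical helper: return the text, or None if data is ill-formed UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_charitable(data: bytes) -> str:
    """Hypothetical helper: the non-conformant 'reassemble the pair' behaviour."""
    # Let the encoded surrogates through as lone surrogate code points...
    lenient = data.decode("utf-8", errors="surrogatepass")
    # ...then round-trip through UTF-16 so that pairs merge into one code point.
    return lenient.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    print(decode_strict(SURROGATES_IN_UTF8))                  # rejected
    print(hex(ord(decode_charitable(SURROGATES_IN_UTF8))))    # reassembled
    # A byte-level filter looking for the well-formed form misses the other one.
    print(WELL_FORMED in SURROGATES_IN_UTF8)
```

    The security concern follows from the last line: once a decoder
    reassembles the pair, two different byte sequences produce the same
    character, so any check that ran on the raw bytes before decoding can be
    bypassed by using the ill-formed spelling.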

    ==================================
    Frank Yung-Fong Tang
    System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
    AIM:yungfongta mailto:[email protected] Tel:650-937-2913
    Yahoo! Msg: frankyungfongtan

    John 3:16 "For God so loved the world that he gave his one and only Son, that
    whoever believes in him shall not perish but have eternal life."

    Does your software display Thai language text correctly for Thailand users?
    -> Basic Concept of Thai Language linked from Frank Tang's
    Iñtërnâtiônàlizætiøn Secrets
    Want to translate your English text to something Thailand users can
    understand?
    -> Try English-to-Thai machine translation at
    http://c3po.links.nectec.or.th/parsit/


