Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Fri Feb 28 2003 - 13:51:17 EST

    From: "Yung-Fong Tang" <ftang@netscape.com>

    > When you deal with encodings which need state (ISO-2022, ISO-2022-JP,
    > etc) or variable length encodings (Shift_JIS, Big5, UTF-8), then the
    > situation is different.

    Unicode cannot of course speak for those other encodings, but it can
    speak for UTF-8. There is a clear definition and it is up to the
    application what it wants to do with sequences deemed irregular or
    illegal. The decision is application dependent.

    EXAMPLE: In the latest versions of Windows, one can convert from UTF-8
    using MultiByteToWideChar. If one passes MB_ERR_INVALID_CHARS then
    such an errant string will cause the conversion to fail with an
    ERROR_NO_UNICODE_TRANSLATION error. If one does not pass the flag,
    then the conversion will simply strip the errant characters. Note that
    either behavior satisfies the requirement not to interpret the errant
    sequences.
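
    As a rough illustration of the same two policies (fail, or strip),
    here is a minimal sketch using Python's built-in UTF-8 codec. It is
    an analogy only, not the Windows API: the function names and the
    sample bytes are made up for the example, and Python's codec is not
    guaranteed to match MultiByteToWideChar case for case.

```python
# Illustration only: Python's built-in codec standing in for a converter.
# 0xC0 0xAF is a non-shortest-form (overlong) encoding of "/" and is
# therefore not valid UTF-8.
MALFORMED = b"abc\xc0\xafdef"


def decode_or_fail(data: bytes) -> str:
    """Policy 1: refuse the whole input if any sequence is malformed."""
    # errors="strict" raises UnicodeDecodeError on the first bad byte.
    return data.decode("utf-8", errors="strict")


def decode_and_strip(data: bytes) -> str:
    """Policy 2: drop the malformed bytes and keep everything else."""
    # errors="ignore" silently skips bytes that cannot be decoded.
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    try:
        decode_or_fail(MALFORMED)
    except UnicodeDecodeError as exc:
        print("conversion failed:", exc.reason)
    print("stripped result:", decode_and_strip(MALFORMED))
```

    In neither case does the overlong sequence get interpreted as "/",
    which is the point: both policies refuse to interpret it, and the
    application picks whichever suits it.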

    What Netscape wants to do here in Mozilla or elsewhere can also be
    based on a decision within Netscape for the most appropriate behavior,
    given the definition.

    MichKa [MS]


