RE: Ill-formed sequences (was: Re: UTF-16 inside UTF-8)

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Wed Nov 05 2003 - 12:46:37 EST


    >
    > I assume that by “multiple UTF-8 sequences that could represent the same
    > logical text,” Adobe is referring to non-shortest UTF-8 sequences such
    > as <C0 80> and not to Unicode canonical equivalences or something else.
    > No similar warning about “multiple sequences” is given in the sections
    > that deal with UTF-16.
    >

    I am under the impression that they mean combining sequences.
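
To make the two readings concrete, here is a minimal Python sketch (not taken from the Adobe text or from this thread) contrasting them. It assumes Python's built-in strict UTF-8 decoder and the standard `unicodedata` module; the variable names are illustrative only. A non-shortest sequence such as <C0 80> is ill-formed and is rejected outright, whereas a precomposed character and its combining sequence are both well-formed but encode to different bytes, in UTF-16 as much as in UTF-8.

```python
import unicodedata

# Reading 1: non-shortest ("overlong") UTF-8. <C0 80> is ill-formed,
# so a strict decoder refuses it rather than yielding U+0000.
overlong_nul = bytes([0xC0, 0x80])
try:
    overlong_nul.decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True

# Reading 2: combining sequences. Both strings below are well-formed and
# canonically equivalent, yet they are different code point sequences.
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

utf8_precomposed = precomposed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")
same_utf8 = utf8_precomposed == utf8_decomposed

# The same difference shows up in UTF-16, so it is not specific to UTF-8.
same_utf16 = precomposed.encode("utf-16-be") == decomposed.encode("utf-16-be")

# Normalizing to NFC makes the two forms compare equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(overlong_rejected, same_utf8, same_utf16, same_after_nfc)
```

If the warning is about combining sequences, a consumer would need normalization (NFC here, as one possible choice) before comparing text; if it is about non-shortest forms, a conformant decoder already rejects them.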

    Addison P. Phillips
    Director, Globalization Architecture
    webMethods | Delivering Global Business Visibility
    http://www.webMethods.com
    Chair, W3C Internationalization (I18N) Working Group
    Chair, W3C-I18N-WG, Web Services Task Force
    http://www.w3.org/International

    Internationalization is an architecture.
    It is not a feature.
