Re: Unicode conformant character encodings and us-ascii

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 16 2003 - 15:09:46 EDT


    Stefan Persson asked:

    > Peter underscore Constable at sil dot org wrote:
    >
    > > These might be considered encoding forms, and they might be able to encode
    > > the Unicode coded character set, but I don't think these should be called
    > > "Unicode encoding forms". There are exactly three Unicode encoding forms:
    > > UTF-8, UTF-16 and UTF-32.
    >
    > Are not BE and LE regarded as different encoding forms, making five
    > encoding forms (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE & UTF-32LE)?

    No.

    The Unicode Standard has:

    One (1) coded character set (CCS).

    Three (3) encoding forms (CEF): UTF-8, UTF-16, UTF-32.

    Seven (7) encoding schemes (CES): UTF-8
                                       UTF-16, UTF-16BE, UTF-16LE
                                       UTF-32, UTF-32BE, UTF-32LE
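
    The form/scheme split can be illustrated with a short Python sketch.
    The helper names `utf16_form`, `utf32_form`, and `serialize` are
    hypothetical, written only for this illustration. An encoding form
    maps code points to code units. An encoding scheme serializes those
    code units to bytes. Byte order (and the BOM) only enters at the
    scheme level, which is why UTF-16BE and UTF-16LE are two schemes of
    a single UTF-16 form rather than separate forms. For the unmarked
    UTF-16 and UTF-32 schemes, the sketch assumes a BOM followed by
    big-endian units. That is one permitted serialization: the standard
    also allows a BOM followed by little-endian units, and treats an
    unmarked sequence with no BOM as big-endian.

```python
import struct

def utf16_form(text: str) -> list[int]:
    """UTF-16 encoding form: code points -> 16-bit code units (no byte order yet)."""
    units: list[int] = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)                     # BMP: one code unit
        else:
            cp -= 0x10000                        # supplementary: surrogate pair
            units.append(0xD800 | (cp >> 10))    # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate
    return units

def utf32_form(text: str) -> list[int]:
    """UTF-32 encoding form: one 32-bit code unit per code point."""
    return [ord(ch) for ch in text]

def serialize(units: list[int], width: int, order: str, bom: bool = False) -> bytes:
    """Encoding scheme: turn code units into bytes.

    width: 2 for UTF-16 code units, 4 for UTF-32 code units.
    order: '>' for big-endian, '<' for little-endian.
    bom:   prepend U+FEFF so a reader can detect the byte order.
    """
    fmt = order + ("H" if width == 2 else "I")
    if bom:
        units = [0xFEFF] + units
    return b"".join(struct.pack(fmt, u) for u in units)

# U+0041 LATIN CAPITAL LETTER A, U+1D11E MUSICAL SYMBOL G CLEF
text = "A\U0001D11E"
units16 = utf16_form(text)
units32 = utf32_form(text)
print([hex(u) for u in units16])  # ['0x41', '0xd834', '0xdd1e']

# Three forms, but seven schemes: byte order only appears when code
# units are serialized. UTF-8 code units are single bytes, so byte
# order never arises and its form and scheme coincide.
schemes = {
    "UTF-8":    text.encode("utf-8"),
    "UTF-16":   serialize(units16, 2, ">", bom=True),  # BOM + BE: one permitted choice
    "UTF-16BE": serialize(units16, 2, ">"),
    "UTF-16LE": serialize(units16, 2, "<"),
    "UTF-32":   serialize(units32, 4, ">", bom=True),  # BOM + BE: one permitted choice
    "UTF-32BE": serialize(units32, 4, ">"),
    "UTF-32LE": serialize(units32, 4, "<"),
}
for name, data in schemes.items():
    print(f"{name:<9} {data.hex(' ')}")
```

    The BE and LE variants match Python's own "utf-16-be", "utf-16-le",
    "utf-32-be" and "utf-32-le" codecs byte for byte, and the BOM-marked
    variants decode back to the original text with the "utf-16" and
    "utf-32" codecs, which honor the BOM.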
                                       
    All the particulars are laid out publicly in excruciating
    detail at:

    http://www.unicode.org/book/preview/ch03.pdf

    People on this list should make a particular effort to familiarize
    themselves with Section 3.9, Unicode Encoding Forms, and Section
    3.10, Unicode Encoding Schemes, before making claims about them.

    Any older explanations, including the text of The Unicode Standard,
    Version 3.0, have now been superseded by The Unicode Standard,
    Version 4.0 -- and that is why the editors put Chapter 3 up
    on the web for people to refer to.

    --Ken



    This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 15:51:21 EDT