Re: Unicode conformant character encodings and us-ascii

From: Peter_Constable@sil.org
Date: Fri May 16 2003 - 15:48:51 EDT


    Stefan Persson wrote on 05/16/2003 01:24:35 PM:

    > > These might be considered encoding forms, and they might be able to
    > > encode the Unicode coded character set, but I don't think these
    > > should be called "Unicode encoding forms". There are exactly three
    > > Unicode encoding forms: UTF-8, UTF-16 and UTF-32.
    >
    > Are not BE and LE regarded as different encoding forms, making five
    > encoding forms (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE & UTF-32LE)?

    No, you are thinking of character encoding *schemes*, of which there are
    seven: add to your list "UTF-16" and "UTF-32".
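    The distinction can be illustrated with a short Python sketch. It is not taken from TUS or UTR #17, and the helper names utf16_code_units and serialize are hypothetical. The UTF-16 *encoding form* yields a sequence of 16-bit code units with no notion of byte order. Each UTF-16 *encoding scheme* (UTF-16BE, UTF-16LE, or UTF-16 with an optional BOM) then serializes those same code units into bytes. In the UTF-16 scheme, a leading BOM signals the byte order. Without a BOM, and absent a higher-level protocol, the data is interpreted as big-endian. The sample text uses U+1D11E MUSICAL SYMBOL G CLEF because it needs a surrogate pair.

```python
# Illustrative sketch: one encoding form (UTF-16 code units) versus
# several encoding schemes (byte serializations of those code units).

def utf16_code_units(text: str) -> list[int]:
    """UTF-16 encoding form: a sequence of 16-bit code units (no byte order)."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)                       # BMP: one code unit
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))      # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))    # low surrogate
    return units

def serialize(units: list[int], big_endian: bool, bom: bool = False) -> bytes:
    """Encoding scheme: write each 16-bit code unit as two bytes in a chosen order."""
    if bom:
        units = [0xFEFF] + units                   # byte order mark
    order = "big" if big_endian else "little"
    return b"".join(u.to_bytes(2, order) for u in units)

text = "A\U0001D11E"
units = utf16_code_units(text)
print([hex(u) for u in units])                             # ['0x41', '0xd834', '0xdd1e']
print(serialize(units, big_endian=True).hex())             # UTF-16BE: 0041d834dd1e
print(serialize(units, big_endian=False).hex())            # UTF-16LE: 410034d81edd
print(serialize(units, big_endian=True, bom=True).hex())   # UTF-16 with BOM: feff0041d834dd1e
```

    All three byte sequences encode the same three code units. Only the serialization differs, which is why BE and LE are distinct schemes rather than distinct forms.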

    I'll echo Addison's recommendation: read UTR #17, which explains the
    differences between the five levels of Unicode's character encoding model:

    abstract character repertoire
    coded character set
    character encoding form
    character encoding scheme
    transfer encoding syntax
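    As a rough illustration, one character can be traced through all five levels. This Python sketch is a hypothetical example, not taken from UTR #17: the function name five_levels is invented, UTF-8 is chosen as the encoding form, and base64 stands in as one possible transfer encoding syntax. For UTF-8, the code units are already single bytes, so the encoding scheme adds no byte-order step.

```python
import base64
import unicodedata

def five_levels(name: str) -> dict:
    # 1. Abstract character repertoire: an abstract character, identified here by its Unicode name.
    ch = unicodedata.lookup(name)
    # 2. Coded character set: the code point assigned to that character.
    code_point = ord(ch)
    # 3. Character encoding form: a sequence of UTF-8 code units (8-bit integers).
    code_units = list(ch.encode("utf-8"))
    # 4. Character encoding scheme: the serialized byte sequence.
    #    UTF-8 code units are single bytes, so byte order does not arise.
    scheme_bytes = bytes(code_units)
    # 5. Transfer encoding syntax: base64, as one example (e.g. for MIME transport).
    transfer = base64.b64encode(scheme_bytes).decode("ascii")
    return {
        "character": ch,
        "code_point": code_point,
        "code_units": code_units,
        "scheme_bytes": scheme_bytes,
        "transfer": transfer,
    }

info = five_levels("LATIN SMALL LETTER E WITH ACUTE")
print(f"U+{info['code_point']:04X}")           # U+00E9
print([hex(u) for u in info["code_units"]])    # ['0xc3', '0xa9']
print(info["transfer"])                         # w6k=
```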

    People might also look at Chapter 3 of TUS 4.0 (The Unicode Standard,
    Version 4.0), the final draft of which is online at
    http://www.unicode.org/book/preview/ch03.pdf. In particular, "encoding
    form" is defined as D29 and "encoding scheme" as D38, and the specific
    encoding forms and schemes *defined by Unicode* (take note, Philippe) are
    defined in the surrounding pages.

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485


