RE: 'code unit' and 'code point' meaning check

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Wed May 14 2003 - 17:48:54 EDT


    You can find the new, improved definitions of code point and code unit in
    the online draft of Chapter 3 of TUS 4.0,
    http://www.unicode.org/book/preview/ch03.pdf

    A code point is a number between 0 and 0x10ffff. It is independent of the
    encoding form.
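    A minimal Python 3 sketch of that definition. The helper name is_code_point is hypothetical, used only for illustration. It checks the 0..0x10FFFF range and shows that one code point (U+20AC EURO SIGN) decodes to the same number whether it arrives as UTF-8, UTF-16 or UTF-32 bytes. The big-endian, BOM-free codecs are an assumption made to keep the bytes simple.

```python
# Sketch, assuming Python 3. is_code_point is a hypothetical helper.

def is_code_point(n: int) -> bool:
    """Return True if n lies in the Unicode codespace, 0 to 0x10FFFF inclusive."""
    return 0 <= n <= 0x10FFFF


# U+20AC EURO SIGN serialized in three encoding forms (big-endian, no BOM).
EURO_BYTES = {
    "utf-8": b"\xe2\x82\xac",
    "utf-16-be": b"\x20\xac",
    "utf-32-be": b"\x00\x00\x20\xac",
}

# Decoding each byte sequence yields the same list of code points.
decoded = {enc: [ord(ch) for ch in data.decode(enc)] for enc, data in EURO_BYTES.items()}

if __name__ == "__main__":
    for enc, cps in decoded.items():
        print(enc, [hex(cp) for cp in cps])
```

    The bytes differ per encoding form; the code point does not.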

    A code unit is the basic chunk of bits in one of the encoding forms of
    Unicode - an 8-bit chunk in UTF-8, a 16-bit chunk in UTF-16 and a 32-bit
    chunk in UTF-32.
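    A rough Python sketch (3.9+ for the list[int] annotation) that splits a string into code units for each form. The function name code_units and the FORMS table are hypothetical. Big-endian codecs are assumed so that no byte order mark is emitted. For U+1D11E MUSICAL SYMBOL G CLEF it gives four 8-bit units in UTF-8, two 16-bit units (a surrogate pair) in UTF-16, and one 32-bit unit in UTF-32.

```python
# Sketch: split a string into the code units of a Unicode encoding form.
# Assumption: code_units and FORMS are hypothetical names; big-endian codecs
# are used so that no byte order mark appears in the output.

# Bytes per code unit, and the Python codec used, for each encoding form.
FORMS = {
    "UTF-8": (1, "utf-8"),
    "UTF-16": (2, "utf-16-be"),
    "UTF-32": (4, "utf-32-be"),
}


def code_units(text: str, form: str) -> list[int]:
    """Return the code units of text in the given encoding form as integers."""
    width, codec = FORMS[form]
    data = text.encode(codec)
    # Group the serialized bytes into width-sized chunks, one per code unit.
    return [int.from_bytes(data[i:i + width], "big") for i in range(0, len(data), width)]


if __name__ == "__main__":
    clef = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF
    for form, (width, _) in FORMS.items():
        print(form, [f"{u:0{2 * width}X}" for u in code_units(clef, form)])
```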

    (I'm sure this is an FAQ - but why are the code points 0xd800-0xdfff not
    considered noncharacters? Obviously no abstract character can be associated
    with them! Is there a different term that describes code points like this?)
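    A small Python sketch showing how differently the two ranges behave in practice. The classify and utf8_encodable helpers and their labels are hypothetical. The noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane, 66 in all) encode to UTF-8 without complaint. Code points in 0xd800-0xdfff are rejected by Python's strict UTF-8 codec. The Unicode Character Database gives the latter General_Category Cs (Surrogate), and current versions of the standard call them surrogate code points.

```python
import unicodedata


def classify(cp: int) -> str:
    """Rough label for a code point (hypothetical helper, illustration only)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF in every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # Everything else: assigned, unassigned, private use, etc.
    return "other"


def utf8_encodable(cp: int) -> bool:
    """True if Python's strict UTF-8 codec accepts the code point on its own."""
    try:
        chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for cp in (0x0041, 0xD800, 0xFFFF):
        print(f"U+{cp:04X}", classify(cp), unicodedata.category(chr(cp)), utf8_encodable(cp))
```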

    - rick

    -----Original Message-----
    From: Ben Dougall [mailto:bend@freenet.co.uk]
    Sent: Wednesday, 14 May 2003 13:29
    To: unicode@unicode.org
    Subject: 'code unit' and 'code point' meaning check

    Could someone please confirm whether I've got this correct or not?

    A 'code unit' could be the same as a 'code point', but there again it
    might not be. Is it possible that several 'code units' are required to
    make up a 'code point'? (So code units can be the same size as or smaller
    than a code point, but not the other way round?)
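    To make the question concrete, here is a minimal Python sketch. The helper units_needed is hypothetical. It counts how many code units one code point needs in each encoding form: always exactly one in UTF-32, one or two in UTF-16, and one to four in UTF-8. Surrogate code points are rejected, on the assumption that only code points encodable on their own are of interest.

```python
def units_needed(cp: int, form: str) -> int:
    """Code units needed to encode one code point (hypothetical helper)."""
    # Only values that can be encoded on their own: no surrogates, in range.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"{cp:#x} cannot be encoded on its own")
    if form == "UTF-8":
        if cp < 0x80:
            return 1
        if cp < 0x800:
            return 2
        if cp < 0x10000:
            return 3
        return 4
    if form == "UTF-16":
        # Code points above U+FFFF need a surrogate pair.
        return 1 if cp < 0x10000 else 2
    if form == "UTF-32":
        return 1
    raise ValueError(f"unknown encoding form: {form}")


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1D11E):
        counts = {f: units_needed(cp, f) for f in ("UTF-8", "UTF-16", "UTF-32")}
        print(f"U+{cp:04X}", counts)
```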



    This archive was generated by hypermail 2.1.5 : Wed May 14 2003 - 18:40:48 EDT