Re: 'code unit' and 'code point' meaning check

From: Ben Dougall (bend@freenet.co.uk)
Date: Wed May 14 2003 - 18:20:26 EDT


    On Wednesday, May 14, 2003, at 10:48 pm, Rick Cameron wrote:

    > You can find the new, improved definitions of code point and code unit
    > in the online draft of Chapter 3 of TUS 4.0,
    > http://www.unicode.org/book/preview/ch03.pdf

    yeah, i'm really struggling with that at the moment. it just won't get
    into my head. :/

    > A code point is a number between 0 and 0x10ffff. It is independent of
    > the encoding form.
    >
    > A code unit is the basic chunk of bits in one of the encoding forms of
    > Unicode - an 8-bit chunk in UTF-8, a 16-bit chunk in UTF-16 and a
    > 32-bit chunk in UTF-32.

    right, so this..:

    >> a 'code unit' could be the same as a 'code point', but there again it
    >> might not be. it's possible that several 'code units' are required to
    >> make up a 'code point'? (so code units can be the same size or smaller
    >> than a code point, but not the other way round)?

    ..was a fair enough description by the looks of things. the right way
    round at least. (as opposed to my doubting follow-up mail)

    ok, thanks.
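
    to make the code point / code unit split concrete, here is a minimal
    python sketch. it is only an illustration, not something taken from the
    TUS draft: code_units() is a made-up helper name, and it leans on the
    fact that a python 3 str element is one code point. it counts how many
    code units each encoding form needs for a single code point.

```python
# Count the code units needed to represent one code point in each
# Unicode encoding form. code_units() is a hypothetical helper name.

def code_units(ch: str) -> dict:
    """Return the number of code units per encoding form for one code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }

if __name__ == "__main__":
    # One code point from each UTF-8 length class.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
        print(f"U+{ord(ch):04X}", code_units(ch))
```

    the count is always 1 in UTF-32, so there a code unit and a code point
    are the same size. in UTF-8 and UTF-16 a single code point can take
    several code units.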

    > (I'm sure this is an FAQ - but why are the code points 0xd800-0xdfff
    > not considered noncharacters? Obviously no abstract character can be
    > associated with them! Is there a different term that describes code
    > points like this?)

    <guess> that area is full of surrogates. in UTF-16 each one needs a
    second surrogate (a high one followed by a low one) to make up a single
    character, so on their own 0xd800-0xdfff are 1/2 characters :) </guess>
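
    the pairing can be sketched with the usual UTF-16 arithmetic. this is a
    rough illustration only: to_surrogates() and from_surrogates() are
    made-up names, and the sketch covers just the supplementary range
    0x10000-0x10ffff, which is the only range that needs a pair.

```python
# Split a supplementary code point into a UTF-16 surrogate pair and back.
# Both function names are hypothetical, chosen for this illustration.

def to_surrogates(cp: int) -> tuple:
    """Return (high, low) surrogate code units for a code point above U+FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000            # 20 bits to spread over two code units
    high = 0xD800 + (offset >> 10)   # top 10 bits, lands in 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits, lands in 0xDC00..0xDFFF
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

if __name__ == "__main__":
    high, low = to_surrogates(0x1D11E)
    print(hex(high), hex(low))
    print(hex(from_surrogates(high, low)))
```

    each half carries 10 bits of the code point, which is why a lone value
    from 0xd800-0xdfff cannot stand for a character by itself.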


