UTF-16 clarification needed

From: Jeroen Ruigrok van der Werven (asmodai@in-nomine.org)
Date: Fri Jul 04 2008 - 00:28:37 CDT

    This came up in a debate on the Python development list, and I could
    not immediately find a definitive answer.

    The U+D800 - U+DFFF range is used to create code points by means of
    surrogate pairs. Take U+20045, for example: in UTF-16 it is encoded as
    U+D840 U+DC45. Are these two values, by themselves, only code units, or
    are they also code points?
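
    To make the example concrete, here is a minimal Python 3 sketch of the
    surrogate pair arithmetic for U+20045. The helper name to_surrogate_pair
    is my own, purely for illustration, and the sketch only shows how the
    two 16-bit values are derived; it does not settle what to call them.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into two UTF-16 values.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not in the range U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> DC00..DFFF
    return high, low


high, low = to_surrogate_pair(0x20045)
print(f"U+{high:04X} U+{low:04X}")       # U+D840 U+DC45

# Cross-check against Python's built-in UTF-16 encoder (big-endian, no BOM).
encoded = chr(0x20045).encode("utf-16-be")
units = [int.from_bytes(encoded[i:i + 2], "big")
         for i in range(0, len(encoded), 2)]
```

    The list in units should match the pair computed by hand, which at least
    confirms that the U+D840 U+DC45 figure above is right.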

    Personally I'm leaning toward the code-units-only answer, since you need
    this range's code units to build a valid code point. But history has
    proven often enough that I can be quite mistaken, so I'd like some
    verification on this.

    Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
    イェルーン ラウフロック ヴァン デル ウェルヴェン
    http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
    For ever, brother, hail and farewell...
