RE: UTF-16 clarification needed

From: Phillips, Addison (addison@amazon.com)
Date: Fri Jul 04 2008 - 10:31:43 CDT

    See Section 3.8 in the standard:

      http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G2212

    In my experience, it is a lot clearer to folks if you do not refer to surrogate code points as anything other than reserved. UTF-16 uses code units to encode Unicode code points.

    Formally, Unicode code points run from U+0000 through U+10FFFF, so the surrogate code points are indeed code points. However, the code points from U+D800 through U+DFFF are reserved and do not encode characters. Section 3.9 says:

    "Each encoding form maps the Unicode code points U+0000..U+D7FF and
    U+E000..U+10FFFF to unique code unit sequences."
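
    As a rough illustration of the ranges in that quote, here is a minimal Python sketch. The helper names is_scalar_value and try_encode_utf16 are made up for this example and are not taken from the standard; the sketch only assumes that Python's built-in UTF-16 codec refuses to encode a lone surrogate code point.

```python
from typing import Optional


def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a code point outside the surrogate range.

    Hypothetical helper: it mirrors the ranges quoted from Section 3.9,
    U+0000..U+D7FF and U+E000..U+10FFFF.
    """
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def try_encode_utf16(cp: int) -> Optional[bytes]:
    """Return the big-endian UTF-16 bytes for cp, or None if the codec
    rejects it (which is what happens for a surrogate code point)."""
    try:
        return chr(cp).encode("utf-16-be")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for cp in (0x0041, 0xD800, 0xDFFF, 0xE000, 0x20045):
        print(f"U+{cp:04X}", is_scalar_value(cp), try_encode_utf16(cp))
```

    In this sketch the surrogate code points are the only ones in the list for which the encoder returns nothing, which matches the idea that they are reserved and do not themselves get encoded.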

    So, the surrogate pair (of code units) encodes a code point (U+20045 in your example).
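
    For anyone who wants to see the arithmetic, the mapping between a supplementary code point and its pair of UTF-16 code units can be sketched as follows. The function names are hypothetical and chosen only for this example; the constants are the usual UTF-16 ones (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves).

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) pair of code units into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x20045)
    print(f"U+20045 -> {high:04X} {low:04X}")
    print(f"{high:04X} {low:04X} -> U+{from_surrogate_pair(high, low):04X}")
```

    The two values returned are code units; only the recombined result is the code point that is actually being encoded.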

    Addison

    Addison Phillips
    Globalization Architect -- Lab126

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of Jeroen Ruigrok van der Werven
    > Sent: Friday, July 04, 2008 12:09 AM
    > To: Doug Ewell
    > Cc: Unicode Mailing List
    > Subject: Re: UTF-16 clarification needed
    >
    > -On [20080704 08:47], Doug Ewell (dewell@roadrunner.com) wrote:
    > >They are both UTF-16 code units and code points. They are not Unicode
    > >scalar values.
    >
    > OK, and when you have them together in a surrogate pair, do you call it a
    > pair of code units or can you also call them a pair of code points?
    >
    > --
    > Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
    > イェルーン ラウフロック ヴァン デル ウェルヴェン
    > http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
    > A wise man that walks in the dark with a blindfold on, is not much of a
    > wise man...


