Re: Handling of Surrogates

From: Doug Ewell (doug@ewellic.org)
Date: Fri Apr 17 2009 - 07:16:20 CDT


    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > (that's why I think that it's simpler to use a separate datatype for
    > Unicode codepoints, independently of the internal UTF used for them
    > (which may be UTF-8, UTF-16, or a compressed UTF like BOCU or CESU).

    I assume the last was supposed to be "SCSU". CESU-8 is not a
    compression format, but a hacked variant of UTF-8 that encodes UTF-16
    code units instead of code points.
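To make the distinction concrete, here is a minimal Python sketch (not part of the original message; the helper names `to_utf16_code_units` and `cesu8_encode` are hypothetical). For BMP characters, UTF-8 and CESU-8 produce identical bytes. For a supplementary character such as U+1F600, UTF-8 writes the code point as a single 4-byte sequence. CESU-8 instead first splits it into the UTF-16 surrogate pair D83D DE00 and then writes each surrogate as its own 3-byte sequence, for 6 bytes in total.

```python
def to_utf16_code_units(text: str) -> list[int]:
    """Split each code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low (trail) surrogate
    return units


def cesu8_encode(text: str) -> bytes:
    """CESU-8: apply the UTF-8 bit layout to each UTF-16 code unit, not to each code point."""
    out = bytearray()
    for unit in to_utf16_code_units(text):
        # 'surrogatepass' lets Python emit a lone surrogate as a 3-byte sequence,
        # which strict UTF-8 forbids but CESU-8 relies on.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


s = "A\U0001F600"
print(s.encode("utf-8").hex(" "))  # 41 f0 9f 98 80
print(cesu8_encode(s).hex(" "))    # 41 ed a0 bd ed b8 80
```

Because CESU-8 stores code units, not code points, it is not a compression scheme. A compressor like SCSU aims to make text smaller, while CESU-8 makes supplementary characters larger than they are in UTF-8.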

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    


    This archive was generated by hypermail 2.1.5 : Fri Apr 17 2009 - 07:17:58 CDT