Re: Factor implements 24-bit string type for Unicode support

From: Markus Scherer (markus.icu@gmail.com)
Date: Mon Feb 04 2008 - 11:47:42 CST

    Most Unicode software and libraries use UTF-16 internally, which is easy to use.
    Some use UTF-8 internally as well, if the large majority of their
    high-volume text is ASCII.
    UTF-32 as a string encoding is rare. (Some people describe single
    code point integers as being "in UTF-32".)
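
    To make the size trade-off between the three encoding forms concrete, here is a small sketch in Python. The helper names `code_units` and `storage_bytes` are made up for this illustration and do not come from any library; only the standard `str.encode` codecs are used.

```python
def code_units(text: str) -> dict:
    """Count the code units `text` needs in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit code units
    }


def storage_bytes(text: str) -> dict:
    """Bytes of storage for `text` in each form (code units times unit width)."""
    units = code_units(text)
    return {
        "UTF-8": units["UTF-8"] * 1,
        "UTF-16": units["UTF-16"] * 2,
        "UTF-32": units["UTF-32"] * 4,
    }


if __name__ == "__main__":
    # ASCII text, a Latin-1 letter, and a supplementary-plane character.
    for sample in ["hello", "\u00e9", "\U0001D11E"]:
        print(repr(sample), code_units(sample), storage_bytes(sample))
```

    For ASCII text, UTF-8 takes a quarter of the space of UTF-32 and half that of UTF-16. A supplementary character such as U+1D11E takes four bytes in all three forms, and needs two code units (a surrogate pair) in UTF-16.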

    Roll your own encoding form, and you can't use any existing libraries... Why?

    markus

    On Feb 4, 2008 5:49 AM, Hans Aberg <haberg@math.su.se> wrote:
    > I think that 32-bit is probably best for internal use in programs for
    > speed, avoiding alignment problems; the best way to actually know is
    > to do some profiling. Externally, for distributed files, UTF-8 seems
    > best, because most agree on how to sort out the bits in the bytes.

    -- 
    Opinions expressed here may not reflect my company's positions unless
    otherwise noted.
    

