Re: Nicest UTF

From: Philippe Verdy
Date: Thu Dec 02 2004 - 17:14:11 CST

    If you need immutable strings that take as little memory as possible in
    your running app, consider using SCSU for the internal storage of the
    string object, and have a method return an indexed array of code points,
    or a UTF-32 string, when you need to transform the string object into
    another.

    SCSU is excellent for immutable strings, and adds only a *very* small
    overhead over ISO-8859-1 (note that the conversion from ISO-8859-1 to SCSU
    is extremely trivial, maybe even simpler than the conversion to UTF-8!).
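
To illustrate why that conversion is so simple, here is a minimal sketch in Python. It assumes my reading of the SCSU initial state in UTS #6: single-byte mode, with dynamic window 0 positioned at U+0080, so that bytes 0x20-0xFF, plus NUL, TAB, LF and CR, stand for themselves, and the remaining C0 controls need the SQ0 quote tag (0x01) in front. The function name `latin1_to_scsu` is made up for this example.

```python
# Sketch only: ISO-8859-1 bytes -> SCSU bytes, relying on the SCSU
# initial state (single-byte mode, dynamic window 0 at U+0080).
# Assumption: only NUL, TAB, LF and CR among the C0 controls may
# appear unquoted; all other bytes below 0x20 are SCSU tag bytes.

PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}
SQ0 = 0x01  # "quote one character from window 0" tag


def latin1_to_scsu(data: bytes) -> bytes:
    """Encode ISO-8859-1 bytes as SCSU without ever changing windows."""
    out = bytearray()
    for b in data:
        if b < 0x20 and b not in PASS_THROUGH_CONTROLS:
            # This byte value would be read as a tag, so quote it.
            out.append(SQ0)
        out.append(b)
    return bytes(out)


latin1 = "café".encode("latin-1")
scsu = latin1_to_scsu(latin1)
```

For ordinary text the output is byte-for-byte the ISO-8859-1 input, whereas UTF-8 has to expand every character above U+007F into two bytes. The only overhead in this sketch is one extra byte per unusual control character.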

    From: "Marcin 'Qrczak' Kowalczyk"
    > For internals of my language Kogut I've chosen a mixture of ISO-8859-1
    > and UTF-32. Normalized, i.e. a string with characters which fit in
    > narrow characters is always stored in the narrow form.
    > I've chosen representations with fixed size code points because
    > nothing beats the simplicity of accessing characters by index, and the
    > most natural thing to index by is a code point.
    > Strings are immutable, so there is no need to upgrade or downgrade a
    > string in place, so having two representations doesn't hurt that much.
    > Since the majority of strings is ASCII, using UTF-32 for everything
    > would be wasteful.
    > Mutable and resizable character arrays use UTF-32 only.
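
The two-representation scheme quoted above could look roughly like the following Python sketch. It is not Kogut's actual implementation: the class name `DualString` and its methods are hypothetical, and Python `bytes` objects stand in for the narrow (ISO-8859-1) and wide (UTF-32) buffers.

```python
# Sketch only: an immutable string stored either as ISO-8859-1
# (1 byte per code point) or as UTF-32 (4 bytes per code point).
# The narrow form is always chosen when every code point fits in it.


class DualString:
    __slots__ = ("_data", "_width")

    def __init__(self, text: str):
        if all(ord(ch) <= 0xFF for ch in text):
            self._data = text.encode("latin-1")    # narrow form
            self._width = 1
        else:
            self._data = text.encode("utf-32-le")  # wide form, no BOM
            self._width = 4

    def __len__(self) -> int:
        # Length in code points, whatever the storage width.
        return len(self._data) // self._width

    def code_point_at(self, index: int) -> int:
        """Constant-time access to the code point at `index`."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self._width
        return int.from_bytes(self._data[start:start + self._width], "little")

    def storage_size(self) -> int:
        """Number of bytes used by the character buffer."""
        return len(self._data)


narrow = DualString("abc")   # stored in 3 bytes
wide = DualString("a\u20ac")  # contains U+20AC, stored in 8 bytes
```

Because the object is immutable, the width is decided once at construction time and never has to be upgraded or downgraded in place, which is the point made in the quoted message.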
