Re: Factor implements 24-bit string type for Unicode support

From: Hans Aberg (haberg@math.su.se)
Date: Mon Feb 04 2008 - 12:22:12 CST

    On 4 Feb 2008, at 18:47, Markus Scherer wrote:

    > Most Unicode software and libraries use UTF-16 internally, which is
    > easy to use.

    It may then be a legacy from the days when one thought two bytes would
    be enough. It is common in computing to keep outdated forms just for
    backwards compatibility, even long after they have fallen out of use.

    > Some use UTF-8 even internally, if they see a large majority of
    > high-volume text in ASCII.

    Sure, for programs that essentially process bytes. I made a regular
    expression processor, so that lexers like Flex need not be rewritten;
    they essentially just process byte patterns anyway.
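
    To illustrate the byte-pattern idea, here is a minimal Python sketch.
    It is not the tool mentioned above; the helper names
    utf8_byte_pattern and utf8_class_pattern are hypothetical, and it
    only handles literal characters, not code point ranges. It rewrites
    Unicode characters as escaped UTF-8 byte sequences, so that a
    byte-oriented matcher can find them in UTF-8 input.

```python
import re


def utf8_byte_pattern(text: str) -> bytes:
    """Return a byte-level regex that matches the UTF-8 encoding of text."""
    # Each encoded byte becomes an explicit \xNN escape in the pattern.
    return b"".join(b"\\x%02x" % byte for byte in text.encode("utf-8"))


def utf8_class_pattern(chars: str) -> bytes:
    """Return a byte-level alternation matching any one character in chars."""
    # A character class over code points turns into alternatives, because
    # one code point may be several bytes long in UTF-8.
    return b"(?:" + b"|".join(utf8_byte_pattern(c) for c in chars) + b")"


# U+00C5 and U+00E5 are the upper- and lower-case A with ring above.
pattern = re.compile(utf8_class_pattern("\u00c5\u00e5") + b"berg")
match = pattern.search("Hans \u00c5berg".encode("utf-8"))
```

    The matcher itself never decodes anything: it only sees bytes, which
    is why a byte-oriented lexer can stay unchanged under this scheme.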

    > UTF-32 as a string encoding is rare. (Some people call single-code
    > point integers "in UTF-32".)

    This would be for libraries that cannot handle variable-size
    characters. C++ maybe(?).
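
    The fixed-size versus variable-size distinction can be made concrete
    with a short Python sketch. The function names code_units and
    nth_code_point_utf32 are made up for this example. It counts code
    units per encoding and shows that in UTF-32 the n-th code point sits
    at a fixed byte offset, which is what a library without
    variable-size handling relies on.

```python
def code_units(text: str) -> dict:
    """Count the code units needed for text in each encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 1-byte units
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 2-byte units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 4-byte units
    }


def nth_code_point_utf32(data: bytes, n: int) -> str:
    """Fetch the n-th code point from little-endian UTF-32 by plain indexing."""
    return chr(int.from_bytes(data[4 * n:4 * n + 4], "little"))


# 'a', U+00C5 (A with ring above), and U+1D11E (musical symbol G clef),
# the last of which lies outside the 16-bit range.
sample = "a\u00c5\U0001d11e"
counts = code_units(sample)
third = nth_code_point_utf32(sample.encode("utf-32-le"), 2)
```

    For this three-character sample, UTF-8 needs 7 code units, UTF-16
    needs 4 (the last character takes a surrogate pair), and UTF-32
    needs exactly 3.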

       Hans Åberg


