Re: Internal Representation of Unicode

From: Rick McGowan (
Date: Fri Sep 26 2003 - 12:05:10 EDT


    myrkraverk.......sourceforge.... wrote:

    > In a plain text environment, there is often a need to encode more than
    > just the plain character.
    > Since I'm using 64 bits, I call it Excessive Memory Usage Encoding, or
    > EMUE.
    > I thought of dividing the 64 bit code space into 32 variably wide
    > planes, one for control characters, one for Latin characters, one for
    > Han characters, and so on;

    This all seems to me like something of a pointless exercise. Or maybe
    you're not making clear who your intended users are and what problems
    you're trying to solve.

    Decent libraries exist that already do nice things with strings having
    attributes. And that, in my opinion, is a better model than bit-hacking in
    a 64-bit space with vague implementation-defined attributes that change
    depending on the "script" of a character. Such "attributed strings" are
    easy to work with and provide a much higher-level model than this.
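
    As a rough illustration of that model (a minimal Python sketch, not
    Cocoa's API; the class and method names here are hypothetical), the
    text stays an ordinary Unicode string and the attributes live beside
    it as ranges:

```python
from dataclasses import dataclass, field


@dataclass
class AttributedString:
    """Plain text plus attribute runs stored outside the characters.

    Hypothetical class for illustration only; it is not modeled on any
    particular library's interface.
    """
    text: str
    # Each run is (start, end, attrs) over code-point indices, end exclusive.
    runs: list = field(default_factory=list)

    def add_attribute(self, start, end, **attrs):
        """Attach attrs to the half-open range [start, end)."""
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("range outside the string")
        self.runs.append((start, end, dict(attrs)))

    def attributes_at(self, index):
        """Merge every run covering index; later runs override earlier ones."""
        if not 0 <= index < len(self.text):
            raise IndexError(index)
        merged = {}
        for start, end, attrs in self.runs:
            if start <= index < end:
                merged.update(attrs)
        return merged


s = AttributedString("plain text, 漢字")
s.add_attribute(0, 5, bold=True)          # covers "plain"
s.add_attribute(12, 14, language="ja")    # covers the two Han characters
```

    The characters themselves never change width or meaning; anything
    extra is looked up by position, e.g. s.attributes_at(13).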

    You might want to check out Apple's Cocoa environment, particularly the
    definitions of the attributed string classes. For example...
    or even the intro:

    I'm sure there are libraries with similar capabilities for storing
    characters + attributes in Java and other languages; I'm just not familiar
    with them. Maybe some of the developers can chime in with their favorite
    attributed string libraries. Even if you don't use one, you might find the
    attributed string model educational.

    (All of the above of course reflects only my personal opinion.)

