Re: Unicode forms for internal storage

From: Elliotte Rusty Harold
Date: Wed Jan 21 2004 - 06:33:15 EST

    At 10:59 PM -0800 1/20/04, Doug Ewell wrote:

    >If you are using the "mini" version of SCSU where Latin-1 characters are
    >stored as 1 byte each and everything else is stored as UTF-16 (using SCU
    >and UC0 tags to switch between modes), you ought to achieve really good

    I'll have to try this. The speed hit on using full SCSU was very noticeable.
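For readers following along, here is a rough sketch of what that "mini" scheme could look like in Java. It is only an illustration, not code from either poster: the class and method names are made up, the tag values (SQ0, SCU, UC0, UQU) are taken from the SCSU specification (UTS #6), and it assumes the encoder never redefines a window, so dynamic window 0 stays at U+0080 and Latin-1 maps to single bytes. The decoder understands only the output of this encoder, not general SCSU.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical class name; a sketch of the two-mode SCSU subset described above.
final class MiniScsu {
    // Tag values as defined in UTS #6.
    private static final int SQ0 = 0x01; // single-byte mode: quote one character from window 0
    private static final int SCU = 0x0F; // single-byte mode: switch to Unicode (UTF-16BE) mode
    private static final int UC0 = 0xE0; // Unicode mode: switch back to single-byte mode, window 0
    private static final int UQU = 0xF0; // Unicode mode: quote the next UTF-16 code unit

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean unicodeMode = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0xFF) {
                if (unicodeMode) { out.write(UC0); unicodeMode = false; }
                // C0 controls other than NUL, TAB, LF and CR would be read as tags,
                // so they are quoted with SQ0.
                if (c < 0x20 && c != 0x00 && c != 0x09 && c != 0x0A && c != 0x0D) {
                    out.write(SQ0);
                }
                out.write(c); // one byte per Latin-1 character
            } else {
                if (!unicodeMode) { out.write(SCU); unicodeMode = true; }
                int hi = c >> 8;
                // High bytes 0xE0..0xF2 are tags in Unicode mode and must be quoted.
                if (hi >= 0xE0 && hi <= 0xF2) out.write(UQU);
                out.write(hi);
                out.write(c & 0xFF);
            }
        }
        return out.toByteArray();
    }

    // Decodes only what encode() above produces; other SCSU tags are not handled.
    static String decode(byte[] b) {
        StringBuilder sb = new StringBuilder();
        boolean unicodeMode = false;
        int i = 0;
        while (i < b.length) {
            int x = b[i++] & 0xFF;
            if (!unicodeMode) {
                if (x == SCU) {
                    unicodeMode = true;
                } else if (x == SQ0) {
                    sb.append((char) (b[i++] & 0xFF));
                } else {
                    sb.append((char) x);
                }
            } else {
                if (x == UC0) { unicodeMode = false; continue; }
                if (x == UQU) x = b[i++] & 0xFF;
                sb.append((char) ((x << 8) | (b[i++] & 0xFF)));
            }
        }
        return sb.toString();
    }
}
```

With this sketch, pure Latin-1 text costs one byte per character, and each switch into or out of UTF-16 costs one extra tag byte, so text that alternates frequently between Latin-1 and other scripts would see less benefit.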

    Ultimately, I suspect what I should do is provide some means of
    letting the user choose the appropriate algorithm and trade-off
    between speed and space for their data, possibly via a system
    property, a flag in the class, or even a build-time option. However,
    that's going to have to wait for 1.1. For 1.0, I just want to pick
    something that's a reasonable compromise across the most common cases.
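As a rough idea of how the system-property option might look, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the property name "example.text.storage", the class name, the two choices offered, and the UTF-8 default are all hypothetical and not taken from any actual release.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical class; sketches picking a storage encoding from a system property.
final class StorageFormat {
    // Hypothetical property name, e.g. java -Dexample.text.storage=utf16 ...
    static final String PROPERTY = "example.text.storage";

    // Returns a function that converts a String to its internal byte form.
    static Function<String, byte[]> chooseEncoder() {
        String choice = System.getProperty(PROPERTY, "utf8");
        switch (choice) {
            case "utf16":
                // No conversion cost beyond copying, but two bytes per code unit.
                return s -> s.getBytes(StandardCharsets.UTF_16BE);
            default:
                // Assumed default: compact for ASCII-heavy data.
                return s -> s.getBytes(StandardCharsets.UTF_8);
        }
    }
}
```

A compressing encoder could be added as a further case in the same switch, leaving the choice between speed and space to whoever launches the JVM.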

       Elliotte Rusty Harold
       Effective XML (Addison-Wesley, 2003)