Re: Unicode forms for internal storage

From: Elliotte Rusty Harold (
Date: Tue Jan 20 2004 - 14:45:12 EST

    At 9:52 AM -0800 1/20/04, Markus Scherer wrote:
    >You need not invent something new: Just use a simplified SCSU
    >encoder, and either a regular SCSU decoder or one that only supports
    >the features which your custom encoder uses.

    Thanks. It looks like exactly what I need.
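    Below is a minimal sketch of the "simplified SCSU encoder" idea. It assumes the subset is single-byte mode only, per UTS #6:
    - ASCII and the four pass-through controls (NUL, TAB, LF, CR) go out unchanged.
    - U+0080..U+00FF use dynamic window 0 at its default offset.
    - Other C0 controls are quoted with SQ0.
    - Every other UTF-16 code unit is quoted with SQU (0x0E) as three bytes.

    The class `TinyScsu` and its methods are hypothetical. The matching `decode` follows Markus's suggestion of a decoder that supports only what the custom encoder emits, and it rejects any other tag. Supplementary characters come out as two SQU-quoted surrogates. That assumes the receiving decoder reassembles UTF-16 code units, as this one does.

    ```java
    import java.io.ByteArrayOutputStream;

    // Hypothetical SCSU (UTS #6) subset: single-byte mode, dynamic window 0 at its
    // default offset U+0080, SQ0 for stray C0 controls, SQU for everything else.
    public final class TinyScsu {
        private static final int SQ0 = 0x01;       // quote one char from window 0
        private static final int SQU = 0x0E;       // quote one UTF-16 code unit (2 bytes follow)
        private static final int WINDOW0 = 0x0080; // default offset of dynamic window 0

        private TinyScsu() {}

        // Byte values that single-byte mode passes through unchanged.
        private static boolean passesThrough(int c) {
            return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0D || (c >= 0x20 && c <= 0x7F);
        }

        public static byte[] encode(String s) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (passesThrough(c)) {
                    out.write(c);
                } else if (c < 0x20) {
                    // Other C0 controls collide with tag bytes; quote via static window 0 (offset 0).
                    out.write(SQ0);
                    out.write(c);
                } else if (c < WINDOW0 + 0x80) {
                    // U+0080..U+00FF: one byte through dynamic window 0.
                    out.write(0x80 + (c - WINDOW0));
                } else {
                    // Anything else, including each half of a surrogate pair: 3 bytes.
                    out.write(SQU);
                    out.write(c >>> 8);
                    out.write(c & 0xFF);
                }
            }
            return out.toByteArray();
        }

        // Decodes only what encode() emits; any other SCSU tag is rejected.
        public static String decode(byte[] b) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < b.length; i++) {
                int x = b[i] & 0xFF;
                if (x == SQ0) {
                    int y = b[++i] & 0xFF;
                    // Below 0x80: static window 0 (offset 0); otherwise dynamic window 0.
                    sb.append((char) (y < 0x80 ? y : WINDOW0 + (y - 0x80)));
                } else if (x == SQU) {
                    sb.append((char) (((b[i + 1] & 0xFF) << 8) | (b[i + 2] & 0xFF)));
                    i += 2;
                } else if (x >= 0x80) {
                    sb.append((char) (WINDOW0 + (x - 0x80)));
                } else if (passesThrough(x)) {
                    sb.append((char) x);
                } else {
                    throw new IllegalArgumentException(
                        "SCSU tag 0x" + Integer.toHexString(x) + " not supported by this subset");
                }
            }
            return sb.toString();
        }
    }
    ```

    Output from this subset is still legal SCSU, so a full decoder (ICU's, for instance) should also be able to read it. A truncated byte array makes `decode` throw an index exception. That is acceptable for a sketch but worth tightening for real use.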

    >For a tiny SCSU encoder (main function 75 lines of commented C) that
    >also compresses a little better than what you describe see
    >You could scale that encoder up or down to your liking.
    >For a full SCSU converter you could use ICU, for example.
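    The same idea can be scaled up a little. The hypothetical `WindowedScsu` below is an encoder only. It also uses the eight default dynamic windows from UTS #6, and it emits an SCn tag (0x10 + n) when a character falls outside the active window. A run of Cyrillic or Hiragana then costs one byte per character after a single switch byte. The sketch never redefines windows (SDn) and never enters Unicode mode. Its greedy window choice is an assumption for illustration, not how any particular published encoder works.

    ```java
    import java.io.ByteArrayOutputStream;

    // Hypothetical SCSU encoder: single-byte mode plus switching among the
    // default dynamic windows with SCn; no SDn, no Unicode mode.
    public final class WindowedScsu {
        // Default dynamic-window offsets from UTS #6, windows 0..7.
        private static final int[] DEFAULT_WINDOWS = {
            0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
        };
        private static final int SQ0 = 0x01; // quote one char from window 0
        private static final int SQU = 0x0E; // quote one UTF-16 code unit
        private static final int SC0 = 0x10; // SCn = SC0 + n: make window n active

        private WindowedScsu() {}

        public static byte[] encode(String s) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int active = 0; // single-byte mode starts with dynamic window 0 active
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0D || (c >= 0x20 && c <= 0x7F)) {
                    out.write(c); // passes through unchanged
                } else if (c < 0x20) {
                    out.write(SQ0); // static window 0 quote for other C0 controls
                    out.write(c);
                } else {
                    int w = windowFor(c, active);
                    if (w < 0) {
                        // Not in any default window: quote the code unit (3 bytes).
                        out.write(SQU);
                        out.write(c >>> 8);
                        out.write(c & 0xFF);
                    } else {
                        if (w != active) {
                            out.write(SC0 + w); // one switch byte, then stay there
                            active = w;
                        }
                        out.write(0x80 + (c - DEFAULT_WINDOWS[w]));
                    }
                }
            }
            return out.toByteArray();
        }

        // Prefer the active window; otherwise the first default window containing c, or -1.
        private static int windowFor(char c, int active) {
            if (inWindow(c, active)) return active;
            for (int w = 0; w < DEFAULT_WINDOWS.length; w++) {
                if (inWindow(c, w)) return w;
            }
            return -1;
        }

        private static boolean inWindow(char c, int w) {
            int off = DEFAULT_WINDOWS[w];
            return c >= off && c < off + 0x80;
        }
    }
    ```

    For example, `encode("Привет")` comes out as seven bytes: SC2 (0x12) followed by one byte per letter, against eighteen bytes if every letter were quoted with SQU. An isolated foreign character would be cheaper with an SQn quote than with a switch and a switch back. A smarter encoder could look ahead to decide which one to use.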

    Hmm, I'm already carrying around part of ICU4J to perform
    normalization. I'll have to check and see if I've got the SCSU
    support compiled into my version of the ICU jar.
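    A reflective lookup at run time can check whether a trimmed ICU4J jar still includes SCSU support. This sketch assumes the ICU4J SCSU classes are `com.ibm.icu.text.UnicodeCompressor` and `UnicodeDecompressor`, and that `UnicodeCompressor` has a static `compress(String)` returning `byte[]`, as in the ICU4J API documentation I know of. The wrapper class `IcuScsuCheck` is hypothetical. Reflection keeps the check compilable and runnable even when ICU4J is absent.

    ```java
    import java.lang.reflect.Method;
    import java.util.Arrays;

    public final class IcuScsuCheck {
        private IcuScsuCheck() {}

        // Returns SCSU bytes from ICU4J's UnicodeCompressor if that class is on the
        // classpath, or null if this build of the ICU jar does not include it.
        public static byte[] compressIfAvailable(String s) {
            try {
                Class<?> cls = Class.forName("com.ibm.icu.text.UnicodeCompressor");
                Method compress = cls.getMethod("compress", String.class);
                return (byte[]) compress.invoke(null, s); // static method: no receiver
            } catch (ClassNotFoundException e) {
                return null; // SCSU support not compiled into this jar
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Unexpected UnicodeCompressor API", e);
            }
        }

        public static void main(String[] args) {
            byte[] scsu = compressIfAvailable("Gr\u00FC\u00DFe");
            System.out.println(scsu == null
                ? "com.ibm.icu.text.UnicodeCompressor not found"
                : "SCSU bytes: " + Arrays.toString(scsu));
        }
    }
    ```

    Listing the jar's contents and looking for those class names answers the same question without running any code.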

    >You could also use BOCU-1.

    From the BOCU tech note, it looks like SCSU performs better. BOCU's
    main benefit comes when you're transmitting the encoding over the
    wire, which I am definitely not doing. But SCSU looks like a really
    nice option. Thanks.

