From: Markus Scherer (email@example.com)
Date: Tue Jan 20 2004 - 12:52:40 EST
You need not invent something new: just use a simplified SCSU encoder, together with either a regular
SCSU decoder or one that supports only the features your custom encoder uses.
For a tiny SCSU encoder (main function: 75 lines of commented C) that also compresses a little better
than what you describe, see http://www.mindspring.com/~markus.scherer/unicode/tr6/
You could scale that encoder up or down to your liking.
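
To illustrate what "scaling down" might look like, here is a minimal sketch of a two-mode SCSU subset. It is not the encoder at the URL above, and it compresses worse than that one. The function names (encode, decode, is_plain, utf16_units) are hypothetical, and the tag values are as I understand them from UTS #6, so check them against the specification before relying on this. ASCII goes out as single bytes; everything else is written as UTF-16BE code units in Unicode mode, with supplementary characters as surrogate pairs.

```python
# Hypothetical sketch of a two-mode SCSU subset; tag values assumed from UTS #6.
SCU = 0x0F  # single-byte mode: switch to Unicode mode
UC0 = 0xE0  # Unicode mode: switch back to single-byte mode (window 0)
UQU = 0xF0  # Unicode mode: quote the next UTF-16 code unit


def is_plain(cp):
    """True if cp is written as one literal byte in single-byte mode."""
    return 0x20 <= cp <= 0x7F or cp in (0x00, 0x09, 0x0A, 0x0D)


def utf16_units(cp):
    """Return the UTF-16 code units for one code point."""
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000
    return [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]


def encode(text):
    out = bytearray()
    unicode_mode = False
    for ch in text:
        cp = ord(ch)
        if is_plain(cp):
            if unicode_mode:
                out.append(UC0)
                unicode_mode = False
            out.append(cp)
        else:
            if not unicode_mode:
                out.append(SCU)
                unicode_mode = True
            for unit in utf16_units(cp):
                if 0xE0 <= (unit >> 8) <= 0xF2:
                    out.append(UQU)  # high byte would otherwise read as a tag
                out += unit.to_bytes(2, "big")
    return bytes(out)


def decode(data):
    """Decode only the subset produced by encode(); reject anything else."""
    units = []
    unicode_mode = False
    i = 0
    while i < len(data):
        b = data[i]
        if not unicode_mode:
            if b == SCU:
                unicode_mode = True
            elif is_plain(b):
                units.append(b)
            else:
                raise ValueError("outside this subset: 0x%02X" % b)
            i += 1
        elif b == UC0:
            unicode_mode = False
            i += 1
        else:
            if b == UQU:
                i += 1  # the next two bytes are a literal code unit
            elif 0xE0 <= b <= 0xF2:
                raise ValueError("outside this subset: 0x%02X" % b)
            if i + 2 > len(data):
                raise ValueError("truncated input")
            units.append(int.from_bytes(data[i:i + 2], "big"))
            i += 2
    raw = b"".join(u.to_bytes(2, "big") for u in units)
    return raw.decode("utf-16-be")
```

Each mode switch costs one byte, so text that alternates rapidly between ASCII and non-ASCII characters does poorly with this sketch. Using the dynamic windows of full SCSU is what avoids that cost.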
For a full SCSU converter you could use ICU, for example: http://oss.software.ibm.com/icu/
You could also use BOCU-1.
With ICU you need not write anything new :-)
(If you need only parts of ICU, see http://oss.software.ibm.com/icu/userguide/packaging.html)
Elliotte Rusty Harold wrote:
> Last night it occurred to me it might be possible to design an internal
> storage format for this class which had better memory usage
> characteristics. In particular I'd like ASCII data to occupy only a
> single byte, and all other BMP characters from 128 to 65535 to occupy
> only two bytes. Non-BMP characters could be stored in surrogate pairs.