RE: Compression through normalization

From: D. Starner (shalesller@writeme.com)
Date: Wed Nov 26 2003 - 10:05:03 EST

Next message: John Cowan: "Re: Definitions"

Previous message: Michael Everson: "RE: numeric properties of Nl characters in the UCD"
Maybe in reply to: Philippe Verdy: "RE: Compression through normalization"
Next in thread: Peter Kirk: "Re: Compression through normalization"
Reply: Peter Kirk: "Re: Compression through normalization"
Reply: jon@hackcraft.net: "RE: Compression through normalization"

> I see no reason why you accept some limitations for this
> encapsulation, but not ALL the limitations.

Because I can convert the data from binary to Unicode text in UTF-16
in a few lines of code if I don't worry about normalization. Suddenly
the rules become much more complex if I have to worry about normalization.
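To make the "few lines of code" concrete, here is a minimal sketch, assuming the simplest possible mapping (each byte becomes the code point with the same value). The function names are hypothetical and this is not necessarily the scheme discussed in the thread; it only illustrates why the conversion is trivial until normalization enters the picture.

```python
import unicodedata

def bytes_to_text(data: bytes) -> str:
    # Hypothetical mapping: byte value N becomes code point U+00NN.
    return "".join(chr(b) for b in data)

def text_to_bytes(text: str) -> bytes:
    # Inverse mapping; raises ValueError for any code point above U+00FF.
    return bytes(ord(ch) for ch in text)

payload = bytes([0x41, 0xC0, 0xE9])
text = bytes_to_text(payload)

# Changing the encoding form (here UTF-16) leaves the code points untouched,
# so the payload survives the round trip.
utf16 = text.encode("utf-16-le")
roundtrip = text_to_bytes(utf16.decode("utf-16-le"))

# Normalizing to NFD splits U+00C0 and U+00E9 into a base letter plus a
# combining mark, so the simple inverse mapping no longer applies.
nfd = unicodedata.normalize("NFD", text)
```

After NFD the string contains U+0300 and U+0301, which `text_to_bytes` cannot map back to a byte, so a normalization-safe design needs more rules than this sketch has.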

The simple fact is that I can convert between UTF-8, UTF-16 and UTF-32 with several
utilities on my system, but none of them handles normalization. I don't know of any
basic text tools that do, so if I edit a source file
and email it to someone (which compresses and decompresses it automatically), and the
round trip changes the normalization form, they're going to have trouble running diff
on the code.
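The diff problem can be illustrated with a short sketch. It assumes a hypothetical source line containing "é" that was written in precomposed form and came back decomposed; the variable names are made up for the example.

```python
import difflib
import unicodedata

# The same line of source code in two canonically equivalent forms.
sent = "name = 'caf\u00e9'\n"       # precomposed U+00E9
received = "name = 'cafe\u0301'\n"  # 'e' followed by combining U+0301

# A code-point-level comparison, which is what diff-like tools perform,
# treats the two lines as different.
diff = list(difflib.unified_diff([sent], [received], lineterm=""))

# They only compare equal once both sides are brought to one normalization form.
same_after_nfc = (
    unicodedata.normalize("NFC", sent) == unicodedata.normalize("NFC", received)
)
```

Both lines usually render identically, which is what makes the reported difference hard to track down without a normalization-aware tool.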

> If you don't want that such "denormalisation" occurs during the compression,
> don't claim that your 9-bit encapsulator produces Unicode text (so don't
> label it with a UTF-* encoding scheme or even a BOCU-* or SCSU character
> encoding scheme, but use your own charset label)!

The whole point of such a tool would be to send binary data over a transport that
only allows Unicode text. In practice, you'd also have to remap the C0 and C1
control characters; but even then, mapping 0x00-0x1F to U+0250-U+026F and 0x80-0x9F
to U+0270-U+028F wouldn't be too complex. Unless you've added a Unicode library to what could
otherwise be coded in 4k, normalization would add a lot of complexity.
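The C0/C1 remapping described above can be sketched as follows. The two target ranges come from the message; the pass-through treatment of every other byte and the function names are assumptions made for the example, and the sketch deliberately does nothing about normalization.

```python
C0_TARGET = 0x0250  # 0x00-0x1F map to U+0250-U+026F
C1_TARGET = 0x0270  # 0x80-0x9F map to U+0270-U+028F

def encode(data: bytes) -> str:
    out = []
    for b in data:
        if b <= 0x1F:
            out.append(chr(C0_TARGET + b))
        elif 0x80 <= b <= 0x9F:
            out.append(chr(C1_TARGET + (b - 0x80)))
        else:
            # Assumption: all other bytes map to the code point of equal value.
            out.append(chr(b))
    return "".join(out)

def decode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if C0_TARGET <= cp < C0_TARGET + 0x20:
            out.append(cp - C0_TARGET)
        elif C1_TARGET <= cp < C1_TARGET + 0x20:
            out.append(cp - C1_TARGET + 0x80)
        else:
            out.append(cp)  # bytearray rejects values above 0xFF
    return bytes(out)

all_bytes = bytes(range(256))
encoded = encode(all_bytes)
```

The whole thing is two range checks in each direction, which is the sense in which it stays small. Bytes such as 0xC0 still map to characters that have canonical decompositions, so handling normalization on top of this would be a separate and larger piece of work.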



This archive was generated by hypermail 2.1.5 : Wed Nov 26 2003 - 10:57:30 EST