RE: Compression through normalization

From: Arcane Jill
Date: Tue Nov 25 2003 - 05:32:26 EST

    I'm pretty sure it depends on whether you regard a text document as a
    sequence of characters or as a sequence of glyphs. (Er - I mean
    "default grapheme clusters", of course.) Regarded as a sequence of
    characters, normalisation changes that sequence. But regarded as a
    sequence of glyphs, normalisation leaves the sequence unchanged,
    because canonical normalisation only ever substitutes one canonically
    equivalent sequence for another, and canonically equivalent sequences
    are supposed to display identically. So a compression algorithm could
    legitimately claim to be "lossless" if it did normalisation but
    operated at the glyph level.
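    Just to illustrate what I mean, here's a little Python sketch (the
    strings are just an example): NFC and NFD give you different
    codepoint sequences, but the results are canonically equivalent, so a
    renderer shows the same glyph for both.

        import unicodedata

        decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT (two codepoints)
        precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one codepoint)

        # As character sequences, the two differ...
        assert decomposed != precomposed

        # ...but normalisation maps each onto the other's form, because
        # the two sequences are canonically equivalent.
        assert unicodedata.normalize("NFC", decomposed) == precomposed
        assert unicodedata.normalize("NFD", precomposed) == decomposed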

    I'm pretty sure you DON'T need to preserve the byte-stream bit for bit.
    For example, at the byte level, I see no reason to preserve invalid
    encoding sequences, and at the codepoint level I see no reason to
    preserve non-character codepoints. So - at the glyph level - we only
    need to preserve glyphs, no? It all depends on how the compression
    algorithm describes itself.
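    To make that concrete, here's a rough Python sketch of what the
    front end of such a self-describing compressor might do before the
    actual encoding stage. The names (canonicalize, is_noncharacter) are
    made up for illustration, and I'm assuming UTF-8 input:

        import unicodedata

        def is_noncharacter(cp):
            # Noncharacters: U+FDD0..U+FDEF, plus the last two codepoints
            # of every plane (U+FFFE/U+FFFF, U+1FFFE/U+1FFFF, ...).
            return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

        def canonicalize(raw):
            # Hypothetical front end for a normalising compressor.
            # Invalid byte sequences are not preserved: the decoder
            # replaces each one with U+FFFD.
            text = raw.decode("utf-8", errors="replace")
            # Non-character codepoints are not preserved either.
            text = "".join(c for c in text if not is_noncharacter(ord(c)))
            # Canonical normalisation before handing off to the
            # compression stage proper.
            return unicodedata.normalize("NFC", text)

    Two canonically equivalent inputs then compress to the same output,
    which is exactly the sense in which the scheme is lossless at the
    glyph level but not at the byte level.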

    I think this might go wrong for "tailored grapheme clusters" - I
    don't know much about them, but since they're locale- and
    application-specific tailorings of the default segmentation, two
    processes might not even agree on what the glyph sequence *is*.

