Re: Compression through normalization

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Nov 26 2003 - 11:08:29 EST


    On 26/11/2003 07:05, D. Starner wrote:

    > ...
    >
    >The whole point of such a tool would be to send binary data on a transport that
    >only allowed Unicode text. In practice, you'd also have to remap C0 and C1
    >characters; but even then 0x00-0x1F -> U+0250-U+026F and 0x80-0x9F -> U+0270-U+028F
    >wouldn't be too complex. Unless you've added a Unicode library to what could
    >otherwise be coded in 4k, normalization would add a lot of complexity.
    >
    >
    >
    You could encode your 256 byte values as Unicode PUA code points, cf. how
    Microsoft encodes symbol fonts. You wouldn't have to worry about
    normalisation or canonical equivalence, as PUA characters have no
    canonical equivalents.
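
    A minimal sketch of that PUA mapping, in Python. The base U+F000 and the
    function names are assumptions made for illustration; the suggestion above
    does not name a specific PUA range. The last helper checks that all four
    normalisation forms leave the encoded text unchanged.

```python
import unicodedata

# Assumption: any 256-code-point run inside the BMP Private Use Area
# (U+E000-U+F8FF) would do; U+F000 is chosen here only for illustration.
PUA_BASE = 0xF000


def bytes_to_pua(data: bytes) -> str:
    """Map each byte value 0x00-0xFF to the code point PUA_BASE + value."""
    return "".join(chr(PUA_BASE + b) for b in data)


def pua_to_bytes(text: str) -> bytes:
    """Reverse the mapping; reject anything outside the 256-code-point run."""
    out = bytearray()
    for ch in text:
        offset = ord(ch) - PUA_BASE
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the mapped PUA range")
        out.append(offset)
    return bytes(out)


def survives_normalization(text: str) -> bool:
    """Return True if NFC, NFD, NFKC and NFKD all leave the text unchanged."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)
```

    Round-tripping bytes(range(256)) through bytes_to_pua and pua_to_bytes
    should give back the original bytes, and survives_normalization should
    hold for the encoded string, since these code points have no
    decomposition mappings.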

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

