RE: [unicode] Re: UTF-c

From: Doug Ewell (doug@ewellic.org)
Date: Tue Feb 22 2011 - 14:06:11 CST


    <mpsuzuki at hiroshima dash u dot ac dot jp> wrote:

    > The resynchronization on newline (or on ASCII punctuation)
    > is needed, but I think today it is becoming insufficient
    > gradually.

    Again, it depends on the intended purpose of this (or any other)
    encoding scheme. Resynchronization adds redundancy, which costs bytes.
    If the goal is to minimize bytes, the encoding scheme has to strip away
    as much redundancy as possible.

    Most people now suggest general-purpose compression as the "best" way to
    compress Unicode text. Drop one byte out of a deflated or bzipped file,
    and the resulting damage to the text will be arbitrary.
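
A small Python sketch can illustrate that point. It is an illustration, not anything from the original thread: it assumes DEFLATE via the standard `zlib` module, and the helper names `drop_byte` and `recover_after_loss` are hypothetical. It compresses some text, deletes one byte from the middle of the compressed stream, and tries to decompress the damaged stream. The decoder usually rejects the whole stream; even when it does not, nothing guarantees that the damage stays local.

```python
import zlib

def drop_byte(data: bytes, position: int) -> bytes:
    """Return a copy of data with the byte at `position` removed."""
    return data[:position] + data[position + 1:]

def recover_after_loss(text: str, position: int):
    """DEFLATE-compress text, lose one compressed byte, then try to decompress.

    Returns the recovered text, or None if zlib rejects the damaged stream.
    """
    compressed = zlib.compress(text.encode("utf-8"))
    damaged = drop_byte(compressed, position)
    try:
        raw = zlib.decompress(damaged)
    except zlib.error:
        # Invalid bit stream, truncated stream, or Adler-32 mismatch:
        # in every case nothing after the damage point is recovered.
        return None
    return raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    sample = "Resynchronization adds redundancy, which costs bytes.\n" * 20
    middle = len(zlib.compress(sample.encode("utf-8"))) // 2
    print(recover_after_loss(sample, middle))
```

The compressed stream has no redundancy left that a decoder could use to find its place again, so it cannot simply skip past the missing byte.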

    Note that UTF-8, which has plenty of redundancy, was never represented
    to be the smallest possible way to encode characters; it was only
    represented not to be extravagant.
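
For contrast, here is an equally hedged sketch of the redundancy that UTF-8 does carry. Every continuation byte has the bit pattern 10xxxxxx, and no lead byte does, so a decoder can always locate the next character boundary. The helpers `utf8_char_starts` and `decode_after_loss` are hypothetical names used only for this illustration. Dropping either byte of the two-byte "é" in "café au lait" costs exactly one character, which Python's decoder replaces with U+FFFD. The rest of the text decodes normally.

```python
def utf8_char_starts(data: bytes) -> list[int]:
    """Offsets of bytes that begin a character, i.e. any byte that is
    not a continuation byte of the form 10xxxxxx."""
    return [i for i, b in enumerate(data) if b & 0xC0 != 0x80]

def decode_after_loss(text: str, position: int) -> str:
    """Encode text as UTF-8, drop the byte at `position`, and decode,
    replacing any malformed sequence with U+FFFD."""
    encoded = text.encode("utf-8")
    damaged = encoded[:position] + encoded[position + 1:]
    return damaged.decode("utf-8", errors="replace")

if __name__ == "__main__":
    text = "café au lait"
    print(utf8_char_starts(text.encode("utf-8")))  # boundaries of c, a, f, é, ...
    print(decode_after_loss(text, 3))  # lead byte of "é" lost
    print(decode_after_loss(text, 4))  # continuation byte of "é" lost
```

This self-synchronization is paid for in bits: a character from U+0080 to U+07FF needs two bytes in UTF-8, although it carries only 11 bits of information.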

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s


    This archive was generated by hypermail 2.1.5 : Tue Feb 22 2011 - 14:09:39 CST