Re: minimizing size (was Re: allocation of Georgian letters)

From: Doug Ewell (dewell@roadrunner.com)
Date: Thu Feb 07 2008 - 10:26:01 CST


    Every once in a while I bring up some of the issues raised in Unicode
    Technical Note #14, "A Survey of Unicode Compression," and someone
    replies that there really isn't much interest in text compression any
    more, now that memory is cheap and disk is cheap and everyone in the
    world has a greased-lightning Internet connection at their disposal. It
    looks like there might be some lingering interest after all.

    Those attempting to defend Unicode against the duplicate encoding
    proposed by the Tamils might note that existing Unicode Tamil text can
    be reduced to 1 byte per (Unicode) character using SCSU, which is not
    true for TACE-16 text, spread as it is across multiple 128-character
    half-blocks. I don't have any TACE-16 text at hand, but it wouldn't
    surprise me if Unicode Tamil in SCSU were actually smaller than TACE-16
    in any encoding scheme. And remember, *decoding* SCSU is easy.
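
    To illustrate the 1-byte-per-character claim, here is a minimal
    sketch, not a conformant SCSU implementation. It assumes the text
    contains only ASCII and characters from the Tamil half-block
    U+0B80..U+0BFF, so one dynamic window suffices. The function names
    scsu_encode_tamil and scsu_decode_subset are hypothetical; the tag
    value SD0 = 0x18 and the window offset rule (offset byte times 0x80)
    are taken from the SCSU specification, UTS #6.

```python
SD0 = 0x18            # SCSU tag: define dynamic window 0 and select it
TAMIL_BASE = 0x0B80   # start of the Tamil half-block U+0B80..U+0BFF
# Bytes that stand for themselves in SCSU single-byte mode.
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D} | set(range(0x20, 0x80))


def scsu_encode_tamil(text: str) -> bytes:
    """Encode ASCII + Tamil text using a single SCSU dynamic window (sketch)."""
    # Two bytes of overhead: SD0 plus the offset byte 0x17 (0x17 * 0x80 = 0x0B80).
    out = bytearray([SD0, TAMIL_BASE // 0x80])
    for ch in text:
        cp = ord(ch)
        if cp in PASS_THROUGH:
            out.append(cp)                      # ASCII passes through unchanged
        elif TAMIL_BASE <= cp < TAMIL_BASE + 0x80:
            out.append(0x80 + cp - TAMIL_BASE)  # one byte per Tamil character
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's subset")
    return bytes(out)


def scsu_decode_subset(data: bytes) -> str:
    """Decode the subset of SCSU produced above (SD0 tag only)."""
    base = 0x0080  # default position of dynamic window 0
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SD0:
            x = data[i + 1]
            if not 0x01 <= x <= 0x67:
                raise ValueError("offset byte outside the simple x * 0x80 range")
            base = x * 0x80
            i += 2
            continue
        if b >= 0x80:
            chars.append(chr(base + b - 0x80))  # byte indexes the active window
        elif b in PASS_THROUGH:
            chars.append(chr(b))
        else:
            raise ValueError(f"tag byte 0x{b:02X} is not handled by this sketch")
        i += 1
    return "".join(chars)


if __name__ == "__main__":
    sample = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"   # the word "Tamil" in Tamil script
    packed = scsu_encode_tamil(sample)
    print(len(packed), len(sample.encode("utf-8")), len(sample.encode("utf-16-be")))
    assert scsu_decode_subset(packed) == sample
```

    A real encoder also has to handle the other tags, window switching,
    and Unicode mode; the decoder loop above hints at why the decoding
    side stays simple.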

    For them what cares:
    http://www.unicode.org/notes/tn14/UnicodeCompression.pdf

    --
    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages

