Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Dec 03 2003 - 01:10:45 EST


    Frank Yung-Fong Tang <ytang0648 at aol dot com> wrote:

    >> UTF-16 6,634,430 bytes
    >> UTF-8 7,637,601 bytes
    >> SCSU 6,414,319 bytes
    >> BOCU-1 5,897,258 bytes
    >> Legacy encoding (*) 5,477,432 bytes
    >> (*) KS C 5601, KS X 1001, or EUC-KR
    >
    > What are the gzip sizes of these? Just wondering:
    > gzip of UTF-16
    > gzip of UTF-8
    > gzip of SCSU
    > gzip of BOCU-1
    > gzip of Legacy encoding

    I don't have gzip, but I can give you the PKZip sizes, which should be
    quite similar:

    UTF-16 2,685,232 bytes
    UTF-8 2,774,356 bytes
    SCSU 2,756,470 bytes
    BOCU-1 2,772,418 bytes
    EUC-KR 2,518,201 bytes

    Note that the largest of these (UTF-8) is only 10.2% larger than the
    smallest (EUC-KR).
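
    For anyone who wants to check the arithmetic, here is a minimal sketch
    that recomputes the spread from the figures quoted in this thread. The
    helper name `spread` is made up for illustration, and the "Legacy
    encoding" row of the uncompressed list is labelled EUC-KR here on the
    assumption that it is the same file as the EUC-KR row of the PKZip list.

```python
# Sizes in bytes, copied from the figures quoted in this thread.
# Assumption: the "Legacy encoding" row of the uncompressed list is the
# same file as the "EUC-KR" row of the PKZip list.
raw = {
    "UTF-16": 6_634_430,
    "UTF-8": 7_637_601,
    "SCSU": 6_414_319,
    "BOCU-1": 5_897_258,
    "EUC-KR": 5_477_432,
}
zipped = {
    "UTF-16": 2_685_232,
    "UTF-8": 2_774_356,
    "SCSU": 2_756_470,
    "BOCU-1": 2_772_418,
    "EUC-KR": 2_518_201,
}


def spread(sizes):
    """Return (largest name, smallest name, percent by which the largest
    exceeds the smallest). Hypothetical helper, not from any library."""
    big = max(sizes, key=sizes.get)
    small = min(sizes, key=sizes.get)
    pct = 100.0 * (sizes[big] - sizes[small]) / sizes[small]
    return big, small, pct


for label, sizes in (("raw", raw), ("PKZip", zipped)):
    big, small, pct = spread(sizes)
    print(f"{label}: {big} is {pct:.1f}% larger than {small}")
# raw: UTF-8 is 39.4% larger than EUC-KR
# PKZip: UTF-8 is 10.2% larger than EUC-KR
```

    So the gap between the largest and smallest encodings shrinks from
    about 39.4% before compression to about 10.2% after it.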

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


