RE: minimizing size (was Re: allocation of Georgian letters)

From: Kent Karlsson (
Date: Thu Feb 07 2008 - 06:04:57 CST

    Michael S. Kaplan wrote:
    > Having flown halfway around the world to talk to people who for
    > whatever reasons, both valid and invalid (and not really
    > distinguishing which is which on their list of concerns), are
    > unhappy with a language encoding that in their view doubles or
    > worse the amount of bytes used to store their language in Unicode,
    > I can tell you that this is a very real concern on some people's
    > minds.

    I guess that referred in particular to Tamil.

    Just out of curiosity: has anyone actually measured the storage
    requirements of typical texts encoded according to Unicode
    (UTF-8/UTF-16) versus according to their proposal? Both for
    pure "plain text" (Tamil only) and for, say, text moderately
    embedded in HTML markup.

    If so, what were the results?
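
    For the Unicode side of such a measurement, a minimal sketch in
    Python could look like the following. The helper name
    encoded_sizes and the five-letter sample word are only
    illustrative assumptions, not real test data, and the proposed
    encoding is not modelled here since its details are not given
    in this thread.

```python
def encoded_sizes(text):
    """Return the code point count and the UTF-8 / UTF-16 byte counts of text."""
    return {
        "code_points": len(text),
        "utf8": len(text.encode("utf-8")),
        # "utf-16-le" avoids counting the 2-byte byte order mark.
        "utf16": len(text.encode("utf-16-le")),
    }


# Hypothetical sample: the five code points U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD
# (the word "Tamil" in Tamil script), written with escapes to be unambiguous.
tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"

# The same word wrapped in a minimal piece of ASCII HTML markup.
html = "<p>" + tamil + "</p>"

if __name__ == "__main__":
    print("plain:", encoded_sizes(tamil))
    print("html: ", encoded_sizes(html))
```

    Each Tamil code point takes 3 bytes in UTF-8 and 2 bytes in
    UTF-16, while each ASCII markup character takes 1 byte in UTF-8
    and 2 bytes in UTF-16, so the comparison depends on how much
    markup surrounds the text. A real measurement would need to run
    over representative documents rather than a single word.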

            /kent k

