Re: minimizing size (was Re: allocation of Georgian letters)

From: Douglas Davidson
Date: Thu Feb 07 2008 - 11:31:41 CST

    On Feb 7, 2008, at 3:22 AM, Michael S. Kaplan wrote:

    > Having flown halfway around the world to talk to people who for
    > whatever reasons, both valid and invalid (and not really
    > distinguishing which is which on their list of concerns), are
    > unhappy with a language encoding that in their view doubles or worse
    > the amount of bytes used to store their language in Unicode, I can
    > tell you that this is a very real concern on some people's minds.
    > True or false, it is on their minds. They can all add and multiply,
    > and it certainly looks like a 2x or 3x situation to them.
    > And we get a lot further by acknowledging their concerns and then
    > showing them that they have less to be concerned about than they
    > think, in the end, than we ever would by telling them they are
    > wrong, wrong, wrong.

    One mitigating factor is that many document formats have at least an
    option to employ some form of compression. For example, both OOXML
    and ODF are zip-archived XML, which means that most text will usually
    end up being compressed. If one is concerned about sending HTML over
    the wire, then one can use HTTP compression. Obviously these are
    general-purpose compression algorithms, not text-specific ones, but
    they still should be able to help. Actually, in most XML and HTML
    documents, a large proportion of the characters are ASCII markup
    anyway, so the overall expansion is not going to be 2x or 3x in the
    first place. Furthermore, in many cases the size of the text in any
    form is less significant than the size of other data such as images.
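
    A rough way to check these points is sketched below in Python. The
    sample word, the HTML wrapper, and the helper names are assumptions
    chosen for illustration, and zlib's DEFLATE stands in for the
    compression used by zip archives and HTTP. The repeated sample text
    exaggerates the compression benefit compared with real prose, so
    treat the numbers as indicative only.

```python
import zlib


def utf8_expansion(text: str) -> float:
    """Return UTF-8 bytes per code point (1.0 for pure ASCII text)."""
    return len(text.encode("utf-8")) / len(text)


def compressed_size(text: str) -> int:
    """Return the size in bytes of the zlib-compressed UTF-8 encoding."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# Hypothetical sample: the Georgian word for "Georgia" (10 Mkhedruli letters,
# each of which takes 3 bytes in UTF-8).
word = "საქართველო"

# The same word wrapped in a little ASCII markup (illustrative only).
html = '<p class="body">' + word + "</p>"

# A longer, artificially repetitive text to feed the compressor.
text = (word + " ") * 200

if __name__ == "__main__":
    print("bare text, bytes per character:", utf8_expansion(word))
    print("with markup, bytes per character:", round(utf8_expansion(html), 2))
    print("characters:", len(text))
    print("raw UTF-8 bytes:", len(text.encode("utf-8")))
    print("zlib-compressed bytes:", compressed_size(text))
```

    The bare word comes out at 3 bytes per character, while the marked-up
    fragment comes out well below that because the ASCII tags cost one
    byte each. After compression the byte count of the sample drops below
    its character count.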

    Douglas Davidson

