    On 3/28/06, Dean Harding wrote:
    > It's also (unfortunately) quite popular with a lot of email servers. I don't
    > really know why, because UTF-8 + quoted-printable would have been
    > almost as efficient, and you wouldn't need some custom encoder/decoder
    > that's almost-but-not-quite Base64 encoding...

    Let's count. A common Chinese character (from the BMP Unihan block)
    takes the following number of bytes in each encoding:
    UTF-16: 2
    UTF-16+base64: 2.67
    UTF-7: 2.67 (plus a little overhead, less for longer runs of non-ASCII chars)
    UTF-8: 3
    UTF-8+base64: 4
    UTF-8+quoted-printable: 9
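
To see how the "little overhead" of UTF-7 shrinks for longer runs, here is a rough sketch using Python's built-in "utf-7" codec. The helper name utf7_bytes_per_char and the sample character U+4E2D are my own choices for illustration, and other UTF-7 encoders may differ by a byte or so in how they terminate a shifted run.

```python
def utf7_bytes_per_char(ch, run):
    """Average encoded size per character for a run of `run` copies of `ch`."""
    encoded = (ch * run).encode("utf-7")
    return len(encoded) / run


if __name__ == "__main__":
    # U+4E2D is a common BMP Unihan character (an assumed sample).
    for run in (1, 3, 30, 300):
        print(run, round(utf7_bytes_per_char("\u4e2d", run), 2))
```

The per-character cost should drift down toward 2.67 as the run gets longer, since the shift-in/shift-out delimiters are paid only once per run.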

    For Latin (non-ASCII), Greek, Cyrillic, Arabic, and Hebrew characters,
    the numbers are:
    UTF-16: 2
    UTF-16+base64: 2.67
    UTF-7: 2.67 (plus a little overhead...)
    UTF-8: 2
    UTF-8+base64: 2.67
    UTF-8+quoted-printable: 6
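
The figures in both lists can be checked with a few lines of Python. This is only a sketch: the function name bytes_per_char, the run length of 3 (chosen so base64 needs no padding), and the sample characters U+4E2D and U+03BB are assumptions of mine, not anything from the thread.

```python
import base64
import quopri


def bytes_per_char(ch, run=3):
    """Average bytes per character for a short run of `ch` in each encoding."""
    text = ch * run
    utf16 = text.encode("utf-16-be")  # big-endian, so no BOM is counted
    utf8 = text.encode("utf-8")
    sizes = {
        "UTF-16": len(utf16),
        "UTF-16+base64": len(base64.b64encode(utf16)),
        "UTF-8": len(utf8),
        "UTF-8+base64": len(base64.b64encode(utf8)),
        "UTF-8+quoted-printable": len(quopri.encodestring(utf8)),
    }
    return {name: round(size / run, 2) for name, size in sizes.items()}


if __name__ == "__main__":
    for label, ch in (("Chinese", "\u4e2d"), ("Greek", "\u03bb")):
        print(label)
        for name, cost in bytes_per_char(ch).items():
            print(f"  {name}: {cost}")
```

The run is kept short on purpose: quoted-printable inserts soft line breaks on long lines, which would add a little to its per-character cost.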

    In other words, for email, if you don't want to trust that the whole
    network is 8BIT-safe, UTF-7 is reasonably efficient: it costs about the
    same as UTF-16+base64 and much less than UTF-8+quoted-printable.


