Re: UTF-24

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Apr 03 2003 - 15:01:50 EST

Next message: David Starner: "Re: UTF-24"

Previous message: Pim Blokland: "Re: Exciting new software release!"
In reply to: Pim Blokland: "UTF-24"
Next in thread: David Starner: "Re: UTF-24"

Pim Blokland wrote:
> Why is there no UTF-24?

Well, I once proposed UTF-20...

> See, these MathText characters take up a lot of space. No matter how
> you encode them; UTF-8, UTF-16 or UTF-32; they always are 4 bytes
> long.

True for them alone, in those UTFs. Short of defining another Unicode encoding, there are two
answers that I can offer you:

1. Such characters are expected to be a minority of the text, I suppose even in Math text, because
such documents contain lots of other characters (punctuation, spaces, digits, regular text) that
are mostly on the BMP and thus shorter. So a whole Math document with some MathText supplementary
characters will use, on average, fewer than 3 bytes per code point in UTF-8/16.

2. If you want compression, use the existing SCSU (UTR #6) or BOCU-1 (UTN #6), or a general-purpose
compressor like bzip2.
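
To make both answers concrete, here is a minimal sketch in Python. The sample sentence, the
repetition count and the helper name bytes_per_code_point are made up for illustration and are not
from the original discussion. As far as I know Python's standard library has no SCSU or BOCU-1
codec, so only bzip2 (via the bz2 module) is shown.

```python
import bz2


def bytes_per_code_point(text: str, encoding: str) -> float:
    """Average encoded size per code point (hypothetical helper)."""
    return len(text.encode(encoding)) / len(text)


# U+1D465 MATHEMATICAL ITALIC SMALL X and U+1D466 MATHEMATICAL ITALIC SMALL Y
# are supplementary-plane characters of the kind discussed above.
math_x = "\U0001D465"
math_y = "\U0001D466"

# Assumed sample: a short sentence mixing ASCII with a few math letters.
sample = f"Let {math_x} + {math_y} = 2, so {math_x} = 2 - {math_y}."

# The "-le" codec variants are used so that no byte order mark is counted.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    single = len(math_x.encode(enc))           # size of one math letter
    average = bytes_per_code_point(sample, enc)  # size averaged over the text
    print(f"{enc:10} one math letter: {single} bytes, "
          f"sample average: {average:.2f} bytes/code point")

# General-purpose compression: repeat the sample to get a document long
# enough for bzip2's fixed overhead not to dominate. Repeated text is far
# more compressible than real prose, so treat the ratio as an upper bound.
document = (sample + "\n") * 200
raw = document.encode("utf-8")
packed = bz2.compress(raw)
print(f"UTF-8: {len(raw)} bytes, bzip2: {len(packed)} bytes")
```

Each math letter alone is 4 bytes in all three encodings, but the average over the mixed sample
stays below 3 bytes per code point in UTF-8 and UTF-16; only UTF-32 stays at 4.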

Note that all of this concerns only text interchange - the majority of Unicode-aware software uses
UTF-16 internally.

Best regards,
markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.

This archive was generated by hypermail 2.1.5 : Thu Apr 03 2003 - 15:35:05 EST