From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 04 2005 - 13:38:24 CST
Jeroen Ruigrok van der Werven asked:
> Given these points, wouldn't an ever-expanding standard like Unicode be a
> cause of data bloat at one point? Since you will need continuously larger
> encoding space to encode certain specific characters?
The actual "data bloat" rate of the standard right now is slightly
more than 1,000 characters encoded per year. On a base of more than
96,000 characters already encoded, that is a growth rate of just over
1% per annum.
I have done the calculations a number of times on this list to
demonstrate that that rate leaves the current standard good for
700+ years of additions without any architectural change.
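As a rough check on the 700+ year figure, the arithmetic can be sketched in Python. This is a minimal linear estimate under stated assumptions: it takes the Unicode code space (U+0000..U+10FFFF), subtracts the surrogates, the Private Use Areas and the 66 noncharacters, treats 96,000 as the number already encoded, and tries a few illustrative addition rates near "slightly more than 1,000" per year. The function names and the specific rates are illustrative choices, not anything from the post.

```python
# Sketch: re-derive the "700+ years" estimate from Unicode code-space arithmetic.
# Assumption: a linear model (fixed additions per year), with illustrative rates.

TOTAL_CODE_POINTS = 0x110000            # U+0000..U+10FFFF = 1,114,112 code points
SURROGATES = 0xDFFF - 0xD800 + 1        # 2,048 surrogate code points, never characters
PRIVATE_USE = (
    (0xF8FF - 0xE000 + 1)               # BMP Private Use Area: 6,400
    + (0xFFFFD - 0xF0000 + 1)           # Plane 15 private use: 65,534
    + (0x10FFFD - 0x100000 + 1)         # Plane 16 private use: 65,534
)
NONCHARACTERS = 32 + 2 * 17             # U+FDD0..U+FDEF plus last two code points of each plane


def assignable_code_points() -> int:
    """Code points still available for standard character assignment."""
    return TOTAL_CODE_POINTS - SURROGATES - PRIVATE_USE - NONCHARACTERS


def years_remaining(encoded: int, per_year: int) -> float:
    """Linear estimate: unassigned assignable code points / additions per year."""
    return (assignable_code_points() - encoded) / per_year


if __name__ == "__main__":
    encoded = 96_000
    print(f"growth rate: {1_000 / encoded:.2%}")
    for rate in (1_000, 1_100, 1_200):
        print(f"{rate} chars/year -> {years_remaining(encoded, rate):.0f} years")
```

With these assumptions there are 974,530 assignable code points, leaving roughly 879 years at 1,000 characters per year and roughly 732 years at 1,200 per year. A linear model is, if anything, pessimistic here, since the expectation is that the addition rate declines rather than compounds.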
And except for a couple of foreseeable "hiccups" for CJK characters
being sorted out now, the rate of additions will *decline*, rather
than rise, in the future, because the pool of remaining good candidates
for encoding is shrinking, and the types of outstanding candidates
(historic scripts, oddball symbol collections that edge off into
icons, logos, and pictures) are increasingly difficult to write
good proposals for and to reach clear consensus on.
>
> Not to mention that the supporting fonts will get bigger and bigger.
Actually not, for the most part. Most font support is segmented into
useful subsets (by script and other criteria). Some new characters
gradually get added to supporting fonts, but many anticipated
additions, such as Sumero-Akkadian cuneiform (due soon) or
Egyptian hieroglyphics (not even in ballot yet) will mostly be
supported by specialist fonts dedicated just to them.
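To illustrate what segmenting font support by script means in practice, here is a minimal font-selection sketch. The font file names are hypothetical. The code-point ranges are the Unicode block ranges for Latin (Basic Latin through Latin Extended-B), Greek and Coptic, and the Cuneiform block, which was later assigned at U+12000..U+123FF. Each character is routed to the font covering its range, with a generic fallback otherwise.

```python
# Sketch: route each character to a script-specific font subset.
# Font file names are hypothetical; ranges are Unicode block boundaries.

FONT_SUBSETS = [
    (0x0000, 0x024F, "GeneralLatin.ttf"),   # Basic Latin .. Latin Extended-B
    (0x0370, 0x03FF, "GeneralGreek.ttf"),   # Greek and Coptic
    (0x12000, 0x123FF, "Cuneiform.ttf"),    # Cuneiform block (specialist font)
]
FALLBACK_FONT = "Fallback.ttf"              # hypothetical catch-all font


def font_for(ch: str) -> str:
    """Return the font whose range covers this character, else the fallback."""
    cp = ord(ch)
    for start, end, font in FONT_SUBSETS:
        if start <= cp <= end:
            return font
    return FALLBACK_FONT


def fonts_needed(text: str) -> set[str]:
    """Collect only the font subsets a given text actually touches."""
    return {font_for(ch) for ch in text}
```

For example, `fonts_needed("Hello")` returns only `{"GeneralLatin.ttf"}`: a Latin-only document never loads the cuneiform font, so encoding cuneiform does not make the fonts ordinary text depends on any larger.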
> Although I have no idea how much of a problem it is given storage prices
> nowadays.
>
> Anyone clued about this?
Yep. And it isn't a serious issue, compared to the many other
issues we deal with for the standard.
--Ken
This archive was generated by hypermail 2.1.5 : Fri Mar 04 2005 - 13:39:11 CST