Re: That UTF-8 Rant

From: Gary Roberts (gar@sparc.sandiegoca.ncr.com)
Date: Thu Jul 22 1999 - 20:40:19 EDT


On Thu, 22 Jul 1999, Markus Kuhn wrote:

> Actually, I happen to be extremely interested in exactly these
> questions, because I happen to be someone who makes implementation
> decisions about databases that could one day grow into the
> hundreds-of-gigabyte range. I have not yet seen multi-terabyte plain
> text databases though (perhaps the email/fax eavesdroppers at the NSA
> have these, if anyone ;-), these tend more to be filled with images and
> not text.

We have many customers with multi-terabyte databases. Our Japanese
customers in particular report that a high percentage of their data is
character data (the rest is almost entirely numeric). Our Unicode (UTF-16)
implementation is criticized as being inefficient in storage relative to
Shift-JIS (which we also support). I suspect a UTF-8 implementation would
be unpopular.
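
To put rough numbers on the storage comparison, here is a minimal sketch
in Python that encodes a few sample strings in Shift-JIS, UTF-16 and
UTF-8 and counts the bytes. The sample strings and the helper name
encoded_sizes are made up for illustration and are not taken from any
customer data; the codec names are the ones shipped with Python, whose
Shift-JIS tables may differ in detail from a given database product.

```python
# Hypothetical sample strings, chosen only to illustrate the size difference.
SAMPLES = {
    "kanji_and_kana": "日本語のテキスト",  # full-width kanji, hiragana, katakana
    "halfwidth_kana": "ｶﾀｶﾅ",              # half-width katakana
    "ascii": "ABC-123",                    # plain ASCII letters and digits
}

# Codec names as provided by the Python standard library.
# "utf-16-le" is used so that no byte order mark is counted.
ENCODINGS = ("shift_jis", "utf-16-le", "utf-8")


def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


for name, text in SAMPLES.items():
    print(name, len(text), "chars:", encoded_sizes(text))
```

For full-width Japanese text, Shift-JIS and UTF-16 both need two bytes
per character, while UTF-8 needs three. UTF-16 loses to Shift-JIS on
ASCII and half-width katakana, which Shift-JIS stores in one byte each;
UTF-8 matches Shift-JIS on ASCII but needs three bytes for half-width
katakana as well.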


