Re: Factor implements 24-bit string type for Unicode support

From: Asmus Freytag ([email protected])
Date: Tue Feb 05 2008 - 12:56:49 CST

Next message: Andreas Stötzner: "Monetary signs"

Previous message: Philippe Verdy: "RE: Factor implements 24-bit string type for Unicode support"
In reply to: Philippe Verdy: "RE: Factor implements 24-bit string type for Unicode support"
Next in thread: Doug Ewell: "Re: Factor implements 24-bit string type for Unicode support"
Reply: Doug Ewell: "Re: Factor implements 24-bit string type for Unicode support"
Reply: Philippe Verdy: "RE: Factor implements 24-bit string type for Unicode support"

Philippe gave some very interesting arguments (complete with specific
figures) but without citing his evidence or stating his assumptions. A
thorough comparison of performance on large data volumes in the
various encoding forms would be interesting.

Assuming for the moment that the general arguments Philippe
presented are not that far off the mark, it would seem that UTF-16 is
not such a bad choice either. Because all but very specialized data
collections can expect to have 99+% of their character codes in the BMP,
the cost of decompressing the data to UTF-32 is dominated by the BMP
case. Even if handling surrogates were to take 100 times as long, that
would only double the average.
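
To make the arithmetic concrete, here is a minimal Python sketch, not taken from any real library. `decode_utf16` (a hypothetical helper) turns UTF-16 code units into code points, with a one-comparison fast path for BMP characters and a longer path for surrogate pairs. `average_cost` (also hypothetical) assumes a simple cost model in which a BMP character costs 1 unit and a supplementary character costs `surrogate_cost` units.

```python
def decode_utf16(units: list[int]) -> list[int]:
    """Decode UTF-16 code units into code points (i.e. to UTF-32)."""
    out: list[int] = []
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if not 0xD800 <= u <= 0xDFFF:
            # Fast path: a BMP character, the code unit is the code point.
            out.append(u)
            i += 1
        elif u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Slow path: high surrogate followed by low surrogate.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            raise ValueError(f"unpaired surrogate at index {i}")
    return out


def average_cost(bmp_fraction: float, surrogate_cost: float) -> float:
    """Expected per-character cost, assuming a BMP character costs 1 unit."""
    return bmp_fraction * 1.0 + (1.0 - bmp_fraction) * surrogate_cost
```

With 99% BMP characters and surrogates 100 times as expensive, `average_cost(0.99, 100)` comes out at about 1.99, i.e. roughly double the all-BMP cost.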

In the meantime, the benefit of more localized memory access is that of a
50% size reduction relative to UTF-32, not the 25% reduction a 3-byte
form would give. Plus, in many cases, you get the benefit of direct
library support without the need to convert the strings, if you want.
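
The size difference can be checked with Python's built-in codecs. In this sketch `footprint` and `reduction_vs_utf32` are illustrative names, and the `three_byte` entry models a hypothetical fixed 24-bit form rather than any particular implementation.

```python
def footprint(s: str) -> dict[str, int]:
    """Bytes needed to store s in UTF-32, UTF-16 and a fixed 3-byte form."""
    return {
        "utf32": len(s.encode("utf-32-le")),  # 4 bytes per code point, no BOM
        "utf16": len(s.encode("utf-16-le")),  # 2 per BMP char, 4 per supplementary
        "three_byte": 3 * len(s),             # hypothetical 24-bit form
    }


def reduction_vs_utf32(s: str) -> dict[str, float]:
    """Fractional size saving of each form relative to UTF-32."""
    if not s:
        raise ValueError("empty string has no meaningful ratio")
    sizes = footprint(s)
    base = sizes["utf32"]
    return {k: 1 - v / base for k, v in sizes.items() if k != "utf32"}
```

For `"Unicode"` (seven BMP characters) this gives 28 bytes in UTF-32, 14 in UTF-16 and 21 in the 3-byte form, i.e. reductions of 50% and 25%. Supplementary characters shift the balance slightly, since UTF-16 needs 4 bytes for each of them while the 3-byte form still uses 3.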

That's the real argument I see against a 3-byte form.
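
For comparison, a 3-byte form can be sketched as a packed array of 24-bit code units. The `String24` class below is hypothetical and is not Factor's actual string implementation; it only illustrates the trade-off: constant-time indexing like UTF-32 at three quarters of its size, but, as argued above, without the direct library support UTF-16 enjoys.

```python
class String24:
    """Hypothetical fixed-width string storing each code point in 3 bytes."""

    def __init__(self, s: str) -> None:
        self._data = bytearray()
        for ch in s:
            # Every code point (max U+10FFFF) fits in 21 bits, so 3 bytes suffice.
            self._data += ord(ch).to_bytes(3, "little")

    def __len__(self) -> int:
        return len(self._data) // 3

    def __getitem__(self, i: int) -> str:
        # Constant-time indexing: character i starts at byte offset 3 * i.
        if not 0 <= i < len(self):
            raise IndexError("String24 index out of range")
        start = 3 * i
        return chr(int.from_bytes(self._data[start:start + 3], "little"))

    def nbytes(self) -> int:
        """Storage used, in bytes."""
        return len(self._data)
```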

But, knowing programmers, they won't rest until every single permutation
of possible encoding forms has been used and foisted on some
unsuspecting user.

A./

This archive was generated by hypermail 2.1.5 : Tue Feb 05 2008 - 12:59:44 CST