Re: [long] Use of Unicode in AbiWord

From: Alain LaBonté
Date: Fri Mar 19 1999 - 15:21:15 EST

At 11:54 99-03-19 -0800, A. Vine wrote:
> wrote:
>> > We also think that we should switch our representation to UTF-8. On
>> > every platform we currently plan to support, this would eliminate the
>> > encoding conversion step (as well as a lot of memory usage) for any
>> > run of text which includes only ASCII characters. For obvious
>> > reasons, and with no offense intended to the majority of the world who
>> > primarily use double-byte encoded characters, we believe this to be a
>> > common case worth optimizing.
>> this optimizes for one language while making it a lot less efficient to use
>> just about every other language including all european ones, as well as the
>> line and paragraph separator, the euro, the formatting characters,
>> dingbats, etc.
>Not true. Not only does ASCII represent significantly more languages than
>English (Hawaiian, Indonesian, Latin, some Native American languages, etc.),
>but many Latin script languages are mostly ASCII (e.g. German, French,
>Spanish, Italian, Portuguese, Dutch, Danish, Swedish, Norwegian, etc.).

Try filtering the non-ASCII characters out of French text and the messages
become unreadable at best, even though about 97% of the characters are
indeed ASCII (statistics from some corpora I have)... but the remaining 3%,
let us not forget to mention, is highly relevant and essential.
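
A minimal sketch of this point in Python, for illustration only. The sample
sentence is made up (it is not from the corpora mentioned here), and the
function names ascii_share and strip_non_ascii are hypothetical. It measures
what share of a text is ASCII, shows what a naive 7-bit filter leaves behind,
and compares the encoded size in UTF-8 and UTF-16.

```python
def ascii_share(text: str) -> float:
    """Return the fraction of characters that are ASCII (code point < 128)."""
    if not text:
        return 1.0
    return sum(1 for ch in text if ord(ch) < 128) / len(text)


def strip_non_ascii(text: str) -> str:
    """Drop every non-ASCII character, as a naive 7-bit filter would."""
    return "".join(ch for ch in text if ord(ch) < 128)


# Made-up sample sentence (an assumption for illustration, not real corpus data).
sample = "Le député a été élu à Noël, près de la forêt."

print(f"ASCII share: {ascii_share(sample):.0%}")
print("Filtered:   ", strip_non_ascii(sample))
print("UTF-8 bytes: ", len(sample.encode("utf-8")))
print("UTF-16 bytes:", len(sample.encode("utf-16-le")))
```

A short sentence packed with accents like this one will show a lower ASCII
share than running French text, but the filtered output makes the same point:
the few characters removed are the ones that carry the meaning.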

Let's keep a sense of humour.

(; (:

Alain LaBonté

