Re: Do 16 bit surrogate high bits indicating characters have a persisting mea...

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Thu Feb 22 2001 - 11:45:36 EST


From: <DougEwell2@cs.com>

> Yes. As Marco Cimarosti has indicated, each supplementary character is
> represented in UTF-16 by a surrogate *pair*. Both surrogates need to be
> specified each time. Consequently, a stream of Deseret text (for example)
> will contain a lot of U+D801's.
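
A minimal Python sketch of the surrogate arithmetic behind that point. It
assumes the Deseret block spans U+10400..U+1044F, and the helper name
to_surrogates is only for illustration:

```python
def to_surrogates(code_point):
    """Split a supplementary code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

# Assumed Deseret range: U+10400..U+1044F
pairs = [to_surrogates(cp) for cp in range(0x10400, 0x10450)]

# Collect the distinct high surrogates used by the whole block
high_surrogates = {high for high, _ in pairs}
print([hex(h) for h in sorted(high_surrogates)])
```

Every code point in that range falls in the same 1024-code-point slice, so
the set should hold a single high surrogate and only the low surrogate
varies from character to character.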

It is also worth noting that any semi-intelligent compression algorithm
(even plain old ZIP on the text file!) should do very well on text that is
primarily Deseret, or any other supplementary script, since the same high
surrogate is repeated for nearly every character. The same thing happens
with BMP characters from the same range, where the high byte of each UTF-16
code unit is repeated.
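
A rough way to check this is with Python's zlib module, which uses the same
DEFLATE method as ZIP. This is only a sketch: the sample is random code
points from an assumed Deseret range (U+10400..U+1044F), not real Deseret
text, so real text would probably compress better still:

```python
import random
import zlib

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical sample: 2000 random code points from the assumed Deseret range
text = "".join(chr(random.randrange(0x10400, 0x10450)) for _ in range(2000))

# UTF-16 (big-endian, no BOM): each character becomes a 4-byte surrogate pair
utf16 = text.encode("utf-16-be")

# Compress at the highest DEFLATE level
compressed = zlib.compress(utf16, 9)

print("raw UTF-16 bytes:", len(utf16))
print("compressed bytes:", len(compressed))
```

Three of every four bytes are identical from one character to the next
(0xD8, 0x01, 0xDC), which is exactly the sort of repetition the compressor
can exploit.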

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


