Re: Do 16 bit surrogate high bits indicating characters have a persisting mea...

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Thu Feb 22 2001 - 11:45:36 EST

Maybe in reply to: DougEwell2@cs.com: "Re: Do 16 bit surrogate high bits indicating characters have a persisting mea..."

> Yes. As Marco Cimarosti has indicated, each supplementary character is
> represented in UTF-16 by a surrogate *pair*. Both surrogates need to be
> specified each time. Consequently, a stream of Deseret text (for example)
> will contain a lot of U+D801's.
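A minimal Python sketch (not part of the original thread) of the standard UTF-16 surrogate arithmetic shows why: every Deseret code point (assumed here to be the block U+10400..U+1044F) differs from U+10000 only in its low 10 bits, so they all share the high surrogate U+D801. The helper name `to_surrogate_pair` is hypothetical.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low

# Deseret block (assumed range): U+10400..U+1044F
deseret = range(0x10400, 0x10450)
highs = {to_surrogate_pair(cp)[0] for cp in deseret}
print([f"U+{h:04X}" for h in sorted(highs)])  # ['U+D801']

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
print(chr(0x10400).encode("utf-16-be").hex())  # d801dc00
```

Only the low surrogate varies from one Deseret letter to the next; both halves still have to be written out for every character.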

It is also worth noting that any semi-intelligent compression algorithm
(even plain old ZIP on the text file!) should do very well on text that is
primarily Deseret or any other single supplementary-plane script, since the
same high surrogate repeats so consistently. (The same thing happens with BMP
text drawn from a single block, where one byte of each 16-bit code unit is
repeated.)
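A rough Python illustration (my own sketch, not from the post) of that repetition, using synthetic text: letters drawn at random from the Deseret block and from Cyrillic U+0410..U+044F, which is not real prose. The helper names are hypothetical; zlib stands in for "plain old ZIP" since both use DEFLATE. In UTF-16LE every Deseret letter is the four bytes 01 D8 xx DC, so three of every four bytes never change; Cyrillic letters are xx 04, so every second byte is constant.

```python
import random
import zlib

def zlib_ratio(text: str) -> float:
    """Compressed size divided by raw UTF-16LE size (smaller is better)."""
    raw = text.encode("utf-16-le")
    return len(zlib.compress(raw, 9)) / len(raw)

def constant_byte_fraction(data: bytes, period: int) -> float:
    """Fraction of byte positions (mod period) that hold one value throughout."""
    columns = [set(data[i::period]) for i in range(period)]
    return sum(len(c) == 1 for c in columns) / period

rng = random.Random(2001)  # fixed seed for a reproducible run
n = 10_000
# Synthetic text: letters drawn uniformly from each block (not real prose).
deseret = "".join(chr(rng.randrange(0x10400, 0x10450)) for _ in range(n))
cyrillic = "".join(chr(rng.randrange(0x0410, 0x0450)) for _ in range(n))

# Deseret: 01 D8 xx DC per character, so 3 of 4 byte positions are constant.
print(constant_byte_fraction(deseret.encode("utf-16-le"), 4))   # 0.75
# Cyrillic U+0410..U+044F: xx 04 per character, so 1 of 2 is constant.
print(constant_byte_fraction(cyrillic.encode("utf-16-le"), 2))  # 0.5
print(f"Deseret zlib ratio: {zlib_ratio(deseret):.2f}")
```

The exact ratio depends on the data and the compressor settings; the point is only that the fixed surrogate bytes cost almost nothing once compressed.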

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT