Re: Compression and Unicode [was: Name Compression]

From: Torsten Mohrin (mohrin@sharmahd.com)
Date: Sat May 13 2000 - 11:57:38 EDT


Marco.Cimarosti@icl.com wrote:

>I noticed that Torsten's scheme assumes that " " and "-" are mutually
>exclusive separators, but this is not true for a handful of Tibetan
>characters that have sequences like " -" or "- " (see list in l_xx.txt). How
>are these cases handled?

I've used split(/[- ]/, ...) in Perl. For a sequence like " -" or "- ",
this yields an empty-string word between the ' ' and the '-', which is
encoded like any other word. It's not optimal, but I haven't changed it yet.
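
A minimal sketch of that behaviour, written in Python rather than Perl so
it can be run as-is. For these inputs, re.split() with the same character
class should give the same word list as Perl's split(/[- ]/, ...). The
helper name split_name is hypothetical, and the two names are Tibetan
character names that contain " -" and "- " respectively.

```python
import re

def split_name(name):
    """Split a character name on single spaces or hyphens.

    Hypothetical helper mirroring split(/[- ]/, ...): every separator
    character ends a word, so two adjacent separators produce an
    empty-string word between them.
    """
    return re.split(r"[- ]", name)

# " -" sequence: the empty word sits between ' ' and '-'.
print(split_name("TIBETAN LETTER -A"))
# ['TIBETAN', 'LETTER', '', 'A']

# "- " sequence: the empty word sits between '-' and ' '.
print(split_name("TIBETAN MARK BKA- SHOG YIG MGO"))
# ['TIBETAN', 'MARK', 'BKA', '', 'SHOG', 'YIG', 'MGO']
```

The word list alone does not record which separator stood where, so it
says nothing about how the separators themselves are encoded.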

--Torsten


