Re: Compression and Unicode [was: Name Compression]

From: Torsten Mohrin (mohrin@sharmahd.com)
Date: Sat May 13 2000 - 11:57:38 EDT

Next message: Torsten Mohrin: "Re: Name Compression. Comparison and Tweaks"
Previous message: Christopher John Fynn: "Re: Compression and Unicode [was: Name Compression]"
Maybe in reply to: Juliusz Chroboczek: "Compression and Unicode [was: Name Compression]"
Next in thread: Asmus Freytag: "RE: Compression and Unicode [was: Name Compression]"

>I noticed that Torsten's scheme assumes that " " and "-" are mutually
>exclusive separators, but this is not true for a handful of Tibetan
>characters that have sequences like " -" or "- " (see list in l_xx.txt). How
>are these cases handled?

I've used split(/[- ]/, ...) in Perl. For a sequence like " -", this
yields an empty word between the ' ' and the '-', and that empty word
is encoded like any other word. It's not optimal, but I haven't changed
it yet.
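A minimal Python transcription of the behaviour described above, assuming a name of the form "... MGO -UM ..." as the " -" case. Python's re.split, like Perl's split, keeps empty fields in the interior of the string, so the empty word appears the same way. The second function is a hypothetical alternative, not part of the scheme described here: it captures the separators, so " -" and "- " become single two-character separators and no empty word is produced.

```python
import re


def split_name(name: str) -> list[str]:
    # Same separator class as split(/[- ]/, ...): every single space or
    # hyphen ends a word, so " -" produces an empty word between them.
    return re.split(r"[- ]", name)


def split_name_keep_seps(name: str) -> list[tuple[str, str]]:
    # Hypothetical alternative: capture the separator after each word.
    # Two-character separators are listed first so they win over " " or "-".
    parts = re.split(r"( -|- | |-)", name)
    words = parts[0::2]          # words sit at even indices
    seps = parts[1::2] + [""]    # separators at odd indices; last word has none
    return list(zip(words, seps))


name = "TIBETAN MARK GTER YIG MGO -UM RNAM BCAD MA"
print(split_name(name))
print(split_name_keep_seps(name))
```

With the first function the word list contains an empty string between "MGO" and "UM"; with the second, "MGO" is paired with the separator " -" and every word is non-empty.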

--Torsten


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT