From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Dec 12 2003 - 07:56:43 EST
On 12/12/2003 04:29, Philippe Verdy wrote:
> ...
>
>But what you suggest here is exactly what a standard file compressor does.
>
>It does not solve any problem in the representation of characters: the
>compression scheme remains private, and the result can only be interpreted
>as text by redecomposing these PUAs (within their scope) into the
>appropriate complex DGCs. In addition, you need a way to store the
>associations between PUAs and DGCs, so the complexity is even worse.
>
>You would probably use it only if there are multiple occurrences of these
>complex DGCs, just to save some space. (This is what is done for Hangul
>Johab syllables, which occur very frequently in modern Korean text; the
>space benefit comes from the fact that the associations between syllables
>and their jamo DGCs need not be encoded, as they are defined by canonical
>equivalence and implemented with a very basic algorithm.)
>
>So unless you can devise a similarly simple algorithm to map complex DGCs
>to PUA ranges, there is little use in what you propose here.
>
>
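[The "very basic algorithm" mentioned above is plain arithmetic, defined in the Unicode Standard under Conjoining Jamo Behavior; no association table is needed. A minimal sketch in Python:]

```python
# Canonical composition of Hangul conjoining jamo into a precomposed
# Johab syllable, using the standard constants from the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_hangul(l, v, t=None):
    """Compose a leading (l), vowel (v) and optional trailing (t) jamo."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = (ord(t) - T_BASE) if t else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# U+1112 + U+1161 + U+11AB compose to U+D55C (HAN)
print(compose_hangul('\u1112', '\u1161', '\u11AB'))
```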
This is not intended as a file compression technique. (Indeed it would
be an extremely poor one, as it is based on UTF-32!) It is intended only
to solve the problem Mark mentioned: that indexing etc. of strings is
inefficient when the string is counted and divided according to grapheme
clusters, following the recommendations for editing in UAX #29. The
mechanism I proposed was intended to allow a string of grapheme clusters
to be indexed efficiently, and nothing else - although, as you point out,
it might also help with rendering (though not necessarily, since the same
grapheme cluster is not always rendered the same way, e.g. in Arabic).
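A minimal sketch of the mechanism I have in mind, in Python. The PUA range chosen and the crude cluster segmentation are illustrative assumptions only - a real implementation would use proper UAX #29 grapheme cluster boundaries:

```python
import unicodedata

PUA_START = 0xF0000  # assumed private-use range (plane 15)

class ClusterIndexedString:
    """Map each multi-codepoint grapheme cluster to one PUA code point,
    so that counting and indexing by cluster become plain array operations.
    Cluster boundaries are approximated here by attaching combining marks
    to the preceding character (not full UAX #29 segmentation)."""

    def __init__(self, text):
        clusters = []
        for ch in text:
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        self._table = {}     # PUA code point -> original cluster
        self._encoded = []   # one "character" per grapheme cluster
        for c in clusters:
            if len(c) == 1:
                self._encoded.append(c)
            else:
                pua = chr(PUA_START + len(self._table))
                self._table[pua] = c
                self._encoded.append(pua)

    def __len__(self):       # length in grapheme clusters, O(1)
        return len(self._encoded)

    def __getitem__(self, i):  # i-th grapheme cluster, O(1)
        cu = self._encoded[i]
        return self._table.get(cu, cu)

s = ClusterIndexedString("e\u0301tude")  # 'étude' with combining acute
# len(s) == 5 grapheme clusters; s[0] == 'e\u0301'
```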
--
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 08:36:43 EST