Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Dec 12 2003 - 07:56:43 EST

    On 12/12/2003 04:29, Philippe Verdy wrote:

    > ...
    >
    >But what you suggest here is exactly what a standard file compressor does.
    >
    >It does not solve any problem in the representation of characters, the
    >compression scheme remains private, and can only be interpreted as text by
    >redecomposing these PUAs (in their scope) to the appropriate complex DGCs.
    >In addition, you need to find a way to store these associations between PUAs
    >and DGCs, so the complexity is even worse.
    >
    >You would probably use it only if there are multiple occurrences of these
    >complex DGCs, just to save some space (this is what is performed in the
    >Hangul Johab syllables as they occur very frequently when writing modern
    >Korean, and the space benefit comes from the fact that it does not need to
    >encode the associations between syllables and DGCs of jamos, as this is
    >defined by their canonical equivalences and implemented with a very basic
    >algorithm).
    >
    >So unless you can create such a simple algorithm to map complex DGCs to PUA
    >ranges, there's little use in what you propose here.
    >
    >
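    The "very basic algorithm" for Hangul mentioned above is the arithmetic
    mapping given in the Unicode Standard (Chapter 3, Conjoining Jamo
    Behavior): a precomposed syllable's offset from U+AC00 encodes its
    leading consonant, vowel and optional trailing consonant, so no
    association table needs to be stored. A minimal Python sketch follows;
    the constants are the standard's, while the function names are
    hypothetical.

```python
import unicodedata
from typing import List, Optional

# Constants of the Hangul syllable algorithm (Unicode Standard, Chapter 3).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_hangul(s: int) -> List[int]:
    """Split a precomposed syllable into its conjoining jamo (L, V[, T])."""
    index = s - S_BASE
    if not 0 <= index < S_COUNT:
        return [s]  # not a precomposed Hangul syllable: leave it alone
    l = L_BASE + index // N_COUNT
    v = V_BASE + (index % N_COUNT) // T_COUNT
    t = T_BASE + index % T_COUNT
    return [l, v] if t == T_BASE else [l, v, t]


def compose_hangul(l: int, v: int, t: Optional[int] = None) -> int:
    """Inverse mapping: leading consonant + vowel (+ trailing consonant)."""
    t_index = 0 if t is None else t - T_BASE
    return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index


if __name__ == "__main__":
    han = 0xD55C  # HANGUL SYLLABLE HAN
    jamo = decompose_hangul(han)
    print([hex(cp) for cp in jamo])  # ['0x1112', '0x1161', '0x11ab']
    # The arithmetic agrees with the library's canonical decomposition.
    assert jamo == [ord(c) for c in unicodedata.normalize("NFD", chr(han))]
    assert compose_hangul(*jamo) == han
```

    Because the mapping is pure arithmetic, both directions cost a few
    integer operations and no storage, which is the property the quoted
    text contrasts with an arbitrary PUA assignment.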
    This is not intended as a file compression technique. (Indeed it would
    be an extremely poor one, as it is based on UTF-32!) It is intended only
    to solve the problem Mark mentioned, that indexing etc. of strings is
    inefficient when the string is counted and divided according to grapheme
    clusters, as UAX #29 recommends for editing. The mechanism I proposed
    was intended to allow a string of grapheme clusters to be indexed
    efficiently, and nothing else. As you point out, it might also help
    with rendering, though not necessarily, since the same grapheme cluster
    is not always rendered the same way (e.g. in Arabic).
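    To make the indexing idea concrete, here is a minimal Python sketch of
    the scheme as described in this exchange: each multi-code-point cluster
    is replaced by a private-use code point, so a string holds exactly one
    32-bit unit per cluster. Several details are assumptions made for
    illustration, not taken from the thread: clusters are approximated as a
    character plus any following characters with a nonzero canonical
    combining class (real UAX #29 segmentation is considerably more
    involved), code points are allocated from Supplementary Private Use
    Area-A starting at U+F0000, a Python list of integers stands in for a
    UTF-32 buffer, and text that already contains such PUA characters is not
    handled. All class and function names are hypothetical.

```python
import unicodedata
from typing import Dict, List

# Hypothetical allocation range: Supplementary Private Use Area-A.
PUA_START, PUA_END = 0xF0000, 0xFFFFD


def split_clusters(text: str) -> List[str]:
    """Crude stand-in for UAX #29: a character plus any following
    characters with a nonzero canonical combining class."""
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


class ClusterTable:
    """Two-way map between multi-code-point clusters and PUA code points.
    It must be kept (or shared) alongside any text encoded with it."""

    def __init__(self) -> None:
        self.to_pua: Dict[str, int] = {}
        self.from_pua: Dict[int, str] = {}

    def encode(self, cluster: str) -> int:
        if len(cluster) == 1:
            return ord(cluster)  # single code point: stored unchanged
        if cluster not in self.to_pua:
            cp = PUA_START + len(self.to_pua)
            if cp > PUA_END:
                raise ValueError("private-use range exhausted")
            self.to_pua[cluster] = cp
            self.from_pua[cp] = cluster
        return self.to_pua[cluster]

    def decode(self, unit: int) -> str:
        return self.from_pua.get(unit, chr(unit))


class ClusterString:
    """One unit per cluster, so cluster i is found by O(1) array indexing
    instead of scanning from the start of the string."""

    def __init__(self, text: str, table: ClusterTable) -> None:
        self.table = table
        self.units = [table.encode(c) for c in split_clusters(text)]

    def __len__(self) -> int:
        return len(self.units)

    def __getitem__(self, i: int) -> str:
        return self.table.decode(self.units[i])

    def __str__(self) -> str:
        return "".join(self.table.decode(u) for u in self.units)
```

    For example, with a fresh table, ClusterString("e\u0301a\u0323\u0302b",
    table) has length 3 and its element [1] is "a\u0323\u0302". The trade-off
    Philippe raises is visible in the design: the ClusterTable is private
    state that must accompany the text for it to be interpreted at all.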

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/