Re: Compression through normalization

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Dec 05 2003 - 13:03:18 EST

    > OK. So it's Mark, not me, who is unilaterally extending C10.

    Where on earth do you get that? I did say that, in practice, NFC should be
    produced, but that is simply a practical guideline, independent of C10.

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Peter Kirk" <peterkirk@qaya.org>
    To: "Doug Ewell" <dewell@adelphia.net>
    Cc: "Unicode Mailing List" <unicode@unicode.org>
    Sent: Fri, 2003 Dec 05 02:51
    Subject: Re: Compression through normalization

    > On 05/12/2003 00:34, Doug Ewell wrote:
    >
    > >Peter Kirk <peterkirk at qaya dot org> wrote:
    > >
    > >>Surely ignoring Composition Exclusions is not unilaterally extending
    > >>C10. The excluded precomposed characters are still canonically
    > >>equivalent to the decomposed (and normalised) forms. And so composing
    > >>a text with them, for compression or any other purpose, still conforms
    > >>to C10, which explicitly allows "replacement of character sequences by
    > >>their canonical-equivalent sequences" - not only when the resulting
    > >>sequence is NFC or NFD.
    > >>
    > >
    > >Ignoring the composition exclusions does still respect canonical
    > >equivalence, but does not preserve a canonical normalization form (using
    > >the language of UAX #15). So although it is not a violation of C10, it
    > >does seem to run afoul of Mark's recommendation:
    > >
    > >"In practice, if a compressor does not produce codepoint-identical text,
    > >it should produce NFC (not just any canonically equivalent text), and
    > >should document that it does so."
    > >
    > OK. So it's Mark, not me, who is unilaterally extending C10. Well, Ken
    > said much the same, so it's bilateral; and I agree it is a sensible
    > extension.
    >
    > But, as Ken also pointed out, it is quite permissible to use any
    > encoding for the intermediate (e.g. compressed) form of the text, as long
    > as it is possible to recover from this the normalised form of the
    > original text. My suggestion of composing the text using the characters
    > listed in Composition Exclusions meets this test, in a way not met by
    > some of the other suggestions, e.g. composing Korean characters into
    > precomposed forms which are (sadly) not canonically equivalent.
    >
    > --
    > Peter Kirk
    > peter@qaya.org (personal)
    > peterkirk@qaya.org (work)
    > http://www.qaya.org/
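The distinction drawn in the quoted exchange can be checked with a small sketch, assuming Python's standard unicodedata module. The choice of U+0958 DEVANAGARI LETTER QA as a composition-excluded character and of the jamo pair U+1100 U+1100 versus U+1101 as the Korean case are illustrative assumptions, not examples given in the thread, and the helper names are hypothetical.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def is_nfc(s: str) -> bool:
    """Report whether a string is already in NFC."""
    return unicodedata.normalize("NFC", s) == s


# Assumed example of a composition-excluded character:
# U+0958 decomposes canonically to U+0915 (KA) + U+093C (NUKTA).
QA_PRECOMPOSED = "\u0958"
QA_DECOMPOSED = "\u0915\u093C"

# Assumed example for the Korean case: U+1101 (SSANGKIYEOK) has no
# canonical decomposition into two U+1100 (KIYEOK) jamo.
SSANGKIYEOK = "\u1101"
TWO_KIYEOK = "\u1100\u1100"

# Composing with an excluded character keeps canonical equivalence ...
qa_equivalent = canonically_equivalent(QA_PRECOMPOSED, QA_DECOMPOSED)
# ... but the result is not NFC ...
qa_precomposed_is_nfc = is_nfc(QA_PRECOMPOSED)
# ... and normalising the intermediate form recovers the NFC text.
qa_round_trip = unicodedata.normalize("NFC", QA_PRECOMPOSED) == QA_DECOMPOSED

# Replacing two jamo with the double jamo is not a canonical equivalence,
# so normalisation cannot recover the original sequence.
jamo_equivalent = canonically_equivalent(SSANGKIYEOK, TWO_KIYEOK)

print(qa_equivalent, qa_precomposed_is_nfc, qa_round_trip, jamo_equivalent)
```

Under these assumptions, the first case is the kind of intermediate form from which the normalised text stays recoverable, while the second is not.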


