RE: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)

From: Philippe Verdy (
Date: Mon Mar 26 2007 - 06:37:37 CST


    Andrew West wrote:
    > On 26/03/07, <>
    > wrote:
    > >
    > > As both fixes are not realistic, I wish if UTS #37 is updated
    > > to have additional note to prohibit (not deprecate) the
    > > codepoint conversion from CJK Compatibility Ideographs to
    > > CJK Unified Ideographs with IVS. How do you think?
    > >
    > I think that Unicode cannot prohibit anyone from applying any
    > particular data transformation they like, including codepoint
    > conversion from CJK Compatibility Ideographs to CJK Unified Ideographs
    > with IVS.
    > If, for example, I were to write a text editor that allowed the user
    > to perform various transformations to a text (e.g. casing, diacritic
    > folding, normalization, transliteration conversions, etc.), there is
    > nothing that Unicode could say or do to stop me from also adding in a
    > facility to convert between CJK Compatibility Ideographs and their
    > corresponding CJK Unified Ideograph plus IVS if I so desired. As long
    > as my application does not purport not to modify the text, I believe
    > that I would remain conformant if I apply pretty much any data
    > transformation I like.

    False; if you do that, you modify the text. A Unicode-conforming transform
    DOES modify the text; the only algorithms that don't transform the text are
    not called "transforms" but "forms" (e.g. normalization forms and UTFs) or
    encodings (e.g. CCS, CES, TES), although most of these do require a
    transform before mapping from one to another or to a UTF.

    What you describe is not different from other "transforms" like:
    * removing ignorable characters
    * case foldings
    * ...

    Unicode conformance for an algorithm requires that, given any two
    canonically equivalent inputs, the results of two implementations or
    instances of the algorithm be canonically equivalent (or equal).
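
    As a rough illustration of the requirement above, here is a minimal Python
    sketch that tests whether a transform keeps canonically equivalent inputs
    canonically equivalent. The helper names and the mapping of U+F900 to
    <U+8C48, U+E0100> are hypothetical, chosen only for the example; they are
    not taken from the IVD or from any registered collection.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def preserves_equivalence(transform, a: str, b: str) -> bool:
    """Check one pair of canonically equivalent inputs against a transform."""
    if not canonically_equivalent(a, b):
        raise ValueError("inputs must be canonically equivalent")
    return canonically_equivalent(transform(a), transform(b))


# HYPOTHETICAL mapping, for illustration only: a compatibility ideograph
# replaced by its unified ideograph plus a variation selector.
HYPOTHETICAL_IVS = {"\uF900": "\u8C48\U000E0100"}


def compat_to_ivs(text: str) -> str:
    """Naive per-code-point replacement using the hypothetical mapping."""
    return "".join(HYPOTHETICAL_IVS.get(ch, ch) for ch in text)


# Case folding: precomposed and decomposed A-with-ring stay equivalent.
case_ok = preserves_equivalence(str.casefold, "\u00C5", "A\u030A")

# U+F900 and U+8C48 are canonically equivalent, but only the former is
# rewritten, so one output carries a variation selector and the other does not.
ivs_ok = preserves_equivalence(compat_to_ivs, "\uF900", "\u8C48")

print(case_ok, ivs_ok)
```

    The check covers only the pairs it is given, so a passing result shows
    nothing about other inputs; a failing pair, however, is enough to show
    that equivalence is not preserved.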

    What you describe does not automatically imply that canonical equivalence
    of the output is preserved. It may be preserved if your implementation
    respects the contract, but your description could equally match the
    following process, which is NOT conforming:
    * change a CJK Compatibility Ideograph to its corresponding CJK Unified
    Ideograph plus IVS;
    * change a CJK Unified Ideograph plus IVS to the next higher CJK Unified
    Ideograph plus IVS, if there's one, or to the CJK Compatibility Ideograph;
    * keep the other characters unchanged;
    At first sight this seems conforming, but consider the case where there
    are diacritics within or just after the sub-sequences being modified, and
    consider how default grapheme clusters are delimited.
