Re: PRI#86 Update

From: SADAHIRO Tomoyuki
Date: Thu May 11 2006 - 09:44:41 CDT


    Mark Davis wrote:
    > The result of forbidding certain characters would mean that it would be
    > only supporting a subset of Unicode characters. This is already done by
    > some protocols; for example, IDN explicitly forbids certain characters
    > on input. That should be clarified in the text. Such an implementation
    > cannot claim to be a conformant implementation of normalization for all
    > Unicode characters.
    > You're right to bring this up; it would need to be clarified in the text.
    > Mark

    I see. Thank you.

    In my opinion, anyone designing a subset normalization should make
    sure that it can be implemented easily on top of a "full" (i.e.
    UAX#15-conforming) normalizer, as long as the subset normalization
    is intended to be relevant to the normalization of UAX#15.

    For that purpose, a subset normalization should coincide with the full
    normalization for every input within the subset, and should reject any
    input outside the subset, so that no output is ever inconsistent with
    the full normalization.
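
    A minimal sketch of that idea in Python, not taken from UAX#15: it
    assumes the standard unicodedata module serves as the "full"
    normalizer, and the names make_subset_normalizer and
    OutsideSubsetError are hypothetical. Input outside the subset is
    rejected; everything else is passed to the full normalizer unchanged,
    so the two can never disagree.

```python
import unicodedata


class OutsideSubsetError(ValueError):
    """Raised when the input contains a character outside the subset."""


def make_subset_normalizer(allowed, form="NFC"):
    """Build a subset normalizer on top of the full normalizer.

    `allowed` is the set of characters the subset accepts on input
    (an assumption of this sketch, chosen by the protocol designer).
    """
    allowed = frozenset(allowed)

    def normalize(text):
        # Reject anything outside the subset instead of guessing a result.
        for ch in text:
            if ch not in allowed:
                raise OutsideSubsetError(
                    f"U+{ord(ch):04X} is outside the supported subset"
                )
        # Within the subset, defer entirely to the full normalizer.
        return unicodedata.normalize(form, text)

    return normalize


# Hypothetical subset: U+0020..U+00FF plus COMBINING ACUTE ACCENT.
latin1_nfc = make_subset_normalizer(
    [chr(c) for c in range(0x20, 0x100)] + ["\u0301"]
)
```

    With this subset, latin1_nfc("e\u0301") gives the same result as the
    full NFC normalization, while a string containing U+4E00 raises
    OutsideSubsetError rather than returning a possibly inconsistent
    result.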

    P.S. The normalization for legacy encodings (Annex 6 in UAX#15) may be
    a sort of subset normalization, as the repertoire of a legacy encoding
    is mapped onto a subset of Unicode.

    SADAHIRO Tomoyuki
