Re: What is the principle?

From: Philippe Verdy (
Date: Wed Mar 31 2004 - 04:36:48 EST

    From: "Michael Everson" <>
    > At 17:02 -0800 2004-03-30, Mike Ayers wrote:
    > >I feel obligated to take this one step further - these folks are
    > >forgetting that "P" stands for "private". Their use of this space
    > >is their own problem, in all senses. It does not seem reasonable to
    > >me that *any* standard behavior could be expected of PUA code
    > >points, from operating systems or applications, as such may have
    > >chosen to, or may yet choose to, use those code points to
    > >encapsulate very un-font-rendering-like behavior, and such a
    > >decision, made past, present or future, is a perfectly valid private
    > >use.
    > Which I assume means: "it's wrong for Unicode to make ANY property
    > pronouncements for ANY PUA characters, since that defines them, and
    > removes the P from the Use."

    Do you mean that the properties Unicode currently defines for PUA code points
    should lose their normative status and be left to implementers, so that no
    application could be called non-conforming for implementing other defaults?
    Maybe this would require some adjustments to the normative wording on
    Unicode conformance...
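
    For reference, the defaults in question can be inspected with Python's
    unicodedata module. This is only a sketch: the helper name pua_defaults is
    made up for illustration, and the values it reports come from whichever
    version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata

# One or two sample code points from each of the three private-use ranges:
# U+E000..U+F8FF, U+F0000..U+FFFFD and U+100000..U+10FFFD.
SAMPLES = [0xE000, 0xF8FF, 0xF0000, 0x10FFFD]


def pua_defaults(cp: int) -> dict:
    """Return a few default properties the database assigns to a code point.

    Hypothetical helper, for illustration only.
    """
    ch = chr(cp)
    return {
        "category": unicodedata.category(ch),       # general category
        "bidi": unicodedata.bidirectional(ch),      # bidirectional class
        "combining": unicodedata.combining(ch),     # canonical combining class
        "decomposition": unicodedata.decomposition(ch),
    }


for cp in SAMPLES:
    print(f"U+{cp:04X}", pua_defaults(cp))
```

    An implementation bound by a private agreement would presumably override
    some of these values for its own characters, which is exactly the case
    the conformance wording would have to allow.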

    Likewise, variation selectors used on PUA characters should not be
    constrained either: the current restrictions on variation selector usage
    should not apply to PUAs. A VSn should still be fully ignorable, including
    after a PUA character that has no normative semantics defined in Unicode,
    which means that the combination PUA+VSn has no normative semantics defined
    in Unicode itself either.
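
    A minimal sketch, in Python, of what "fully ignorable" could mean for a
    process that does not know the private agreement. The function names are
    hypothetical, the Mongolian free variation selectors are deliberately left
    out, and stripping the selector is only one possible treatment.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop every variation selector, keeping all other characters as they are."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# A private-use character followed by VS1.
sample = "\uE000\uFE00"

# Normalization neither reorders nor removes the pair.
assert unicodedata.normalize("NFC", sample) == sample

# A process that ignores the selector still sees the bare PUA character.
assert strip_variation_selectors(sample) == "\uE000"
```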

    Leave that to implementations, and maybe we will ease the development of new
    scripts by allowing other groups to work on interchangeable formats based
    on PUAs, which could later be integrated into Unicode after an easier phase
    in which these scripts have been tried out. It would ease the adoption of
    a later consensus, and would offer a great tool for developers and
    researchers, who could safely base their work on Unicode encoding
    conventions.

    This would also be a good indication that specialized 8-bit code sets are no
    longer necessary. IANA could then close its registry of 8-bit encodings in
    favor of PUA-based encodings defined by some conventional rules, which could
    then become a standard and open extension mechanism...

    This would have the advantage of relieving the pressure on Unicode to
    standardize new scripts too fast, and longer open experimentation would
    avoid many future errors in newly standardized scripts.

    The CSUR registry is one approach to defining new scripts, but for now I see
    little effort to allow the properties of such scripts to be specified in a
    partially interchangeable format. One reason may be that Unicode has placed
    too many restrictions on the use of PUAs, so that developers fear that the
    protocols which need them would become non-conforming.

    I do think there must be a way to use PUAs safely, without ambiguity or risk
    of collision, using extension mechanisms similar to namespaces in XML,
    together with some normative declarations and possibly a registry of PUA
    sets (why not the IANA charsets registry, if it can reference the associated
    properties through a URL pointing to a script definition schema?).
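
    A rough sketch, in Python, of what such declarations and a collision check
    could look like. Everything here is hypothetical: the class name, its
    fields, the range assignments and the example.org URLs are invented for
    illustration, and no existing registry format is implied.

```python
from dataclasses import dataclass

# The three private-use ranges defined by Unicode.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


@dataclass(frozen=True)
class PuaDeclaration:
    """Hypothetical namespace-like declaration binding a PUA range to a schema."""
    name: str
    first: int
    last: int
    schema_url: str  # would point to a script definition schema


def in_pua(first: int, last: int) -> bool:
    """True if first..last is non-empty and lies inside one private-use range."""
    return any(lo <= first <= last <= hi for lo, hi in PUA_RANGES)


def find_collisions(decls):
    """Return the name pairs of declarations whose ranges overlap."""
    ordered = sorted(decls, key=lambda d: d.first)
    collisions = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.first > a.last:
                break  # later declarations start even further right
            collisions.append((a.name, b.name))
    return collisions


DECLS = [
    PuaDeclaration("script-a", 0xE000, 0xE0FF, "https://example.org/script-a"),
    PuaDeclaration("script-b", 0xE080, 0xE17F, "https://example.org/script-b"),
    PuaDeclaration("script-c", 0xF0000, 0xF00FF, "https://example.org/script-c"),
]

print(find_collisions(DECLS))
```

    With declarations like these, two documents using overlapping ranges could
    at least be detected as conflicting before their text is merged.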

    This archive was generated by hypermail 2.1.5 : Wed Mar 31 2004 - 05:22:36 EST