Re: Definitions

From: Philippe Verdy (
Date: Wed Nov 19 2003 - 15:27:50 EST

    From: "Peter Constable" <>
    > A software product could assign every single PUA codepoint to mean some
    > kind of formatting instruction, and insert these into the text like
    > markup. In that case, a user's PUA characters will be re-interpreted by
    > that software as formatting instructions. Is that product conformant?
    > Yes. Is it useful? Not for that user.

    With a very simple transcoder, you could remap all HTML markup, including
    the extra line breaks used within markup, onto 256 PUA code points. You
    would get a file that contains ALL the HTML markup but still complies with
    the Unicode plain-text definition. Converting it back to HTML would use a
    reverse filter and would produce an HTML file without any PUA code points,
    so it would be rendered normally.
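To make the idea concrete, here is a rough sketch of such a transcoder and its reverse filter. It is only an illustration under my own assumptions: the 256 PUA code points are taken to be U+E000..U+E0FF, each one stands for one UTF-8 byte of markup, "markup" is naively anything between "<" and ">", and the input is assumed to contain no characters from that PUA range to begin with. The names PUA_BASE, encode_markup and decode_markup are made up for the example.

```python
import re

# Assumption: use the first 256 BMP private-use code points, U+E000..U+E0FF.
PUA_BASE = 0xE000

# Simplification: treat anything between "<" and ">" as markup.
# [^>] also matches line breaks, so end-of-lines inside tags are remapped too.
TAG_RE = re.compile(r"<[^>]*>")


def encode_markup(html: str) -> str:
    """Replace every byte of each tag with the PUA code point PUA_BASE + byte."""
    def to_pua(match: re.Match) -> str:
        return "".join(chr(PUA_BASE + b) for b in match.group(0).encode("utf-8"))
    return TAG_RE.sub(to_pua, html)


def decode_markup(text: str) -> str:
    """Reverse filter: turn runs of PUA code points back into the original markup."""
    out = []
    pending = bytearray()  # bytes of the markup run currently being collected
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp < PUA_BASE + 256:
            pending.append(cp - PUA_BASE)
        else:
            if pending:
                out.append(pending.decode("utf-8"))
                pending.clear()
            out.append(ch)
    if pending:
        out.append(pending.decode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p>\n"
    encoded = encode_markup(sample)
    print("<" in encoded, decode_markup(encoded) == sample)
```

A real filter would need a proper HTML tokenizer (comments, scripts, attribute values containing ">"), but the round trip itself is no more complicated than this.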

    The only problem is that PUAs have no defined rendering, and Unicode does
    not specify ranges of PUAs for distinct uses, with distinct but predefined
    _default_ character properties.
    Why isn't there a range for Mn diacritics, a range for ideographic letters
    or symbols, and a range for ignorable formatting controls (all of them with
    combining class 0)? At least it would have allowed applications and
    renderers to behave correctly even in the absence of support for those
    PUAs, by using a correct _default_ rendering, instead of just displaying
    narrow white boxes, or nothing...
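As an illustration of what I mean by default properties per range, here is a small sketch. The sub-ranges below are entirely hypothetical: Unicode defines no such split, and all code points in U+E000..U+F8FF simply have general category Co. The table HYPOTHETICAL_PUA_DEFAULTS and the function default_category are invented for the example; only unicodedata.category is a real library call.

```python
import unicodedata

# Hypothetical split of the BMP Private Use Area (U+E000..U+F8FF).
# These boundaries are invented for illustration; Unicode assigns no sub-ranges.
HYPOTHETICAL_PUA_DEFAULTS = [
    (0xE000, 0xE0FF, "Cf"),  # ignorable formatting controls
    (0xE100, 0xE4FF, "Mn"),  # nonspacing diacritics
    (0xE500, 0xF8FF, "Lo"),  # ideographic letters or symbols
]


def default_category(ch: str) -> str:
    """Return a fallback general category for a character.

    PUA characters get the default of their hypothetical sub-range;
    everything else gets its real category from the Unicode database.
    """
    cp = ord(ch)
    for first, last, category in HYPOTHETICAL_PUA_DEFAULTS:
        if first <= cp <= last:
            return category
    return unicodedata.category(ch)


if __name__ == "__main__":
    for ch in ("\ue000", "\ue100", "\uf8ff", "A"):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), default_category(ch))
```

A renderer with no private agreement for a given PUA character could then fall back on such a default (skip it, stack it on the previous base, or give it a full-width cell), while any document-specific agreement would still override it.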

    I don't see why this would break anything: documents could still use PUAs
    the way they want, with their own semantics and behavior. But suggesting
    distinct ranges for the default behavior would be a real bonus to help
    applications adopt a coherent behavior when faced with unknown or
    unspecified PUAs.
