Re: Regulating PUA.

From: Philippe Verdy (
Date: Thu Jan 25 2007 - 20:10:05 CST

    From: "Richard Wordingham" <>
    > Ruszlan Gaszanov wrote on Wednesday, January 24, 2007 4:19 PM
    >>> I never asked for blocking PUAs on the web. Just that using them on the
    >>> web should require an explicit protocol, and that the absence of such a
    >>> protocol is a severe issue, which also exposes users to security risks
    >>> (notably if they need to install some site-specific software to get
    >>> access to the content), as well as to misunderstanding and incorrect
    >>> interpretations, if the only protocol consists of describing the
    >>> convention in a human language the reader may not understand.
    >> 'To read this page properly, you need SuchNSuch font' is basically
    >> translated to HTML as <font face="SuchNSuch">...</font> ;)
    > and the presence of PUA characters tells you that the font is a key part
    > of the meaning - think of the font as specifying which extension of the
    > non-PUA part of Unicode is being used.

    Hmmm... Is all textual data on the web (or even just in an HTML page) really associated with <font> elements or font styles?

    We have been told by the W3C that style was designed to be clearly separated from the textual content.

    So I would rather use something like this in HTML/XML:

    <document xmlns:my="http://">
      <containerElement class="my:PUA">text with my PUAs...</containerElement>
    </document>

    Here the containerElement does not matter: it can be any element, including the document's root element. The class carries the reference to my own PUA convention, and it can be mapped to specific fonts in CSS if needed. But the indicated class is explicitly private, because its value belongs to my own XML namespace (declared in the document's root element), using my own URL on the web or some other unique URN (or UUID) identifying the convention, as in:

    <document xmlns:my="uuid:34b26059-4eca-407c-88a3-b21274312f7a">
      <containerElement class="my:PUA">text with my PUAs...</containerElement>
    </document>
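    XML namespace processing does not itself resolve a prefix that appears inside an attribute value, so a consumer of such markup would have to do that lookup. A minimal Python sketch using the standard xml.etree.ElementTree module, assuming for simplicity that every prefix is declared once (as on the root element here) and that a convention is signalled by a class value of the form "prefix:local"; the function names are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def is_pua(ch: str) -> bool:
    """True for code points in the BMP PUA and the two supplementary PUA planes."""
    cp = ord(ch)
    return (0xE000 <= cp <= 0xF8FF            # BMP Private Use Area
            or 0xF0000 <= cp <= 0xFFFFD       # Plane 15 (Supplementary PUA-A)
            or 0x100000 <= cp <= 0x10FFFD)    # Plane 16 (Supplementary PUA-B)


def pua_conventions(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (namespace URI, class local name, PUA chars) for each element
    whose class value is "prefix:local" with a declared prefix."""
    prefixes: dict[str, str] = {}
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = item
            # Assumption: prefixes are never redeclared in nested scopes.
            prefixes[prefix] = uri
            continue
        # "end" event: item is a fully parsed element.
        prefix, sep, local = item.get("class", "").partition(":")
        if sep and prefix in prefixes:
            text = "".join(item.itertext())
            found.append((prefixes[prefix], local, "".join(c for c in text if is_pua(c))))
    return found
```

    Applied to the UUID example above (with some PUA text in the container), this yields one entry pairing the characters with "uuid:34b26059-4eca-407c-88a3-b21274312f7a" and the local name "PUA".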

    But then, it should also work in other, invisible parts. For example, in JavaScript: how can a string internally keep the reference to the convention associated with the PUA characters it contains? Is this association mutable for each PUA character? (It should not be, because it is part of the character's identity!) How can we compose strings containing PUAs from different sources?

    The concept behind all this is that PUAs are really objects with more properties than just the fixed character properties defined in Unicode and fully determined by the code point itself:
    * each character instance also carries a variable property referencing the private convention;
    * but this variable character property cannot be stored in plain text, or in any UTF, for now...
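    One way to model the two bullets above, sketched in Python under stated assumptions: each character carries an immutable, optional convention reference (here a hypothetical URN string), so concatenating strings from different sources keeps every PUA character bound to its own convention, while converting back to an ordinary str shows exactly what plain text loses. The class names are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def is_pua(ch: str) -> bool:
    cp = ord(ch)
    return 0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD


@dataclass(frozen=True)
class Char:
    ch: str
    # Hypothetical convention reference (e.g. a URN). The dataclass is frozen,
    # so the binding cannot be changed after the character is created.
    convention: Optional[str] = None


class ConventionString:
    """Immutable sequence of Char objects; each PUA char keeps its own convention."""

    def __init__(self, chars: Iterable[Char]):
        self._chars = tuple(chars)

    @classmethod
    def from_text(cls, text: str, convention: str) -> "ConventionString":
        # Bind every PUA character of `text` to `convention` at creation time.
        return cls(Char(c, convention if is_pua(c) else None) for c in text)

    def __add__(self, other: "ConventionString") -> "ConventionString":
        # Composition only concatenates; no character's convention is touched.
        return ConventionString(self._chars + other._chars)

    def __str__(self) -> str:
        # Plain text keeps only the code points; the convention property is dropped.
        return "".join(c.ch for c in self._chars)

    def conventions(self) -> list[Optional[str]]:
        # Convention of each PUA character, in order.
        return [c.convention for c in self._chars if is_pua(c.ch)]
```

    For example, ConventionString.from_text("x\ue000", "urn:example:a") + ConventionString.from_text("\ue000", "urn:example:b") holds two U+E000 characters reporting different conventions, yet its str() is just "x\ue000\ue000".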

    ...Unless we define some special tags for storing the PUA convention, for example in Plane 14 (the Supplementary Special-purpose Plane), like the deprecated language tags, except that here they would record the convention used by the PUA characters that follow the tag. When parsing the text from start to end, we can then maintain an internal context whose current value is copied into each PUA character object found during parsing.
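    A Python sketch of that parsing model. No such tag exists in Unicode: the introducer U+E0002 used here is an unassigned Plane 14 code point picked purely for illustration. The convention identifier is spelled with the existing TAG characters U+E0020..U+E007E (the Plane 14 mirrors of printable ASCII), and introducer + U+E007F CANCEL TAG resets the context, loosely modelled on how language tags were formed.

```python
from typing import Optional

# HYPOTHETICAL: U+E0002 is unassigned in Unicode; it stands in for a
# "PUA convention tag" introducer only in this sketch.
PUA_CONVENTION_TAG = "\U000E0002"
CANCEL_TAG = "\U000E007F"  # existing character: CANCEL TAG


def is_pua(ch: str) -> bool:
    cp = ord(ch)
    return 0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD


def is_tag_char(ch: str) -> bool:
    return 0xE0020 <= ord(ch) <= 0xE007E  # TAG SPACE .. TAG TILDE


def encode_tag(convention: str) -> str:
    """Spell a printable-ASCII convention id with Plane 14 TAG characters."""
    if not convention or not all(0x20 <= ord(c) <= 0x7E for c in convention):
        raise ValueError("convention id must be non-empty printable ASCII")
    return PUA_CONVENTION_TAG + "".join(chr(0xE0000 + ord(c)) for c in convention)


def parse(text: str) -> list[tuple[str, Optional[str]]]:
    """Scan from start to end, pairing each PUA char with the convention in effect."""
    result = []
    context: Optional[str] = None
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == PUA_CONVENTION_TAG:
            i += 1
            if i < len(text) and text[i] == CANCEL_TAG:
                context, i = None, i + 1  # introducer + CANCEL TAG clears the context
                continue
            start = i
            while i < len(text) and is_tag_char(text[i]):
                i += 1
            context = "".join(chr(ord(t) - 0xE0000) for t in text[start:i]) or None
            continue
        if is_pua(ch):
            result.append((ch, context))  # copy the current context into this character
        i += 1
    return result
```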

    When creating documents from an internal source, the encoder then has the choice of emitting those tags or converting them into some more appropriate syntax specific to the document type. If the document is plain text only, there is no place to keep that information other than in such special tags; it is up to the implementation to decide whether including tags is valid for the intended recipient, or whether the output must raise an exception. But emitting PUA code points blindly should not be the only default option, as it is now!
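    The three output choices could look like this in a hypothetical serializer (a Python sketch, not an existing API). It assumes the internal text is held as runs of (text, convention id or None); tagged plain text uses a made-up Plane 14 introducer (U+E0002, unassigned), XML output uses a namespaced class attribute with generated prefixes, and a plain-text recipient that cannot accept tags gets an exception instead of silently stripped conventions.

```python
from enum import Enum
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

PUA_CONVENTION_TAG = "\U000E0002"  # HYPOTHETICAL introducer (unassigned code point)
CANCEL_TAG = "\U000E007F"          # existing character: CANCEL TAG

Run = tuple[str, Optional[str]]    # (text, convention id or None)


def is_pua(ch: str) -> bool:
    cp = ord(ch)
    return 0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD


def encode_tag(convention: str) -> str:
    # Printable ASCII id spelled with TAG characters U+E0020..U+E007E.
    return PUA_CONVENTION_TAG + "".join(chr(0xE0000 + ord(c)) for c in convention)


class Target(Enum):
    TAGGED_PLAIN_TEXT = "tagged"   # plain text carrying the hypothetical convention tags
    XML = "xml"                    # convention expressed as a namespaced class value
    UNTAGGED_PLAIN_TEXT = "plain"  # recipient accepts no tags at all


def serialize(runs: list[Run], target: Target) -> str:
    if target is Target.UNTAGGED_PLAIN_TEXT:
        # Refuse rather than emit PUA code points whose convention would be lost.
        if any(conv and any(is_pua(c) for c in text) for text, conv in runs):
            raise ValueError("PUA text with a private convention needs tags or markup")
        return "".join(text for text, _ in runs)
    if target is Target.TAGGED_PLAIN_TEXT:
        out, current = [], None
        for text, conv in runs:
            if conv != current:  # emit a tag only when the convention changes
                out.append(encode_tag(conv) if conv else PUA_CONVENTION_TAG + CANCEL_TAG)
                current = conv
            out.append(text)
        return "".join(out)
    # Target.XML: one generated prefix per distinct convention, declared on the root.
    prefixes: dict[str, str] = {}
    body = []
    for text, conv in runs:
        if conv is None:
            body.append(escape(text))
        else:
            prefix = prefixes.setdefault(conv, f"c{len(prefixes)}")
            body.append(f'<containerElement class="{prefix}:PUA">{escape(text)}</containerElement>')
    decls = "".join(f" xmlns:{p}={quoteattr(uri)}" for uri, p in prefixes.items())
    return f"<document{decls}>{''.join(body)}</document>"
```

    Raising in the untagged case is one possible policy; an implementation could instead fall back to another target when it knows the recipient.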

    This archive was generated by hypermail 2.1.5 : Thu Jan 25 2007 - 20:11:14 CST