Re: Regulating PUA.

From: Philippe Verdy
Date: Sat Jan 27 2007 - 14:32:54 CST

    From: "Richard Wordingham"
    > It would indeed be pleasant if there were some way of defining the meaning
    > and properties of PUA characters. Unfortunately, there doesn't seem to be.
    > In this case, font seems to be the most practical way of identifying the
    > meaning of a PUA character. After all, a recipient may in principle freely
    > switch between conventions, depending on whom he is communicating with.

    That is why I described a mechanism for tagging every occurrence of a PUA character (at the character level), making each PUA character an object with two properties: the code point (which has some intrinsic default properties from the Unicode standard itself) and a transparent namespace (the best solution being a URI string, as in XML namespaces). Unicode string objects (which contain ordered vectors of code points) would need to be extended so that they can also store, separately, the ranges of indices to which a namespace applies.
    All non-PUA characters would belong to the default namespace (empty URI) for standard characters, so each character, even if it is not a PUA character, is attached to a single namespace.
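
    To make the idea concrete, here is a minimal sketch of such an extended string object. It is only an illustration under stated assumptions: the class name NamespacedString, its methods, and the example.org URI are hypothetical and not part of any existing library or of the Unicode standard.

```python
from dataclasses import dataclass, field

DEFAULT_NS = ""  # empty URI: the default namespace for standard characters


def is_pua(cp: int) -> bool:
    """True if the code point lies in one of the three Private Use Areas."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


@dataclass
class NamespacedString:
    """Hypothetical string type: code points plus namespace ranges."""
    text: str
    # Half-open, non-overlapping (start, end, uri) index ranges.
    ranges: list = field(default_factory=list)

    def tag(self, start: int, end: int, uri: str) -> None:
        """Attach a namespace URI to the code points in [start, end)."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("range outside the string")
        for s, e, _ in self.ranges:
            if start < e and s < end:
                raise ValueError("a character may have only one namespace")
        self.ranges.append((start, end, uri))

    def namespace_at(self, index: int) -> str:
        """Return the namespace of one character (default if untagged)."""
        for s, e, uri in self.ranges:
            if s <= index < e:
                return uri
        return DEFAULT_NS

    def untagged_pua(self) -> list:
        """Indices of PUA characters still in the default namespace."""
        return [i for i, ch in enumerate(self.text)
                if is_pua(ord(ch)) and self.namespace_at(i) == DEFAULT_NS]


s = NamespacedString("a\uE000b\uE001")
s.tag(1, 2, "http://example.org/pua/convention-a")  # hypothetical URI
```

    In this sketch the second PUA character (index 3) is left untagged, so untagged_pua() reports it as a character whose meaning is still unknown to the recipient.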

    Transporting this information can then be done at the document level (for example in external metadata, such as HTTP headers or a filesystem's file properties), or embedded in an envelope format that allows storing multiple parallel data streams (XML...), or could ultimately be stored in plain text using special tag characters (like the language tags in plane 14).
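
    For the plain-text option, the following sketch shows how a namespace URI could be spelled out with plane 14 tag characters. It relies only on the fact that U+E0020..U+E007E mirror printable ASCII; the function names are hypothetical, and Unicode defines no tag that identifies a PUA namespace, so this is an assumption about how such a scheme might look and not an existing mechanism.

```python
TAG_BASE = 0xE0000  # U+E0020..U+E007E mirror ASCII 0x20..0x7E


def uri_to_tag_chars(uri: str) -> str:
    """Spell an ASCII URI with plane 14 tag characters (hypothetical use)."""
    out = []
    for ch in uri:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError("only printable ASCII has a tag counterpart")
        out.append(chr(TAG_BASE + cp))
    return "".join(out)


def tag_chars_to_uri(tagged: str) -> str:
    """Inverse mapping: recover the ASCII URI from tag characters."""
    out = []
    for ch in tagged:
        cp = ord(ch) - TAG_BASE
        if not 0x20 <= cp <= 0x7E:
            raise ValueError("not a plane 14 tag character")
        out.append(chr(cp))
    return "".join(out)


encoded = uri_to_tag_chars("http://example.org/pua/convention-a")
```

    A real scheme would also need an introducer and a terminator for the tag sequence, as language tags have; that part is left open here.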

    Such a system does not require a single convention to be used throughout a compound document.

    And it is certainly more open than a font, even though a font may also be the carrier of the private convention. But fonts are very poor carriers, because of their versioning, their lack of interoperability, and the attached licences, which prevent anyone except font designers from building such conventions (in files that are extremely complex and costly to develop) and from distributing textual data that needs such a convention (because of the distribution restrictions on actual font designs and implementations). One way to solve this problem could be for fonts to embed the PUA convention namespace (URI), so that documents specify the namespace URI instead of a required font.

    Fonts can then be designed, if needed, to support multiple PUA conventions (including font designers' or vendors' own conventions) by storing multiple URIs.
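
    As a rough sketch of font selection by namespace instead of by font name: the FontInfo record and its pua_namespaces field below are hypothetical, since no existing font format is assumed to carry such a list of URIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontInfo:
    """Hypothetical font record declaring the PUA conventions it supports."""
    name: str
    pua_namespaces: frozenset


def fonts_supporting(fonts, uri):
    """Names of the fonts that declare support for the given namespace URI."""
    return [f.name for f in fonts if uri in f.pua_namespaces]


fonts = [
    FontInfo("FontA", frozenset({"http://example.org/pua/convention-a"})),
    FontInfo("FontB", frozenset({"http://example.org/pua/convention-a",
                                 "http://example.org/pua/convention-b"})),
]
candidates = fonts_supporting(fonts, "http://example.org/pua/convention-b")
```

    The document names only the convention; any installed font that declares that URI is an acceptable renderer.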