Re: What is the principle?

From: Philippe Verdy (
Date: Wed Mar 31 2004 - 16:39:00 EST


    From: "Ernest Cline" <>
    > I'd have to take the time to list them, but a quick glance convinces
    > me that there are at most several hundred combinations that would
    > need to be supported if we limit things to just those combinations
    > already in use. (It might take more if, for example, all 256 potential
    > combining classes were supported instead of the 26 listed in
    > UCD.html.) At 128 characters per combination, plus more for the
    > few that might need them, it should prove possible to handle this
    > in 1 or 2 planes.

    This seems highly excessive. We already have plenty of PUA space. All we
    need is a standard way (a file format? a protocol?) to transport PUA character
    properties, and possibly to encode a reference (a URI?) to the definition file or
    service. If Unicode does not want to do this job, it could at least participate
    in such an independent development by commenting on the protocol or format used to
    encode these properties (notably to make sure that the system remains extensible
    and can encode new properties that may be added later).

    This would work in relation to the evolution of the Unicode standard itself
    (versioning), which could be handled correctly (though less efficiently) through a
    sort of emulation layer that would "mimic" the behavior of newly standardized
    characters and properties. I don't expect every application to be able to
    interpret this protocol or implement the emulation layer, but at least it would
    become possible to create less ambiguous, interoperable solutions based on other
    existing standards. (That's why I think that, if such a separate development is
    undertaken, it should be based on the most advanced interoperability technologies
    of today, notably XML with its schemas and namespaces.)

    You think this is overkill? Well, in the near future I think it will become
    difficult for applications to follow the evolution of the Unicode standard, and
    version differences will soon cause a nightmare unless there is a more formal
    way to specify two things: what is implicitly part of a Unicode version (and needs
    no complex protocol negotiation), clearly identified by an identifier resolvable
    by online services; and what can be supported as completely as possible by
    an emulation layer. XML schemas, because they are versionable, can really help
    here (notably because modern XML parsers can use local caches for definition
    data, including prebuilt local implementations, which are the most efficient).

    So I don't like the idea of adding more PUAs with other defaults. I much prefer
    more freedom in the use of PUAs, and a way to turn what looks like a
    deviation from the standard today into a conforming solution.

    This will become more important for the scripts that remain to be encoded, simply
    because we lack the resources needed to produce any standard for
    them. What this means is that the evolution of Unicode will soon become
    impossible without experimentation and gradual integration with some
    interoperable services. With the standard's current stability policy, this need is
    even more important, because further corrections of past errors will become
    nearly impossible (and this will stop any attempt to make significant
    changes to the standard itself).

    It's clear that there is a need for PUAs today, simply because Unicode is becoming
    a universal standard for more and more applications. If this universal standard
    blocks evolution, then others will want to develop independent standards, and there
    will be a risk of splits caused by the OS vendors themselves.

    (See what happened to Unix 15 years ago, and how difficult it is today to
    reunify what was initially a single standard. Thanks to GNU and Linux, which have
    been the driving force of this reunification, other proprietary *nix versions are now
    converging for interoperability with Linux; but this unification is probably 15
    to 20 years from becoming reality, unless *nix vendors decide to preemptively
    abandon some "dead" branches and keep only those that users want and are
    ready to learn and support themselves.)
