RE: Proposed Draft UTR #31 - Syntax Characters

From: Jim Allan (
Date: Wed Aug 27 2003 - 10:09:42 EDT


    Jill Ramonsky posted:

    > In any case, I _imagine_ that a future compiler, running on a future
    > operating system, will contain a system directory which will contain A
    > VERSION of Unicode - by which I mean A VERSION of the Unicode data files, as
    > supplied by the consortium. The hypothetical OS will then parse said files
    > into an internal form that only it needs to know about, and make Unicode
    > functionality available to applications (such as future compilers) in the
    > form of standard API calls. A future compiler will simply have to call some
    > function, which may be called something like is_indentifier_char(), and act
    > on the return value (true or false) accordingly. The behaviour of the
    > compiler, and indeed the whole OS, can be upgraded to behave in accordance
    > with a new version of Unicode, simply by storing the new data files in the
    > right place. You will not need to get upgraded applications. You will not
    > need to recompile the kernel. Thus, in this future system, one will indeed
    > "store a version of Unicode on your machine"

    That seems very dangerous.

    Such behavior is why people are often very wary of upgrading.

    But it is dubious that a Unicode upgrade will modify all fonts on the
    system to add new characters in the proper style, modify any and all
    translation tables to use them, modify all tailored sorts, change
    spell-checkers to recognize new valid spellings, change translation
    tables to other character encodings, modify legacy data to fit with the
    new version of Unicode.

    Unicode is a standard that operating systems, applications, fonts and
    sort routines can use and tailor.

    A publishing house printing Coptic and using Unicode to do so would
    currently employ the mixed Greek/Coptic characters defined in Unicode
    but presumably with fonts that handled diacritics in Coptic fashion
    rather than Greek fashion and probably using a number of PUA characters.

    When the planned disunification of Greek and Coptic is implemented, the
    addition of new files indicating new properties for some Unicode code
    points and new sort weights is not going to be sufficient to switch the
    entire printing and publishing operation to use the new characters.

    For at least a few years the publishing house will still be using the
    old system while it cautiously edges into the new encoding for new work
    and gradually updates fonts, acquires new ones and converts legacy data.

    Single character additions here and there to Unicode and clarifications
    of rules will not have so drastic an effect. But they will have effects
    that cannot be implemented immediately through the mere addition of
    tables of properties and collating weights.

    The effect of the addition of four new characters in Unicode 3.0 for use
    in Romanian text is still being felt in the lack of fonts that support
    them and uncertainty about what translation tables should do.

    It may be specified that a particular control character should not be
    treated in the way some fonts have been treating it. That is not going
    to change the behavior of legacy fonts. If it did, the user would be
    quite perturbed that a document which printed perfectly yesterday prints
    in a flawed manner today.

    A particular proprietary routine coded for a language may depend on a
    particular character being in a particular Unicode classification to
    filter it out along with certain other characters. It would at least be
    annoying, and might be disastrous, if this behavior changed without
    warning because the properties of the character had changed.
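    As an illustration of the kind of routine meant here, a minimal Python
    sketch follows. The function name and the set of dropped categories are
    hypothetical choices for the example; the point is only that the filter's
    output is decided entirely by the General_Category data bundled with the
    runtime (reported by unicodedata.unidata_version), not by the code itself.

```python
import unicodedata

# Hypothetical filter: the categories dropped here (punctuation and symbols)
# are an illustrative assumption, not taken from any real application.
DROPPED_CATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
                      "Sm", "Sc", "Sk", "So"}

def strip_punct_and_symbols(text: str) -> str:
    """Keep only characters whose General_Category is not in DROPPED_CATEGORIES.

    The category of each character comes from the Unicode data shipped with
    the Python runtime, so if a later Unicode version reclassified a
    character, this function's output would change with no code change.
    """
    return "".join(ch for ch in text
                   if unicodedata.category(ch) not in DROPPED_CATEGORIES)

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print(strip_punct_and_symbols("a-b, c!"))  # ab c  (space is Zs, so kept)
```

    Swapping in a newer Unicode data version silently alters what this
    routine removes, which is exactly the unannounced behavior change
    described above.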

    A simple change in a compatibility decomposition might have a great
    effect on an individual routine.
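    A minimal sketch of a routine whose results rest on compatibility
    decompositions, assuming Python's unicodedata module: matches_keyword is
    a hypothetical helper that compares strings after NFKC folding, so every
    answer it gives is determined by the decomposition mappings in the
    installed Unicode data rather than by the code.

```python
import unicodedata

def compat_key(text: str) -> str:
    """Fold text with NFKC (compatibility decomposition, then canonical composition)."""
    return unicodedata.normalize("NFKC", text)

def matches_keyword(user_input: str, keyword: str) -> bool:
    # Hypothetical routine: compares NFKC-folded, case-folded strings,
    # so its answers depend on the decomposition mappings in the UCD.
    return compat_key(user_input).casefold() == compat_key(keyword).casefold()

if __name__ == "__main__":
    print(compat_key("\ufb01le"))               # file  (U+FB01 fi ligature)
    print(compat_key("x\u00b2"))                # x2    (U+00B2 superscript two)
    print(matches_keyword("\ufb01le", "FILE"))  # True
```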

    To write routines that depend on properties that Unicode has announced
    as changeable may be bad coding. But I don't see that applications in
    the future will be any less afflicted with bad coding than current
    ones are.
    Jim Allan

    This archive was generated by hypermail 2.1.5 : Wed Aug 27 2003 - 11:02:40 EDT