RE: New Corrigendum to The Unicode Standard

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 21 2007 - 22:26:01 CDT

    I was not making a formal proposal, just suggesting something to help
    refer to the version with corrigendums applied. OK, the letter "d" is
    used for drafts, but drafts don't need to be referred to in the long term;
    that's not the case for compliant implementations.

    Choose the letter c (as in corrigendum) if you prefer, or an extra dot.
    So the current version would be 5.0.0c6 or 5.0.0.6.
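
    To make the two notations concrete, here is a minimal sketch in Python of
    how such labels could be parsed. The names CorrectedVersion, parse_label
    and format_label are hypothetical, and the label syntax is only the one
    suggested in this message, not anything defined by the Unicode Standard.

```python
import re
from typing import NamedTuple


class CorrectedVersion(NamedTuple):
    """A Unicode version plus the number of corrigenda applied (hypothetical)."""
    major: int
    minor: int
    update: int
    corrigenda: int  # 0 means the uncorrected base version


# Accepts the letter form "5.0.0c6" and the extra-dot form "5.0.0.6".
_LABEL = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:c|\.)(\d+)")


def parse_label(label: str) -> CorrectedVersion:
    """Parse either suggested notation into its four numeric fields."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a corrected-version label: {label!r}")
    return CorrectedVersion(*(int(group) for group in match.groups()))


def format_label(version: CorrectedVersion) -> str:
    """Render a version in the letter form, e.g. "5.0.0c6"."""
    return (f"{version.major}.{version.minor}.{version.update}"
            f"c{version.corrigenda}")
```

    Both parse_label("5.0.0c6") and parse_label("5.0.0.6") give the same
    value, so the choice between the letter and the extra dot is purely
    cosmetic. Comparing these tuples only orders labels historically; it says
    nothing about which versions are compatible with each other.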

    From Asmus Freytag:
    > The Unicode Standard (and website) make very clear that a corrigendum
    > does not actually modify a version. It also doesn't supercede a version.
    > What it does, is to allow implementers to claim conformance to a version
    > with the corrigendum applied.

    It does when it modifies character properties, because you can't comply with
    both the base version and the version with the corrigendum, unless:

            (1) you accept NOT to use the characters whose properties have
    changed (and in that case this has a global effect on ALL past versions); or

            (2) you don't use Unicode 5 and stay with Unicode 4 only (something
    that is certainly not desirable in the long term).

    As the compliance level will be needed here not only for renderers (the way
    they handle the BiDi algorithm and reorder and mirror characters), but also
    for the applications that may generate text for the intended rendering, you
    can say what you want, but this change means that Unicode 5.0 without the
    corrigendum is not compliant with the previous versions for these characters,
    whose properties were changed in an incompatible way before being corrected.

    The Bidi algorithm is normative, it is a Standard Annex, and its intro
    explicitly says:
            "A Unicode Standard Annex (UAX) forms an integral part of the
    Unicode Standard(...)"

    This means that the base version is deprecated, even if it was published,
    and that implementing Unicode without the corrigendums when you know they
    exist should not be recommended. But anyway, you'll still need to implement
    it according to some version for which there's still no corrigendum; if you
    comply with it, your implementation may become incompatible with a future
    corrigendum. That's why I suggest being able to refer to Unicode versions
    with or without corrigendums explicitly.

    I mean here: Unicode 5.0.0c0 for the version without the corrigendums and
    Unicode 5.0.0c6 for the current version with 6 corrigendums applied.

    The versions in the 5.0 family have of course a very large common ground,
    but complying with all of the family members at the same time means dropping
    support for the characters whose properties have changed (as if they
    were undefined characters in all versions of Unicode before the corrigendum).
    If you want to comply with the whole set of Unicode versions, in an upward
    compatible way, the common subset of supported characters will be reduced
    even more.
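
    The "common subset" idea can be sketched as a simple intersection. This is
    only an illustration in Python: common_subset is a hypothetical helper, and
    the character names and property values below are placeholders, not real
    Unicode data.

```python
def common_subset(first, *others):
    """Return the characters whose property value is identical in every table.

    Each table maps a character to its property value for one version.
    A character missing from any table counts as unsupported.
    """
    return {
        char
        for char, value in first.items()
        if all(char in table and table[char] == value for table in others)
    }


# Placeholder tables: "char_b" stands for a character whose property was
# changed by a corrigendum; the values are made up for illustration.
uncorrected = {"char_a": "same", "char_b": "before"}
corrected = {"char_a": "same", "char_b": "after"}

safe = common_subset(uncorrected, corrected)
```

    Here safe contains only "char_a": an implementation claiming both members
    of the family has to treat "char_b" as unsupported. Adding more version
    tables to the call can only shrink the result.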

    Note that under this rule, it is the Unicode 5.0 change that created
    the reduced subset, not the corrigendum itself, because the corrigendum
    attempts to restore compatibility with past versions. However,
    corrigendums are not upward compatible with uncorrected versions; instead,
    what they do is effectively move a past version (or a version with prior
    corrigendums) into a separate branch, out of the versions trunk.

    It's a matter of logic; compliance level should not be made fuzzy.

    So I mean this (partial) compliance tree, which is distinct from the
    historic tree because the branches out of the trunk are in reversed
    order: each branch from the trunk makes an incompatible change, and
    elements at the lowest position in a branch are the most compatible with the
    trunk (the vertical link between them is in fact describing two separate
    branches, because they also contain mutually incompatible
    differences):

    (Latest)
      ||
      || 5.0.0c0 (the Unicode 5.0 book, uncorrected)
      || |
      || 5.0.0c1
      || |
      || 5.0.0c2
      || |
      || 5.0.0c3
      || |
      || 5.0.0c4
      || |
      || 5.0.0c5
      || |
      |+-----+
      ||
    5.0.0c6 (Unicode 5.0, with all corrigenda)
      ||
    4.1.0c0 (Unicode 4.1, has no corrigendum)
      ||
      || 4.0.1c0 (Unicode 4.0.1 uncorrected)
      || |
      || 4.0.1c1
      || |
      || 4.0.1c2
      || |
      || 4.0.1c3
      || |
      || 4.0.1c4
      || |
      |+-----+
      ||
    4.0.1c5 (Unicode 4.0.1, with all corrigenda)
      ||
    4.0.0c0 (The Unicode 4.0 book, has no corrigendum)
      ||
      || 3.2.0c0 (Unicode 3.2, uncorrected)
      || |
      || 3.2.0c1
      || |
      || 3.2.0c2
      || |
      || 3.2.0c3
      || |
      |+-----+
      ||
    3.2.0c4 (Unicode 3.2, with all corrigenda)
      ||
      || 3.1.1c0 (Unicode 3.1.1, uncorrected)
      || |
      || 3.1.1c1
      || |
      || 3.1.1c2
      || |
      |+-----+
      ||
    3.1.1c3 (Unicode 3.1.1, with all corrigenda)
      ||
    3.1.0c0 (Unicode 3.1.0, has no corrigendum)
      ||
    3.0.1c2 (Unicode 3.0.1, with all corrigenda)
      ||
      || 3.0.1c0 (Unicode 3.0.1, uncorrected)
      || |
      || 3.0.1c1
      || |
      |+-----+
      ||
    3.0.0c0 (The Unicode 3.0 book, has no corrigendum)
      ||
    2.1.9c0 (Unicode 2.1.9, has no corrigendum)
      ||
     (...) (intermediate versions have no corrigendum)
      ||
    2.0.0c0 (The Unicode 2.0 book, has no corrigendum)
      ||
    1.1.5c0
      ||
    1.1.0c0
      ||
    1.0.1c0
      ||
      || 1.0.0c0 (The Unicode 1.0 book, has no corrigendum)
      || |
      |+-----+
      ||
    (Root) (no version assigned, common part between 1.0.1c0 and 1.0.0)

    I have not completely verified this tree; there may be other
    incompatibilities between two successive versions in the trunk, in which
    case there would be other branches.


