RE: New Corrigendum to The Unicode Standard

From: Peter Constable (petercon@microsoft.com)
Date: Wed Aug 22 2007 - 09:24:16 CDT

Next message: Daniel Ehrenberg: "UAX 29"

Previous message: Philippe Verdy: "RE: Apostrophes at www.unicode.org"
In reply to: Philippe Verdy: "RE: New Corrigendum to The Unicode Standard"
Next in thread: Philippe Verdy: "RE: New Corrigendum to The Unicode Standard"

Unicode has three levels of versioning. Corrigenda do not cause a new version number; they simply declare a correction to the documentation of a given version.

Peter

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Philippe Verdy
Sent: Tuesday, August 21, 2007 8:26 PM
To: 'Asmus Freytag'
Cc: unicode@unicode.org
Subject: RE: New Corrigendum to The Unicode Standard

I was not making a formal proposal, just suggesting something that would make
it easier to refer to a version with its corrigendums applied. OK, the letter
"d" is used for drafts, but drafts don't need to be referred to in the long
term; that is not the case for compliant implementations.

Choose the letter c (as in corrigendum) if you prefer, or an extra dot.
So the current version would be 5.0.0c6 or 5.0.0.6.

From Asmus Freytag:
> The Unicode Standard (and website) make very clear that a corrigendum
> does not actually modify a version. It also doesn't supersede a version.
> What it does, is to allow implementers to claim conformance to a version
> with the corrigendum applied.

It does when it modifies character properties, because you can't comply with
both the base version and the version with the corrigendum, unless:

(1) you agree NOT to use the characters whose properties have
changed (and in that case this has a global effect on ALL past versions); or

(2) you don't use Unicode 5 and stay with Unicode 4 only (something
that is certainly not desirable for the long term).

As the compliance level will be needed here not only for renderers (the way
they handle the BiDi algorithm and reorder and mirror characters), but also
for the applications that may generate text for the intended rendering, you can
say what you want, but this change means that Unicode 5.0 without the
corrigendum is not compliant with the previous versions for these characters,
whose properties were changed in an incompatible way before being corrected.

The Bidi algorithm is normative, it is a Standard Annex, and its intro
explicitly says:
"A Unicode Standard Annex (UAX) forms an integral part of the
Unicode Standard(...)"

This means that the base version is deprecated, even if it was published,
and that implementing Unicode without the corrigendums when you know they
exist should not be recommended. But anyway, you'll still need to implement
it according to some version for which there's still no corrigendum; if you
comply with it, your implementation may become incompatible with a future
corrigendum. That's why I suggest being able to refer to Unicode versions
with or without corrigendums explicitly.

I mean here: Unicode 5.0.0c0 for the version without the corrigendums and
Unicode 5.0.0c6 for the current version with 6 corrigendums applied.
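
As an illustration of the labelling idea above, here is a minimal Python
sketch that parses and formats such labels. It accepts both spellings
suggested earlier in this message ("5.0.0c6" and "5.0.0.6"). The names
CorrectedVersion, parse_label and format_label are hypothetical; they are not
part of any Unicode specification or existing library.

```python
import re
from typing import NamedTuple


class CorrectedVersion(NamedTuple):
    """A version number plus the count of corrigenda applied (hypothetical type)."""
    major: int
    minor: int
    update: int
    corrigenda: int  # 0 means the uncorrected base version


# Three dotted fields, then either "c" or an extra dot, then the corrigendum count.
_LABEL = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:c|\.)(\d+)")


def parse_label(label: str) -> CorrectedVersion:
    """Parse a label such as '5.0.0c6' or '5.0.0.6'."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a corrected-version label: {label!r}")
    return CorrectedVersion(*(int(group) for group in match.groups()))


def format_label(version: CorrectedVersion) -> str:
    """Format using the 'c' spelling, e.g. '5.0.0c6'."""
    return f"{version.major}.{version.minor}.{version.update}c{version.corrigenda}"
```

Because the type is a tuple, comparing two parsed labels orders them
historically (base version first, then each corrigendum level). That ordering
says nothing about which versions are mutually compatible.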

The versions in the 5.0 family have of course a very large common ground,
but complying with all of the family members at the same time means dropping
the support for the characters whose properties have changed (as if they
were undefined characters in all versions of Unicode before the corrigendum).
If you want to comply with the whole set of Unicode versions, in an upward
compatible way, the common subset of supported characters will be reduced
even more.
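
The "common subset" described above can be sketched as follows: given one
property table per version, keep only the characters that every table defines
with identical properties. This is a toy sketch. The function name
common_subset is hypothetical, and the characters "X", "Y", "Z" and their
property values are arbitrary placeholders, not real Unicode data.

```python
from typing import Dict, Iterable, Set


def common_subset(tables: Iterable[Dict[str, str]]) -> Set[str]:
    """Return the characters defined with the same property in every table."""
    tables = list(tables)
    if not tables:
        return set()
    first, rest = tables[0], tables[1:]
    # A character survives only if every other table has it with an equal value.
    return {
        char
        for char, prop in first.items()
        if all(table.get(char) == prop for table in rest)
    }


# Placeholder tables: "Y" changes between base and corrected,
# and "Z" does not exist yet in the older version.
base = {"X": "L", "Y": "ON", "Z": "AL"}
corrected = {"X": "L", "Y": "R", "Z": "AL"}
older = {"X": "L", "Y": "R"}
```

With these placeholder tables, common_subset([base, corrected]) drops "Y", and
adding older to the list drops "Z" as well, which is the shrinking effect
described in the paragraph above.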

Note that under this rule, it is the Unicode 5.0 change that created the
reduced subset, not the corrigendum itself, because the corrigendum attempts
to restore the compatibility with past versions; however, corrigendums are
not upward compatible with uncorrected versions. Instead, what they do is
effectively move a past version (or a version with prior corrigendums) into a
separate branch, out of the versions trunk.

It's a matter of logic; compliance level should not be made fuzzy.

So I mean this (partial) compliance tree, which is distinct from the
historic tree, because the branches out of the trunk are in reversed
order: each branch from the trunk makes an incompatible change, and the
elements at the lowest position in a branch are the most compatible with the
trunk (the vertical link between them in fact describes two separate
branches, because they also contain mutually incompatible
differences):

(Latest)
  ||
  || 5.0.0c0 (the Unicode 5.0 book, uncorrected)
  || |
  || 5.0.0c1
  || |
  || 5.0.0c2
  || |
  || 5.0.0c3
  || |
  || 5.0.0c4
  || |
  || 5.0.0c5
  || |
  |+-----+
  ||
5.0.0c6 (Unicode 5.0, with all corrigenda)
  ||
4.1.0c0 (Unicode 4.1, has no corrigendum)
  ||
  || 4.0.1c0 (Unicode 4.0.1 uncorrected)
  || |
  || 4.0.1c1
  || |
  || 4.0.1c2
  || |
  || 4.0.1c3
  || |
  || 4.0.1c4
  || |
  |+-----+
  ||
4.0.1c5 (Unicode 4.0.1, with all corrigenda)
  ||
4.0.0c0 (The Unicode 4.0 book, has no corrigendum)
  ||
  || 3.2.0c0 (Unicode 3.2, uncorrected)
  || |
  || 3.2.0c1
  || |
  || 3.2.0c2
  || |
  || 3.2.0c3
  || |
  |+-----+
  ||
3.2.0c4 (Unicode 3.2, with all corrigenda)
  ||
  || 3.1.1c0 (Unicode 3.1.1, uncorrected)
  || |
  || 3.1.1c1
  || |
  || 3.1.1c2
  || |
  |+-----+
  ||
3.1.1c3 (Unicode 3.1.1, with all corrigenda)
  ||
3.1.0c0 (Unicode 3.1.0, has no corrigendum)
  ||
3.0.1c2 (Unicode 3.0.1, with all corrigenda)
  ||
  || 3.0.1c0 (Unicode 3.0.1, uncorrected)
  || |
  || 3.0.1c1
  || |
  |+-----+
  ||
3.0.0c0 (The Unicode 3.0 book, has no corrigendum)
  ||
2.1.9c0 (Unicode 2.1.9, has no corrigendum)
  ||
(...) (intermediate versions have no corrigendum)
  ||
2.0.0c0 (The Unicode 2.0 book, has no corrigendum)
  ||
1.1.5c0
  ||
1.1.0c0
  ||
1.0.1c0
  ||
  || 1.0.0c0 (The Unicode 1.0 book, has no corrigendum)
  || |
  |+-----+
  ||
(Root) (no version assigned, common part between 1.0.1c0 and 1.0.0)

I have not completely verified this tree; there may be other
incompatibilities between two successive versions in the trunk, in which
case there would be other branches.
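
The pattern used in the tree above (the fully corrected version stays on the
trunk, and every earlier corrigendum level goes into a side branch) can be
sketched in Python. The function name split_family is hypothetical, and the
corrigendum counts passed to it are simply the ones shown in the tree, which
have not been checked independently.

```python
from typing import List, Tuple


def split_family(version: str, corrigenda: int) -> Tuple[str, List[str]]:
    """Return (trunk_label, branch_labels) for one version family.

    The trunk label carries the full corrigendum count. The branch lists the
    superseded levels from c0 upward, matching the top-to-bottom order of the
    side branches in the tree.
    """
    trunk = f"{version}c{corrigenda}"
    branch = [f"{version}c{level}" for level in range(corrigenda)]
    return trunk, branch
```

For example, split_family("5.0.0", 6) gives the trunk label "5.0.0c6" and a
branch running from "5.0.0c0" to "5.0.0c5", while a version with no
corrigendum, such as split_family("4.1.0", 0), has an empty branch.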


This archive was generated by hypermail 2.1.5 : Wed Aug 22 2007 - 09:27:09 CDT