Peter Constable wrote:
> It seems that you're really trying to fight an uphill battle.
I could not agree more.
> You mentioned the "instant compatibility of non-UTF aware software" as
> a benefit of Doug's encoding (let's call it UTF-Doug). I don't
> understand: How is any program going to correctly handle the string
You are arguing that instant compatibility isn't possible, and you seem
to assume that I am somehow brain-damaged. Of course no program will
be able to render Polish, Thai or Devanagari properly if it wasn't
taught to do so. My main point was about ISO Latin-1 backwards
compatibility, which could be made to work if people wanted it to. But
you and others didn't want to waste a second thinking in my
direction; all I hear are quick denials, somewhat "canned" surface-level
arguments, and the underlying assumption that I would not see
the obvious, i.e. the limitations inherent in any backwards compatibility.
My claim was that Unicode is designed to be ISO Latin-1 backwards
compatible and therefore there should be a Latin-1 backwards
compatible UTF. This argument has never been defeated by anyone who
All I have seen (after the P.C. thing was passed) is people mentioning
that ISO Latin-1 would not even work for French or that it lacks the
copyright symbol (BTW, it doesn't, does it?). But that's of course not
the point, because everyone knows that backwards compatibility cannot
magically display a symbol X where no one expected it.
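For reference, the Latin-1 part of the argument can be checked mechanically. The Python sketch below (Python chosen only for illustration, standard library only) confirms that the first 256 Unicode code points coincide with ISO 8859-1 byte for byte, that the copyright sign is among them at 0xA9, and that UTF-8, by contrast, spends two bytes on every character from U+0080 to U+00FF.

```python
import unicodedata

# Every Latin-1 byte value decodes to the Unicode code point with the same number,
# and that code point encodes back to the same single byte.
for value in range(256):
    assert bytes([value]).decode("latin-1") == chr(value)
    assert chr(value).encode("latin-1") == bytes([value])

# The copyright sign is part of Latin-1: byte 0xA9 is U+00A9.
print(unicodedata.name(bytes([0xA9]).decode("latin-1")))  # COPYRIGHT SIGN

# UTF-8 keeps ASCII as single bytes but needs two bytes for U+0080..U+00FF,
# which is why a Latin-1-only program cannot read UTF-8 text directly.
ascii_ok = all(len(chr(v).encode("utf-8")) == 1 for v in range(0x80))
upper_half_two_bytes = all(len(chr(v).encode("utf-8")) == 2 for v in range(0x80, 0x100))
print(ascii_ok, upper_half_two_bytes)  # True True
```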
Instant compatibility could be made possible for ISO Latin-1 the same
way it is possible for ASCII. But the uphill problem is that people just
don't want it. Instant compatibility could also work for all languages that
1) were supported based on a 128-character ASCII code block plus a
128-character local code page, or a 256-character local code page (w/o
2) whose local code pages were adopted into Unicode so that the
Unicode position would be OFFSET + OLD_POSITION.
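Condition 2) can be tested against a real case. Unicode's Thai block follows the layout of the Thai standard TIS-620, so each defined TIS-620 byte should map to Unicode by adding one constant. The sketch below is a minimal check in Python; `THAI_OFFSET` and `old_to_unicode` are hypothetical names used only for illustration, and the comparison relies on Python's built-in `tis_620` codec.

```python
# Illustrative check of the rule "Unicode position = OFFSET + OLD_POSITION".
# THAI_OFFSET and old_to_unicode are hypothetical names used only for this sketch.
THAI_OFFSET = 0x0E01 - 0xA1  # TIS-620 byte 0xA1 (KO KAI) is U+0E01, so OFFSET = 0x0D60

def old_to_unicode(old_position: int, offset: int) -> str:
    """Map a byte from a legacy code page to a character by adding a fixed offset."""
    return chr(offset + old_position)

# Bytes TIS-620 defines in its upper half (0xA0, 0xDB-0xDE and 0xFC-0xFF are unassigned).
defined = list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC))
mismatches = [b for b in defined
              if bytes([b]).decode("tis_620") != old_to_unicode(b, THAI_OFFSET)]
print(hex(THAI_OFFSET), len(defined), mismatches)

# Latin-1 is the degenerate case with OFFSET = 0.
assert all(bytes([b]).decode("latin-1") == old_to_unicode(b, 0) for b in range(256))
```

An empty `mismatches` list means the single-offset rule holds for all 87 Thai characters of TIS-620.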
The way this would work is to use virtual code blocks: old
software wouldn't change anything, and new software would have two
offset registers, R0 and R1, one for 00-7F and one for 80-FF. These
registers would be preset (configured) to the normal local character
code, so that when text is normally in ASCII+CODE-PAGE-X (Unicode
positions U+xy00 -- U+xy7F, U+xy80 -- U+xyFF), I would set R0 = 0 and
R1 = xy00 and read my old text as usual. One would not even need an
escape sequence if only old text is to be read in new software.
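A minimal sketch of the two offset registers in Python, assuming that each register simply holds a number added to the incoming byte (R0 for bytes 00-7F, R1 for bytes 80-FF). The class name `OffsetRegisters` is hypothetical, and the Thai offset 0x0D60 (the constant separating TIS-620 bytes from the Unicode Thai block) is used only as a concrete example of a non-zero preset.

```python
from dataclasses import dataclass

@dataclass
class OffsetRegisters:
    """Hypothetical decoder state: one offset per half of the byte range."""
    r0: int = 0  # added to bytes 0x00..0x7F
    r1: int = 0  # added to bytes 0x80..0xFF

    def decode(self, data: bytes) -> str:
        # Each byte selects its register by range; the code point is OFFSET + byte.
        return "".join(chr((self.r0 if b < 0x80 else self.r1) + b) for b in data)

# A Latin-1 locale: both registers preset to 0, so old Latin-1 files read unchanged.
latin1_locale = OffsetRegisters(r0=0, r1=0)
print(latin1_locale.decode(b"Caf\xe9"))  # Café

# A Thai locale: R0 stays 0 for ASCII, R1 is preset so old TIS-620 bytes land in the Thai block.
thai_locale = OffsetRegisters(r0=0, r1=0x0D60)
print(thai_locale.decode(b"A\xa1\xa2"))  # "A" followed by U+0E01 and U+0E02
```

No escape sequence is involved here: this is the old-text-in-new-software case, where only the configured presets matter.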
If new text is to be read in new software, one would define two simple
escape sequences that would allow shifting register R0 or R1 to
another code page, either for one character, for a short character
sequence, or until the register is changed again.
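The message does not specify the escape sequences, so the byte layout below is purely an assumption made for illustration: ESC (0x1B), then `L` for a locking shift or `N` for a temporary shift, then the register digit `0` or `1`, then a two-byte offset, and for `N` a count of characters. A one-character shift is simply `N` with a count of 1. This resembles the single and locking shifts of ISO/IEC 2022, but it is not that standard's syntax; `decode_shifted` is a hypothetical name.

```python
ESC = 0x1B                       # hypothetical: reuse the ASCII escape byte
LOCK, TEMP = ord("L"), ord("N")  # hypothetical shift kinds

def decode_shifted(data: bytes, r0: int = 0, r1: int = 0) -> str:
    """Decode with two offset registers and two hypothetical escape sequences.

    ESC 'L' <reg> <hi> <lo>      set register <reg> to offset hi*256+lo until changed again
    ESC 'N' <reg> <hi> <lo> <n>  use that offset only for the next n characters
                                 that go through register <reg>
    """
    regs = [r0, r1]  # locked register values, starting from the configured presets
    temp_reg, temp_offset, remaining = None, 0, 0
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            kind, reg = data[i + 1], data[i + 2] - ord("0")
            if reg not in (0, 1):
                raise ValueError(f"bad register digit at byte {i}")
            offset = (data[i + 3] << 8) | data[i + 4]
            if kind == LOCK:
                regs[reg] = offset
                i += 5
            elif kind == TEMP:
                temp_reg, temp_offset, remaining = reg, offset, data[i + 5]
                i += 6
            else:
                raise ValueError(f"unknown shift kind at byte {i}")
            continue
        half = 0 if b < 0x80 else 1
        if remaining > 0 and half == temp_reg:
            offset = temp_offset  # temporary shift still active for this register
            remaining -= 1
        else:
            offset = regs[half]
        out.append(chr(offset + b))
        i += 1
    return "".join(out)

# Latin-1 text with one Thai character shifted in through R1 for a single character.
one_thai = b"Caf\xe9 " + bytes([ESC, TEMP, ord("1"), 0x0D, 0x60, 1]) + b"\xa1 \xa9"
print(decode_shifted(one_thai))

# A locking shift of R1 to the Thai offset stays in effect for the rest of the text.
locked = bytes([ESC, LOCK, ord("1"), 0x0D, 0x60]) + b"\xa1\xa2"
print(decode_shifted(locked))
```

In this sketch the temporary count only decreases for characters that actually pass through the shifted register, so ASCII punctuation between shifted characters does not use it up. A real design would also have to decide how to handle truncated escape sequences and literal 0x1B bytes in the text.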
If such new text is read in old software, the old software would of
course not be able to render the new text, but at least 95% of the
characters will probably use the local default code pages anyway, so
usually less than 5% of the text remains scrambled in old text
viewers. That's not a big deal. In any case it would be far better than
anything you would get with UTF-8.
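The comparison with UTF-8 can be made concrete for the Latin-1 case. A program that knows only Latin-1 shows one character per byte; the sketch below models such a viewer with Python's `latin-1` codec (the function name `legacy_view` is hypothetical) and feeds it the same text encoded two ways.

```python
def legacy_view(data: bytes) -> str:
    """Model of an old Latin-1-only viewer: every byte is shown as one Latin-1 character."""
    return data.decode("latin-1")

text = "Café ©"

# Latin-1-compatible bytes: the old viewer shows the text correctly.
print(legacy_view(text.encode("latin-1")))  # Café ©

# UTF-8 bytes: each character from U+0080..U+00FF turns into two wrong characters.
print(legacy_view(text.encode("utf-8")))    # CafÃ© Â©
```

Under the register scheme as described, text in a Latin-1 locale that uses only Latin-1 characters needs no escapes and is byte-identical to the first case; only characters shifted in from other code pages, plus their escape bytes, would show up scrambled.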
Now, maybe the Unicode code pages other than ASCII and Latin-1
are not laid out to be backwards compatible with important prior
standards. In that case the whole virtual code block and shifting
business may not be applicable to anything other than
Latin-1. But that would not be my fault but Unicode's. Although
I don't know for sure, I still believe that the Unicode code blocks for
other alphabetic languages are backwards compatible with prior standards.
But I am not going to continue this uphill battle; I am just a bit sad
that ideas are shot down so quickly, with traditionalist arguments
(We'll always stay this way! Narri! Narro! It's always been this way! Narri!
Gunther Schadow ----------------------------------- http://aurora.rg.iupui.edu
Regenstrief Institute for Health Care
1001 W 10th Street RG5, Indianapolis IN 46202, Phone: (317) 630 7960
email@example.com ---------------------- #include <usual/disclaimer>
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT