Re: Code pages and Unicode

From: Asmus Freytag <>
Date: Wed, 24 Aug 2011 22:59:39 -0700

On 8/24/2011 7:45 PM, Richard Wordingham wrote:
> Which earlier coding system supported Welsh? (I'm thinking of 'W WITH
> CIRCUMFLEX', U+0174 and U+0175.) How was the use of the canonical
> decompositions incompatible with the character encodings of legacy
> systems? Latin-1 has the same codes as ISO-8859-1, but that's as far
> as having the same codes goes. Was the use of combining jamo
> incompatible with legacy Hangul encodings?

See how time flies.

Early adopters were interested in 1:1 transcoding, using a single 256-entry
table for an 8-bit character set, with a guaranteed, predictable output
length. Early designs of Unicode (and 10646) attempted to address these
concerns, because ignoring them promised severe impediments to migration.
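
To make the "1:1 transcoding" requirement concrete, here is a minimal sketch in Python (function names are hypothetical, chosen for illustration). It builds the single 256-entry table for ISO-8859-1 from Python's built-in latin-1 codec. Every byte maps to exactly one BMP character, so the output length in UTF-16 code units always equals the input length in bytes. That guarantee is the predictability early adopters wanted.

```python
# Sketch: table-driven 1:1 transcoding of an 8-bit character set to Unicode.
# Assumption: ISO-8859-1 ("latin-1") as the example source set; the same
# approach works for any 8-bit set whose every byte maps to one BMP character.

def build_table(codec: str = "latin-1") -> list[str]:
    """Return a 256-entry list mapping each byte value to one character."""
    table = []
    for b in range(256):
        ch = bytes([b]).decode(codec)  # raises if the byte is undefined in codec
        if len(ch) != 1:
            raise ValueError(f"byte {b:#04x} does not map to a single character")
        table.append(ch)
    return table

LATIN1_TABLE = build_table("latin-1")

def to_unicode(data: bytes, table: list[str] = LATIN1_TABLE) -> str:
    """Transcode byte by byte: one table lookup per input byte."""
    return "".join(table[b] for b in data)

def utf16_units(s: str) -> int:
    """Count UTF-16 code units (each is 2 bytes in UTF-16LE, no BOM)."""
    return len(s.encode("utf-16-le")) // 2

text = to_unicode(b"Caf\xe9")
print(text, utf16_units(text))  # 4 bytes in, 4 UTF-16 code units out
```

Canonical decompositions would break exactly this property: a precomposed letter that decomposes into base plus combining mark turns one input byte into two code units.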

Some characters were included as part of the merger of Unicode and 10646
without going through the same rigorous process that is in force for
characters today. At that time, scuttling the deal over a few characters
here or there would not have been a reasonable action. So you will always
find some "exceptions" to many of the principles - which doesn't make the
principles any less valid.

> Obviously <D800 D800 000E DC00> is non-conformant with current UTF-16.
> Remembering that there is a guarantee that there will be no more
> surrogate points, an extension form has to be non-conformant with
> current UTF-16!

And that's the reason why there's no interest in this part of the
discussion. Nobody will need an extension next Tuesday, or in a decade
or even in several decades - or ever. Haven't seen an upgrade to Morse
code recently to handle Unicode, for example. Technology has a way of
moving on.
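
For readers less familiar with the surrogate mechanism, here is a minimal sketch in Python (function names hypothetical) of a UTF-16 well-formedness check on a list of 16-bit code units. It shows why the quoted sequence <D800 D800 000E DC00> is rejected: the first high surrogate is followed by another high surrogate instead of a low one. The pair-decoding helper shows that the 1024 x 1024 surrogate pairs top out at U+10FFFF. Because no new surrogate code points will ever be assigned, that ceiling is fixed.

```python
# Sketch: UTF-16 well-formedness check over 16-bit code units.
HIGH = range(0xD800, 0xDC00)  # high (lead) surrogates
LOW = range(0xDC00, 0xE000)   # low (trail) surrogates

def is_well_formed_utf16(units: list[int]) -> bool:
    """True if every high surrogate is immediately followed by a low
    surrogate and no low surrogate appears on its own."""
    i = 0
    while i < len(units):
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            return False  # not a 16-bit code unit
        if u in HIGH:
            if i + 1 < len(units) and units[i + 1] in LOW:
                i += 2  # valid surrogate pair
                continue
            return False  # lead surrogate without a trail surrogate
        if u in LOW:
            return False  # isolated trail surrogate
        i += 1
    return True

def decode_pair(hi: int, lo: int) -> int:
    """Combine a valid surrogate pair into a supplementary code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

print(is_well_formed_utf16([0xD800, 0xD800, 0x000E, 0xDC00]))  # False
print(hex(decode_pair(0xDBFF, 0xDFFF)))  # highest reachable: 0x10ffff
```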

So, the best thing is to drop this silly discussion and let those future
people who might face a real *requirement* use their good judgment to
come up with a technical solution appropriate to their time, instead of
wasting collective cycles discussing how to make 1990s technology work
for an unknown future requirement. It's just bad engineering.

> Everyone should know how to extend UTF-8 and UTF-32 to cover the 31-bit
> range.

I disagree (as would anyone with a bit of long-term perspective). Nobody
needs to look into this for decades, so let it rest.

Received on Thu Aug 25 2011 - 01:03:34 CDT