Re: Acquiring DIS 10646

From: Asmus Freytag (t) <>
Date: Sun, 4 Oct 2015 12:21:11 -0700
On 10/4/2015 5:30 AM, Sean Leonard wrote:
On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote:
On 10/3/2015 8:15 AM, Sean Leonard wrote:

Well, "DIS 10646" is the Draft International Standard, particularly Draft 1, from ~1990 or ~1991. (Sometimes it might have been called 10646.1.) Therefore it would likely only be in print form (or printed and scanned form). It's pretty old. What I understand is that Draft 1 got shot down because it was at variance with the nascent Unicode effort; Draft 2 was eventually adopted as ISO 10646:1993, and is equivalent to Unicode 1.1. (10646-1:1993 plus Amendments 5 to 7 = Unicode 2.0.)


You never explained your specific interest in this matter. Personal curiosity? An attempt to write the definitive history of character encoding?

A long time ago, in a galaxy far, far away....

(Okay, it really was not that long ago, and it was pretty close at hand, since it was on this list)

The following doesn't really answer my question; the first draft of 10646 seems pretty irrelevant in that context.
However, I do have a small comment on your current project, so I'll append it here:

I proposed adding C1 Control Pictures to Unicode. <> I am resurrecting that effort, but more slowly this time, with more research and input from implementers. The requirement is that all glyphs for U+0000 - U+00FF be graphically distinct.

Debuggers used to do this by referencing the glyphs in the hardware code page, such as Code Page 437, but we have come a long way since 1981, and displaying ♣ for 0x05 no longer makes much sense. Merely substituting one of the other legacy code pages for 0x80 - 0x9F does not make sense either. The Code Page 437 characters in that range (Ç, ü, é, and so on) overlap with characters already encoded at U+00A0 - U+00FF, for example. (Windows-1252 is somewhat more defensible, but Windows-1252 leaves five code points in that range unassigned - 0x81, 0x8D, 0x8F, 0x90, and 0x9D - so it would be incomplete.)
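The Windows-1252 gap is easy to verify. A minimal sketch, assuming Python's built-in cp1252 codec (which leaves the unassigned positions undefined, so strict decoding raises an error):

```python
# Find the bytes in 0x80-0x9F that Windows-1252 leaves unassigned.
# Assumption: Python's "cp1252" codec marks those positions undefined,
# so decoding them with strict error handling raises UnicodeDecodeError.
unassigned = []
for b in range(0x80, 0xA0):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(b)

print([hex(b) for b in unassigned])
# -> ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Those five holes are why a debugger that wants a distinct shape for every byte value cannot simply borrow the Windows-1252 repertoire.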

Totally agree that mapping these to random glyphs from 8-bit sets that happen to have those positions mapped to printable shapes is not useful.

But this problem is already solved. Implementers already have solutions, and they do not depend on encoding anything or making any other changes. They simply show shapes that somehow contain the abbreviation for the control code, as in this example showing the line endings from a random text file:

You can see that the shapes do not actually resemble the existing Control Pictures' glyph design, although the principle is clearly related. Also notice that the implementation chooses different techniques for showing whitespace.
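The substitution technique described above can be sketched in a few lines. This is an illustration, not any particular editor's implementation; the `show_controls` function and the bracket style are my own invention, while the abbreviation table is the standard C0 set:

```python
# Render C0 control bytes as bracketed abbreviations instead of picking
# glyphs from a legacy code page. The abbreviations are the standard
# C0 names (NUL through US, plus DEL at 0x7F).
C0_ABBR = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
]

def show_controls(text: str) -> str:
    """Replace each control character with a visible abbreviation."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            out.append(f"⟨{C0_ABBR[cp]}⟩")
        elif cp == 0x7F:
            out.append("⟨DEL⟩")
        else:
            out.append(ch)
    return "".join(out)

print(show_controls("line one\r\nline two\n"))
# -> line one⟨CR⟩⟨LF⟩line two⟨LF⟩
```

Note that nothing here depends on new character assignments: the abbreviations are drawn at render time, which is exactly why implementers have not needed encoded characters for this.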

Since, over the standard's 25-year history, none of these implementers has ever approached the consortium with a request for a standard set of character codes, my conclusion is that this is a solution in search of a problem.

The case for most of the original control pictures was very marginal and grounded, if I recall, in specific legacy implementations of dumb terminals.

Unlike the old host-terminal interfaces, modern debuggers don't send character streams to a dumb device. There is a whole rendering architecture that offers plenty of choices for substituting different shapes for certain code points. All of that takes place at a level where the actual 'codes' used for the substitution are not shared or visible, which reduces the benefit of standardization. As you can see from my example, the other benefit of standardization, a consensus around a specific set of shapes (as you would have, for example, for standard math symbols), is also absent, because implementers like to use different techniques: in my example, colored dots, arrows, and the like for whitespace, and fat heavy black rounded rectangles with abbreviations for other control codes. And who knows, the formatting could change -- many debuggers now let you view text data in different modes, for example.

Received on Sun Oct 04 2015 - 14:22:13 CDT

This archive was generated by hypermail 2.2.0 : Sun Oct 04 2015 - 14:22:13 CDT