Re: Acquiring DIS 10646

From: Asmus Freytag (t) <>
Date: Sun, 4 Oct 2015 12:21:11 -0700
On 10/4/2015 5:30 AM, Sean Leonard wrote:
On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote:
On 10/3/2015 8:15 AM, Sean Leonard wrote:

Well, "DIS 10646" is the Draft International Standard, particularly Draft 1, from ~1990 or ~1991. (Sometimes it might have been called 10646.1.) Therefore it would likely only be in print form (or printed and scanned form). It's pretty old. What I understand is that Draft 1 got shot down because it was at variance with the nascent Unicode effort; Draft 2 was eventually adopted as ISO 10646:1993, and is equivalent to Unicode 1.1. (10646-1:1993 plus Amendments 5 to 7 = Unicode 2.0.)


You never explained your specific interest in this matter. Personal curiosity? An attempt to write the definitive history of character encoding?

A long time ago, in a galaxy far, far away....

(Okay, it really was not that long ago, and it was pretty close at hand, since it was on this list)

The following doesn't really answer my question; the first draft of 10646 seems pretty irrelevant in that context.
However, I do have a small comment on your current project, so I'll append it here:

I proposed adding C1 Control Pictures to Unicode. <> I am resurrecting that effort, but more slowly this time, with more research and input from implementers. The requirement is that all glyphs for U+0000 - U+00FF be graphically distinct.

Debuggers used to do this by referencing the glyphs in the hardware code page, such as Code Page 437, but we have come a long way since 1981, and displaying ♣ for 0x05 no longer makes much sense. Merely substituting one of the other legacy code pages for 0x80 - 0x9F does not make sense either. The Code Page 437 characters in that range (Ç, ü, é, and so on) overlap with characters already encoded at U+00A0 - U+00FF, for example. (Windows-1252 is somewhat more defensible, but Windows-1252 leaves five code points in that range unassigned - 0x81, 0x8D, 0x8F, 0x90, and 0x9D - so it would be incomplete.)
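The Windows-1252 gap is easy to verify. A minimal sketch, assuming Python's built-in cp1252 codec (which leaves the unassigned positions undefined, so strict decoding raises an error):

```python
# Find the bytes in 0x80-0x9F that Windows-1252 leaves unassigned.
# Assumption: Python's "cp1252" codec marks those positions undefined,
# so decoding them with strict error handling raises UnicodeDecodeError.
unassigned = []
for b in range(0x80, 0xA0):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(b)

print([hex(b) for b in unassigned])
# -> ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Those five holes are why a debugger that wants a distinct shape for every byte value cannot simply borrow the Windows-1252 repertoire.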

Totally agree that mapping these to random glyphs from 8-bit sets that happen to have those positions mapped to printable shapes is not useful.

But this problem is already solved. Implementers already have solutions, and they do not depend on encoding anything or making any other changes. They simply show shapes that somehow contain the abbreviation for the control code, as in this example showing the line endings from a random text file:

You can see that the shapes do not actually resemble the existing Control Pictures' glyph design, although the principle is clearly related. Also notice that the implementation chooses different techniques for showing whitespace.
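The substitution technique described above can be sketched in a few lines. This is an illustration, not any particular editor's implementation; the `show_controls` function and the bracket style are my own invention, while the abbreviation table is the standard C0 set:

```python
# Render C0 control bytes as bracketed abbreviations instead of picking
# glyphs from a legacy code page. The abbreviations are the standard
# C0 names (NUL through US, plus DEL at 0x7F).
C0_ABBR = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
]

def show_controls(text: str) -> str:
    """Replace each control character with a visible abbreviation."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            out.append(f"⟨{C0_ABBR[cp]}⟩")
        elif cp == 0x7F:
            out.append("⟨DEL⟩")
        else:
            out.append(ch)
    return "".join(out)

print(show_controls("line one\r\nline two\n"))
# -> line one⟨CR⟩⟨LF⟩line two⟨LF⟩
```

Note that nothing here depends on new character assignments: the abbreviations are drawn at render time, which is exactly why implementers have not needed encoded characters for this.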

Since, over the standard's 25-year history, none of these implementers has ever approached the consortium with a request for a standard set of character codes, my conclusion is that this is a solution in search of a problem.

The case for most of the original control pictures was very marginal and grounded, if I recall, in specific legacy implementations of dumb terminals.

Unlike the old host-terminal interfaces, modern debuggers don't send character streams to a dumb device. There is a whole rendering architecture that offers plenty of choices for substituting different shapes for certain code points. All of that takes place at a level where the actual 'codes' used for the substitution are not shared or visible, which reduces the benefit of standardization. As you can see from my example, the other benefit of standardization, a consensus around a specific set of shapes (as you would have, for example, for standard math symbols), is also absent, because implementers like to use different techniques: in my example, colored dots, arrows, and the like for whitespace, and fat heavy black rounded rectangles with abbreviations for other control codes. And who knows, the formatting could change -- many debuggers now let you view text data in different modes, for example.

Received on Sun Oct 04 2015 - 14:22:13 CDT

This archive was generated by hypermail 2.2.0 : Sun Oct 04 2015 - 14:22:13 CDT