Re: UTF-9

From: John Cowan (cowan@mercury.ccil.org)
Date: Thu Oct 30 2003 - 21:36:33 CST


Philippe Verdy scripsit:

> Are there still now platforms where storage bytes are not octets but nonets?

As others have written, the PDP-10/PDP-20 hardware is long obsolete.
However, there are still emulators running on modern 32-bit and 64-bit
hardware. That was the point of the remark about "the number of 36-bit
hosts on the Internet has grown by an order of magnitude", viz. from
about one to about ten.

> This means that the interchange would require to send 2 octets to represent
> each 9-bit byte without losing data, or to use a complex bit pattern to
> pack sequences of eight 9-bit bytes into sequences of nine 8-bit bytes, and
> with a way to interpret the last sequence [...].

A number of such conventions were used on actual PDP-10s/20s, including:
5 octets interpreted as 36 bits with 4 padding bits, 4 octets interpreted
as 32 bits with 4 bits lost, 5 octets interpreted as 5 7-bit characters
with the extra bit either lost or stuffed into the last octet, 6 octets
interpreted as 6 6-bit characters, and 9 octets interpreted as two
consecutive 36-bit words.
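
To make two of these conventions concrete, here is a rough Python sketch
of the 9-octet form (two 36-bit words) and the 5-octet form (one word plus
4 padding bits). The function names are mine, and the big-endian bit order
and the placement of the padding in the low-order bits are assumptions for
illustration, not a record of what any particular PDP-10 software did.

```python
MASK36 = (1 << 36) - 1  # largest value that fits in a 36-bit word


def pack_pair(w0, w1):
    """Pack two 36-bit words into 9 octets (72 bits), first word first.

    Big-endian bit order is an assumption made for this sketch.
    """
    if not (0 <= w0 <= MASK36 and 0 <= w1 <= MASK36):
        raise ValueError("each word must fit in 36 bits")
    return ((w0 << 36) | w1).to_bytes(9, "big")


def unpack_pair(octets):
    """Inverse of pack_pair: 9 octets back to two 36-bit words."""
    if len(octets) != 9:
        raise ValueError("expected exactly 9 octets")
    n = int.from_bytes(octets, "big")
    return n >> 36, n & MASK36


def pack_padded(word):
    """Pack one 36-bit word into 5 octets, leaving 4 zero padding bits.

    Putting the padding in the low-order bits is an assumption.
    """
    if not 0 <= word <= MASK36:
        raise ValueError("word must fit in 36 bits")
    return (word << 4).to_bytes(5, "big")


def unpack_padded(octets):
    """Inverse of pack_padded: discard the 4 padding bits."""
    if len(octets) != 5:
        raise ValueError("expected exactly 5 octets")
    return int.from_bytes(octets, "big") >> 4
```

Both forms round-trip without loss; the 9-octet form wastes nothing, while
the 5-octet form spends 4 bits per word on padding.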

> What will happen then to a plain-text coded with UTF-9, and that is sent
> through FTP?

FTP has two modes, text and image. Today this is used to decide whether
to translate line-ends to and from the on-the-wire standard of CR+LF or not,
but on the PDP-10/20 it was used to discriminate between sending 35 bits
in 5 octets (the standard for ASCII text) and sending 72 bits in 9 octets.
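
As a rough illustration of the 35-bits-in-5-octets case, the Python sketch
below splits a 36-bit word into five 7-bit characters, one per octet, and
drops the leftover bit. The function names are mine, and the layout (five
characters left-justified in the word, with the spare bit at the low-order
end) is the usual PDP-10 text packing as I understand it, so treat it as
an assumption rather than a specification.

```python
def word_to_ascii(word):
    """Split a 36-bit word into five 7-bit characters, leftmost first.

    The spare low-order bit is simply lost, as in the lossy variant of
    the 5-octet text convention.
    """
    if not 0 <= word < (1 << 36):
        raise ValueError("word must fit in 36 bits")
    # Character i occupies the 7 bits starting 7*i bits below the top.
    return bytes((word >> (29 - 7 * i)) & 0x7F for i in range(5))


def ascii_to_word(octets):
    """Pack five 7-bit characters into a 36-bit word, spare bit zero."""
    if len(octets) != 5:
        raise ValueError("expected exactly 5 octets")
    word = 0
    for c in octets:
        word = (word << 7) | (c & 0x7F)  # keep only the low 7 bits
    return word << 1  # leave the spare low-order bit clear
```

Sending 72 bits in 9 octets, by contrast, carries every bit of each pair of
words across unchanged.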

-- 
"You know, you haven't stopped talking          John Cowan
since I came here. You must have been           http://www.reutershealth.com
vaccinated with a phonograph needle."           jcowan@reutershealth.com
        --Rufus T. Firefly                      http://www.ccil.org/~cowan

