BTF-8

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Sep 15 1999 - 17:02:47 EDT


Unicoders!

My memory was jogged by Marco Cimarosti's new RFC for ATF-8. I
felt sure that I had heard something similar somewhere.

After a long, diligent search through dusty filing cabinets,
I did indeed discover a very old RFC for BTF-8 that long
predates ATF-8, UTF-8, or WHATEVERTF-8. Because of the still
continuing relevance of Baudot Code to telexes, and because
of the current interest in the invention of TF's, I thought I
should bring it to the attention of the list.

--Ken

=======================================================================

Telegraphy Working Group                                  K. Whistlestop
Request for Comments: 2OLD4U                       Creed Machinery, Ltd.
Category: Disinformational                                  October 1916

        BTF-8, an 8-bit transformation format of Baudot Code

Status of this Memo

   This memo provides disinformation for the Internet community. This memo
   does not specify an Internet standard of any kind. In fact, if you
   think it specifies any standard, I don't know what you've been smoking
   lately. Distribution of this memo is unlimited.

Abstract

   The Baudot Multiplex System, as codified in the International
   Telegraph Alphabet number 1 [ITA 1], defines a 5-bit character set
   which encompasses one of the world's writing systems (the only one
   that really counts, of course). 5-bit characters, however, are not
   compatible with many current applications and protocols. BTF-8,
   the object of this memo, has the characteristic of preserving the
   full English alphabet range (well, the uppercase, anyway). Letters
   are encoded in one octet and have the usual US-ASCII value (or rather
   what will be the US-ASCII value, when US-ASCII is invented). This
   provides compatibility with telegrams that rely on US-ASCII values
   but are transparent to other values.

1. Introduction

   The Baudot Multiplex System defines a 5 bit character set which
   encompasses 56 characters for the world's most important writing
   system. That's right, you heard me correctly--56 characters. But
   how do they do that, you might ask, since 5 bits cover only 32
   combinations? Well, there's nothing up my sleeves, you see--it's all
   done with smoke and mirrors. 26 characters are devoted to uppercase
   letters A-Z. And 26 characters are devoted to "Figures": numbers and
   punctuation, plus a BELL code to wake up the sleeping operator at the
   other end and a "Who are you?" code to check you have reached the correct
   sleeping operator. There are two codes, #31 for LTRS and #27 for FIGS,
   that switch back and forth between the letters codes and the figures
   codes. That leaves four codes for BLANK, SPACE, CR, and LF, which are
   valid for both letters and figures. The LTRS and FIGS encodings, however,
   are hard to use in many current applications and protocols that assume
   8 bit characters without state switches.

   Furthermore, the Baudot Multiplex System, as implemented in Creed
   teleprinter machinery, requires a start bit and 1.5 stop bits, and
   is transmitted asynchronously. Newer systems able to deal with 8 bit
   characters cannot process 7.5 bit asynchronous Baudot Multiplex Codes.
   This situation has led to the development of so-called transformation
   formats (TF), each with different, confusing characteristics.

   BTF-8, the object of this memo, uses all bits of an
   octet, but has the quality of preserving the full US-ASCII range:

   - US-ASCII characters are encoded in one octet having the normal
      US-ASCII value, and any octet with such a value can only stand
      for a US-ASCII character, and nothing else.

   - LTRS and FIGS codes are removed, and the figures values are recoded
      to their US-ASCII values, so as to avoid stateful switching.

   - US-ASCII values do not appear otherwise in a BTF-8 encoded
      character stream. This provides compatibility with telegrams or
      filing cabinets that file based on US-ASCII values but are
      transparent to other values.

   - Round-trip conversion is easy between BTF-8 and the Baudot Multiplex
      System.

   - Character boundaries are easily found from anywhere in an octet
      stream.

   - The lexicographic sorting order of Baudot Multiplex System strings
      is mucked up beyond belief. Of course this is of limited interest
      since the sort order is not culturally valid in either case. (And I'm
      not sure anybody has even tried to sort asynchronous character streams
      on Creed teleprinters, but that is another story anyway.)

   - The octet values FE and FF never appear. But then, neither do the
      octet values 5B..FD, so it isn't clear why we should single out FE
      and FF, is it?

2. BTF-8 definition

   In BTF-8, characters are encoded using a single octet. What could be
   simpler? The letters are recoded according to their value in US-ASCII.
   The figures and control codes are recoded according to their value
   in US-ASCII. The LTRS and FIGS codes are tossed in the bit bucket.

   The table below summarizes this format.
   The letter x indicates bits available for encoding bits of the Baudot
   character value.

   Baudot Multiplex Code (binary)     BTF-8 octet sequence (binary)
   00000-11111                        0xxxxxxx

   Encoding from the Baudot Multiplex System to BTF-8 proceeds as follows:

   1) Assume "letters" state initially.

   2) Process the asynchronous stream of Baudot Multiplex codes sequentially,
      stripping 1 start bit and 1.5 stop bits, to obtain the 5-bit coded value.

   3) When the FIGS code is encountered, set the state to "figures".

   4) When the LTRS code is encountered, set the state to "letters".

   5) For all other codes encountered, if in "letters" state, convert to
      US-ASCII with the LETTER_TO_BTF8 table, otherwise convert to
      US-ASCII with the FIGURE_TO_BTF8 table.
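
   The five encoding steps above can be sketched in C roughly as
   follows. This is only a minimal sketch, not part of the memo proper:
   the function name btf8_encode and the constants BAUDOT_FIGS and
   BAUDOT_LTRS are hypothetical, the input is assumed to be an array of
   5-bit values with start and stop bits already stripped, and the two
   forward tables are repeated so that the fragment stands on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two shift codes (#27 and #31). */
enum { BAUDOT_FIGS = 27, BAUDOT_LTRS = 31 };

/* Forward tables as given in this memo; 0xFF marks the shift codes. */
static const unsigned char LETTER_TO_BTF8[32] = {
    0x00, 0x45, 0x0A, 0x41, 0x20, 0x53, 0x49, 0x55,
    0x0D, 0x44, 0x52, 0x4A, 0x4E, 0x46, 0x43, 0x4B,
    0x54, 0x5A, 0x4C, 0x57, 0x48, 0x59, 0x50, 0x51,
    0x4F, 0x42, 0x47, 0xFF, 0x4D, 0x58, 0x56, 0xFF };

static const unsigned char FIGURE_TO_BTF8[32] = {
    0x00, 0x33, 0x0A, 0x3D, 0x20, 0x27, 0x38, 0x37,
    0x0D, 0x05, 0x34, 0x07, 0x2C, 0x40, 0x3A, 0x28,
    0x35, 0x2B, 0x29, 0x32, 0x24, 0x36, 0x30, 0x31,
    0x39, 0x3F, 0x2A, 0xFF, 0x2E, 0x2F, 0x3E, 0xFF };

/*
 * Encode n 5-bit Baudot codes into BTF-8 octets.
 * `out` must have room for n octets (shift codes produce no output,
 * so the result is never longer than the input).
 * Returns the number of octets written.
 */
size_t btf8_encode(const unsigned char *codes, size_t n, unsigned char *out)
{
    int figures = 0;                   /* step 1: start in "letters" state */
    size_t written = 0;

    assert(codes != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {   /* step 2: process codes in order */
        unsigned char code = codes[i] & 0x1F;   /* keep the 5 data bits */

        if (code == BAUDOT_FIGS) {
            figures = 1;               /* step 3: switch to "figures" */
        } else if (code == BAUDOT_LTRS) {
            figures = 0;               /* step 4: switch to "letters" */
        } else {                       /* step 5: table lookup by state */
            out[written++] = figures ? FIGURE_TO_BTF8[code]
                                     : LETTER_TO_BTF8[code];
        }
    }
    return written;
}
```

   Fed the five codes 3, 27, 3, 23, 28, this sketch writes the four
   octets 41 3D 31 2E; the FIGS code itself leaves no trace in the
   output.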

   Decoding from BTF-8 to the Baudot Multiplex System proceeds as follows:

   1) Assume "letters" state initially.

   2) For each character in the BTF-8 string, determine whether it is
      in the letters set or the figures set. BLANK, SPACE, CR, and LF
      are in both sets and are converted without changing state.

   3) If the character is in the letters set and "letters" state is set,
      convert to Baudot code with the BTF8_TO_LETTER table.

   4) If the character is in the figures set and "figures" state is set,
      convert to Baudot code with the BTF8_TO_FIGURE table.

   5) If the character is in the letters set and "figures" state is set,
      first emit the LTRS code and set the state to "letters", then
      convert to Baudot code with the BTF8_TO_LETTER table.

   6) If the character is in the figures set and "letters" state is set,
      first emit the FIGS code and set the state to "figures", then
      convert to Baudot code with the BTF8_TO_FIGURE table.

   7) Emit each converted 5-bit value, prefixing a start bit and 1.5
      stop bits.
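
   A decoder along these lines might look like the sketch below. It is
   an illustration under stated assumptions, not the memo's own code:
   the names btf8_init and btf8_decode are hypothetical, the reverse
   lookups are derived at run time from the two forward tables instead
   of being typed in by hand, and the shift code emitted is assumed to
   be the one for the set being entered (LTRS before a letter, FIGS
   before a figure). Framing with start and stop bits is not attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BAUDOT_FIGS = 27, BAUDOT_LTRS = 31, NO_CODE = 0xFF };

/* Forward tables as given in this memo; 0xFF marks the shift codes. */
static const unsigned char LETTER_TO_BTF8[32] = {
    0x00, 0x45, 0x0A, 0x41, 0x20, 0x53, 0x49, 0x55,
    0x0D, 0x44, 0x52, 0x4A, 0x4E, 0x46, 0x43, 0x4B,
    0x54, 0x5A, 0x4C, 0x57, 0x48, 0x59, 0x50, 0x51,
    0x4F, 0x42, 0x47, 0xFF, 0x4D, 0x58, 0x56, 0xFF };

static const unsigned char FIGURE_TO_BTF8[32] = {
    0x00, 0x33, 0x0A, 0x3D, 0x20, 0x27, 0x38, 0x37,
    0x0D, 0x05, 0x34, 0x07, 0x2C, 0x40, 0x3A, 0x28,
    0x35, 0x2B, 0x29, 0x32, 0x24, 0x36, 0x30, 0x31,
    0x39, 0x3F, 0x2A, 0xFF, 0x2E, 0x2F, 0x3E, 0xFF };

/* Reverse lookups indexed by octet value, filled in by btf8_init(). */
static unsigned char to_letter[256];
static unsigned char to_figure[256];

/* Build the reverse lookups by inverting the forward tables. */
void btf8_init(void)
{
    memset(to_letter, NO_CODE, sizeof to_letter);
    memset(to_figure, NO_CODE, sizeof to_figure);
    for (int code = 0; code < 32; code++) {
        if (code == BAUDOT_FIGS || code == BAUDOT_LTRS)
            continue;                    /* shift codes have no octet */
        to_letter[LETTER_TO_BTF8[code]] = (unsigned char)code;
        to_figure[FIGURE_TO_BTF8[code]] = (unsigned char)code;
    }
}

/*
 * Decode n BTF-8 octets into 5-bit Baudot codes.
 * `out` must have room for 2*n codes, since every octet may need a shift.
 * Returns the number of codes written, or (size_t)-1 if an octet is
 * outside the BTF-8 repertoire. Call btf8_init() once beforehand.
 */
size_t btf8_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    int figures = 0;                     /* start in "letters" state */
    size_t written = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char l = to_letter[in[i]];
        unsigned char f = to_figure[in[i]];

        if (l == NO_CODE && f == NO_CODE)
            return (size_t)-1;           /* not a BTF-8 octet */
        if (figures && f == NO_CODE) {   /* letter seen in "figures" state */
            out[written++] = BAUDOT_LTRS;
            figures = 0;
        } else if (!figures && l == NO_CODE) { /* figure in "letters" state */
            out[written++] = BAUDOT_FIGS;
            figures = 1;
        }
        /* BLANK, SPACE, CR and LF are in both sets and never shift. */
        out[written++] = figures ? f : l;
    }
    return written;
}
```

   Deriving the reverse lookups from the forward tables is a design
   choice made for this sketch only: it keeps the two directions from
   drifting apart when one table is retyped.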

   The applicable tables are shown here, expressed in C (= Baudot Code
   01110). The value 0xFF is an unused value in the table, corresponding
   to the LTRS or FIGS codes, or illegal values in BTF-8.

   /* Indexed by 5-bit Baudot code while in "letters" state. */
   const unsigned char LETTER_TO_BTF8 [32] =
     { 0x00, 0x45, 0x0A, 0x41, 0x20, 0x53, 0x49, 0x55,
       0x0D, 0x44, 0x52, 0x4A, 0x4E, 0x46, 0x43, 0x4B,
       0x54, 0x5A, 0x4C, 0x57, 0x48, 0x59, 0x50, 0x51,
       0x4F, 0x42, 0x47, 0xFF, 0x4D, 0x58, 0x56, 0xFF };

   /* Indexed by 5-bit Baudot code while in "figures" state. */
   const unsigned char FIGURE_TO_BTF8 [32] =
     { 0x00, 0x33, 0x0A, 0x3D, 0x20, 0x27, 0x38, 0x37,
       0x0D, 0x05, 0x34, 0x07, 0x2C, 0x40, 0x3A, 0x28,
       0x35, 0x2B, 0x29, 0x32, 0x24, 0x36, 0x30, 0x31,
       0x39, 0x3F, 0x2A, 0xFF, 0x2E, 0x2F, 0x3E, 0xFF };

   /* Indexed by BTF-8 octet 0x00..0x5A; row comments give the high nibble. */
   const unsigned char BTF8_TO_LETTER [91] =
     { 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 0 */
       0xFF, 0xFF, 0x02, 0xFF, 0xFF, 0x08, 0xFF, 0xFF,
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 1 */
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
       0x04, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 2 */
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 3 */
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
       0xFF, 0x03, 0x19, 0x0E, 0x09, 0x01, 0x0D, 0x1A, /* 4 */
       0x14, 0x06, 0x0B, 0x0F, 0x12, 0x1C, 0x0C, 0x18,
       0x16, 0x17, 0x0A, 0x05, 0x10, 0x07, 0x1E, 0x13, /* 5 */
       0x1D, 0x15, 0x11 };

   /* Indexed by BTF-8 octet 0x00..0x40; row comments give the high nibble. */
   const unsigned char BTF8_TO_FIGURE [65] =
     { 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0x09, 0xFF, 0x0B, /* 0 */
       0xFF, 0xFF, 0x02, 0xFF, 0xFF, 0x08, 0xFF, 0xFF,
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 1 */
       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
       0x04, 0xFF, 0xFF, 0xFF, 0x14, 0xFF, 0xFF, 0x05, /* 2 */
       0x0F, 0x12, 0x1A, 0x11, 0x0C, 0xFF, 0x1C, 0x1D,
       0x16, 0x17, 0x13, 0x01, 0x0A, 0x10, 0x15, 0x07, /* 3 */
       0x06, 0x18, 0x0E, 0xFF, 0xFF, 0x03, 0x1E, 0x19,
       0x0D };

   Actual code to strip the start and stop bits of the asynchronous
   stream, convert the 5-bit Baudot code thus extracted to a numeric
   value, and then to use these tables is left as an exercise to the
   reader.
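
   For readers who would rather not do the exercise, one possible
   sketch of the stripping step follows. Everything in it is an
   assumption made for illustration: the line is taken to be sampled
   twice per bit period (so that 1.5 stop bits come out as a whole
   number of samples), frames are taken to arrive back to back with no
   idle time between them, the lowest-order data bit is taken to be
   sent first, and the name baudot_unframe is hypothetical. The sketch
   discards the start and stop samples without checking their values.

```c
#include <assert.h>
#include <stddef.h>

/* One frame at two samples per bit: 2 start + 10 data + 3 stop. */
enum { HALF_BITS_PER_FRAME = 15 };

/*
 * Extract 5-bit codes from a stream of 0/1 line samples taken twice
 * per bit period. `codes` must have room for
 * n_samples / HALF_BITS_PER_FRAME entries; a trailing partial frame
 * is ignored. Returns the number of codes written.
 */
size_t baudot_unframe(const unsigned char *samples, size_t n_samples,
                      unsigned char *codes)
{
    size_t n_frames = n_samples / HALF_BITS_PER_FRAME;

    assert(samples != NULL && codes != NULL);
    for (size_t f = 0; f < n_frames; f++) {
        const unsigned char *frame = samples + f * HALF_BITS_PER_FRAME;
        unsigned char code = 0;

        for (int bit = 0; bit < 5; bit++) {
            /* Data bit `bit` fills samples 2+2*bit and 3+2*bit;
               the first of the pair is used. */
            if (frame[2 + 2 * bit])
                code |= (unsigned char)(1u << bit);
        }
        codes[f] = code;   /* start and stop samples are simply dropped */
    }
    return n_frames;
}
```

   A real teleprinter line idles between characters and drifts in
   speed, so a practical receiver would resynchronize on each start
   bit; that refinement is omitted here.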

3. Examples

   For simplicity these examples omit the start bit (always set) and the
   1.5 stop bits (also always set). Note that bit values in the Baudot
   Codes start with the lowest-order bit on the left, and with higher-order
   bits to the right, so that "11000" = 3, the Baudot Code for "A".
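
   The bit-order convention just described can be checked with a small
   helper such as the one below. It is a sketch only; the function name
   baudot_value is hypothetical and appears nowhere else in this memo.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Numeric value of a Baudot code written as five '0'/'1' characters
 * with the lowest-order bit on the left.
 * Returns -1 if the string is not exactly five binary digits.
 */
int baudot_value(const char *bits)
{
    int value = 0;
    size_t i;

    assert(bits != NULL);
    for (i = 0; bits[i] != '\0'; i++) {
        if (i >= 5 || (bits[i] != '0' && bits[i] != '1'))
            return -1;
        if (bits[i] == '1')
            value |= 1 << i;      /* leftmost character is bit 0 */
    }
    return i == 5 ? value : -1;
}
```

   With this helper, baudot_value("11000") gives 3, the code for "A",
   and baudot_value("11011") gives 27, the FIGS code.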

   In case you haven't purchased your Creed transmitting and teleprinting
   devices yet, this arrangement used to correspond to the five levers
   the operator pressed on a chording keyboard (see [ITA 1] for a
   photograph): the two on the left corresponding to the first two
   fingers of the left hand, and the three on the right corresponding
   to the first three fingers of the right hand.
   However, this has all been simplified in the Creed machines to make
   use of an ordinary typewriter-style keyboard--the machine automatically
   translates a keypress into the activation of the appropriate combination
   of levers for perforating tape, controlled by compressed air!!
   Now anyone who has passed a competent secretarial course
   can serve as a telegraph operator, thus opening the door to hiring
   cheap, compliant female labor to keep your telegraphy operating costs down.

   The Baudot sequence "A=1." (11000 11011 11000 11101 00111) may be encoded
   as follows:

      41 3D 31 2E

   The Baudot sequence "HI MOM :-)" (00101 01100 00100 00111 00011 00111
   00100 11011 01110 11000 01001) may be encoded as follows:

      48 49 20 4D 4F 4D 20 3A 3D 29

   The Baudot sequence representing the Han characters for the Japanese
   word "nihongo" -- no wait!, what could I be thinking??

MIME registrations

   This memo is meant to serve as the basis for registration of a MIME
   character encoding (charset) as per [RFC1521]. The proposed charset
   parameter value is "BTF-8". This string would label media types
   containing text consisting of characters from the repertoire of ITA 1
   encoded to a sequence of octets using the encoding scheme
   outlined above.

Security Considerations

   Security issues are not discussed in this memo. German spies may be
   listening, and we all know what an Enigma their codes and coding
   machinery are.

Acknowledgments

   The following have participated in the drafting and discussion of
   this memo:

      Dewey, Cheetham, and Howe      My Dog Fluffy
      Phillip Airtime                Sy Burnett
      Tilly Graham

Bibliography

   [ITA 1] International Telegraph Alphabet number 1. For nice
                  pictures of equipment, see:

   http://ourworld.compuserve.com/homepages/sam_hallas/telhist2/telehist.htm

   [RFC1521] Borenstein, N., and N. Freed, "MIME (Multipurpose
                  Internet Mail Extensions) Part One: Mechanisms for
                  Specifying and Describing the Format of Internet
                  Message Bodies", RFC 1521, Bellcore, Innosoft,
                  September 1993.

   [US-ASCII] Coded Character Set--7-bit American Standard Code for
                  Information Interchange, ANSI X3.4-1986.

Author's Address

      Ken Whistlestop
      Creed Machinery, Ltd.

      Tel: Garfield exchange #42
      Fax: Same to you, buddy!
      EMail: What's that?


