L2/02-403
Subject: Support for the G1 character sets used in Videotex and Teletext
Date/Time: Wed Nov 6 00:36:49 EST 2002
Contact: verdy_p@wanadoo.fr

You have added some new block symbols to the Miscellaneous Technical block to emulate old terminals that are mostly no longer in use, in addition to the graphic blocks used in IBM PC OEM character sets. However, there is still no encoding for the G1 character sets defined in Videotex and Teletext, which are still heavily used: notably in France for the Minitel, and for information pages broadcast in the unused top and bottom lines of the TV signal (the vertical blanking interval), accessed either from a decoder or directly on most modern TV sets.

The G1 character sets come in two styles, determined by the presence or absence of the "underlining" attribute, but they share a common structure: each character cell is divided into two columns of equal width and three rows of approximately equal height (the top and bottom rows have equal heights, and the middle row may differ from them by at most 1 pixel). The first set uses contiguous (conjoined) blocks; the second uses separated (disjoined) blocks, with at least 1 pixel of background color on the left or bottom side.

Because Unicode cannot encode the "underlining" style, the G1 character sets can only be represented in fonts by separate characters. This does not necessarily require two sets of code points, since Unicode already provides 16 variation selectors (U+FE00..U+FE0F): the Videotex or Teletext decoder would map each combination of character and style to the appropriate glyph in the terminal font, using VARIATION SELECTOR-1 to request the underlined variant, or else choosing a different font for the underlined style. The same approach could be used in word processors, or in web pages whose HTML markup conveys Videotex or Teletext content rendered in a monospace font such as Courier.
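The cell geometry described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a normative layout: the function names are hypothetical, the cell width is assumed to be even, the top and bottom rows get `(height + 1) // 3` pixels each (which keeps the middle row within 1 pixel of them), and in the separated style each sub-block is assumed to give up its leftmost pixel column and its bottom pixel row to the background color.

```python
from typing import List, Tuple

# (x, y, width, height) in pixels, origin at the top-left corner of the cell.
Rect = Tuple[int, int, int, int]


def row_heights(cell_h: int) -> List[int]:
    """Split a cell height into three rows: top and bottom equal, middle within 1 px."""
    edge = (cell_h + 1) // 3
    middle = cell_h - 2 * edge
    return [edge, middle, edge]


def subblock_rects(cell_w: int, cell_h: int, separated: bool) -> List[Rect]:
    """Return the 6 sub-block rectangles in row-major order (top-left first).

    Assumption: in the separated style, each sub-block loses its leftmost
    pixel column and its bottom pixel row to the background color.
    """
    if cell_w % 2:
        raise ValueError("cell width must be even to split into two equal columns")
    col_w = cell_w // 2
    rects: List[Rect] = []
    y = 0
    for h in row_heights(cell_h):
        for col in range(2):
            x, w, rh = col * col_w, col_w, h
            if separated:
                # Leave 1 px of background on the left and at the bottom.
                x, w, rh = x + 1, w - 1, rh - 1
            rects.append((x, y, w, rh))
        y += h
    return rects
```

For the 8x10-pixel cell typical of these terminals, `row_heights(10)` gives rows of 3, 4 and 3 pixels, so the middle row is 1 pixel taller than the other two.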
Each of these two alternate graphic sets includes 64 combinations, one for every pattern of marked and unmarked states of the 6 graphic sub-blocks. In Videotex or Teletext, they are encoded by switching to graphic mode, with the "underlining", background color and foreground color styles encoded separately, and then sending ASCII code values whose 7 low-order bits signal which sub-blocks are marked, according to the following pattern:

    [0][1]
    [2][3]
    [4][6]

where the digit in each sub-block is the bit of the code value that must be set to 1 to display a mark there (bit 5 is not used for the pattern). The characters can therefore be ordered numerically by their code values, between 33 and 126, and computed algorithmically to draw low-resolution images or logos. Most terminals also decode the code values 32 and 127, and display the same graphic character regardless of the value of bit 5 in the code value.

These graphic characters have a half-width size (in terms of East Asian width properties) and occupy the whole character cell. They must not be enlarged or narrowed by the rendering engine, as they must stay aligned both horizontally and vertically with the terminal display grid. They are designed for low-resolution monitors or TV sets (typically a 320x250 screen divided into 25 rows of 40 columns, each cell being 8 pixels wide and 10 pixels tall), with a non-isotropic resolution: the cell height is typically about 133% to 150% of the cell width. The offset value 0 (no sub-block marked) is similar to a W-width space. The offset value 63 (all six sub-blocks marked) is sometimes called "obliteration", but it is a full black box only in the non-underlined style; in the underlined style it consists of 6 visibly distinct rectangular blocks with thin separations.
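The bit layout above lends itself to a direct conversion between code values and 6-bit offsets. The following Python sketch (all function names are hypothetical) numbers the sub-blocks 0 to 5 in row-major order, ignores bit 5 when decoding, as most terminals do, and always sets bit 5 (0x20) when encoding, so that offsets 0 and 63 map to code values 32 and 127.

```python
# Bit of the code value that marks each sub-block, in row-major order:
# top-left, top-right, middle-left, middle-right, bottom-left, bottom-right.
SUBBLOCK_BITS = (0, 1, 2, 3, 4, 6)


def code_to_pattern(code: int) -> int:
    """Return the 6-bit offset (0..63) for a 7-bit code value; bit 5 is ignored."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit code value")
    pattern = 0
    for i, bit in enumerate(SUBBLOCK_BITS):
        if (code >> bit) & 1:
            pattern |= 1 << i
    return pattern


def pattern_to_code(pattern: int) -> int:
    """Inverse mapping: build the code value, always setting bit 5 (0x20)."""
    if not 0 <= pattern <= 63:
        raise ValueError("expected a 6-bit pattern")
    code = 0x20
    for i, bit in enumerate(SUBBLOCK_BITS):
        if (pattern >> i) & 1:
            code |= 1 << bit
    return code


def render(code: int) -> str:
    """Draw the 2x3 grid as text: '#' for a marked sub-block, '.' for background."""
    p = code_to_pattern(code)
    rows = []
    for r in range(3):
        rows.append("".join("#" if (p >> (2 * r + c)) & 1 else "." for c in range(2)))
    return "\n".join(rows)
```

For example, `render(0x35)` draws the left half block (`#.` on each of the three rows), and `render(0x6A)` draws the right half block. Because offsets 0 to 31 map to codes 0x20 to 0x3F and offsets 32 to 63 map to codes 0x60 to 0x7F, ordering by offset and ordering by code value agree.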
Of the possible graphic characters, only a few are already encoded in Unicode, as compatibility symbols:
- the empty background glyph (identical in the underlined and non-underlined styles);
- the full foreground glyph (encoded, as of Unicode 3.2, only in the non-underlined style);
- the left and right half blocks (encoded, as of Unicode 3.2, only in the non-underlined style).

This means that of the 128 combinations required to support the G1 character sets used in Videotex and Teletext, only 5 are currently covered by Unicode characters. There should be a way to encode all 64 combinations in Unicode, and to use a variation selector to request the alternate (underlined) glyphs when referencing characters in fonts. Fonts for Videotex and Teletext could then be used within text documents to display these graphic blocks.

Windows 95 and later already include a legacy monospaced Arial font, used in HyperTerminal to emulate the French Minitel terminal. Other terminal applications use this font, or provide their own in various sizes, to emulate the Minitel or to display Teletext pages from broadcast TV programs.

More generally, Unicode should add and unify the European (CEPT) variants of the Videotex and Teletext standards (used mostly in France, Germany and the UK), in addition to the existing support for the Japanese industry standards, all of whose graphic and technical characters Unicode has accepted. This would ease the transition from the slowly declining Videotex applications to the web, by making content currently accessible only from Minitel-like terminals or Teletext decoders available on the web, and by allowing servers to share their content in both environments (notably when converting logos between the web and terminal applications).
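The count of 5 covered combinations can be checked with a short Python sketch. It assumes that the empty cell is represented by U+0020 SPACE and that the full, left-half and right-half glyphs are U+2588 FULL BLOCK, U+258C LEFT HALF BLOCK and U+2590 RIGHT HALF BLOCK. The `proposed_sequence` helper is hypothetical and only illustrates the suggested use of VARIATION SELECTOR-1 (U+FE00); such sequences are not registered standardized variation sequences.

```python
from typing import Optional

VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Patterns (bit i = sub-block i, row-major) that already have a Unicode character.
# Using U+0020 SPACE for the empty cell is an assumption of this sketch.
ENCODED = {
    0b000000: "\u0020",  # empty background
    0b111111: "\u2588",  # FULL BLOCK
    0b010101: "\u258C",  # LEFT HALF BLOCK (sub-blocks 0, 2, 4)
    0b101010: "\u2590",  # RIGHT HALF BLOCK (sub-blocks 1, 3, 5)
}


def existing_char(pattern: int, separated: bool) -> Optional[str]:
    """Return an existing character for this combination, or None if there is none."""
    if pattern == 0:
        return ENCODED[0]  # the empty glyph is identical in both styles
    if separated:
        return None  # no separated (underlined) glyphs are encoded
    return ENCODED.get(pattern)


def proposed_sequence(base: str, separated: bool) -> str:
    """Hypothetical: mark the separated style by appending VS1 to a base character."""
    return base + VS1 if separated else base


covered = sum(
    existing_char(p, s) is not None for p in range(64) for s in (False, True)
)
print(covered, "of", 64 * 2)  # prints: 5 of 128
```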