[OT] character collection for an international keyboard layout (ISO/IEC 9995-2 and 9995-3)

From: Karl Pentzlin (karl-pentzlin@acssoft.de)
Date: Wed Dec 19 2007 - 13:46:59 CST


    As a (new) member of DIN NA 043-01-35-01 GAK, a German group
    related to ISO/IEC JTC 1/SC 35/WG 1, I am working on the
    ISO/IEC 9995 standard, Keyboard layouts for text and office systems.

      Before I make any detailed statements in the standards group, I want
      to discuss my general ideas in public.
      Any comments or hints are welcome.

    The current version of Part 2 of the aforementioned ISO/IEC 9995 states:
      For the input of graphic character repertoire of collection 281
      (titled MES-1) as specified in amendment 1 to ISO/IEC 10646-1:2000,
      a Common Secondary Group Layout (to be used as group 2) is specified
      in ISO/IEC 9995-3.

    The collection 281 is:
    U+00..: 20-7E A0-FF
    U+01..: 00-13 16-2B 2E-4D 50-7E
    U+02..: C7 D8-DB DD
    U+20..: 15 18-19 1C-1D AC
    U+21..: 22 26 5B-5E 90-93
    U+26..: 6A
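    The row notation above gives, for each 256-character row of the code
    space, the low bytes or byte ranges it contains. As a sketch for checking
    such listings, the following Python snippet expands it into a set of code
    points; the helper `expand` is a hypothetical name introduced here for
    illustration, and the data are copied from the list above.

```python
# Sketch: expand the "U+xx..: lo-hi ..." listing into a set of code points.
# expand() is a hypothetical helper; the data are copied from collection 281.

MES_1 = """
U+00..: 20-7E A0-FF
U+01..: 00-13 16-2B 2E-4D 50-7E
U+02..: C7 D8-DB DD
U+20..: 15 18-19 1C-1D AC
U+21..: 22 26 5B-5E 90-93
U+26..: 6A
"""


def expand(listing: str) -> set[int]:
    """Each 'U+xx..:' prefix sets the row (high byte); lines without a prefix
    continue the previous row. Items are single hex bytes or 'lo-hi' ranges."""
    points: set[int] = set()
    base = 0
    for line in listing.splitlines():
        if line.strip().startswith("U+"):
            head, _, line = line.partition(":")
            base = int(head.strip()[2:4], 16) << 8  # "U+01.." -> 0x0100
        for item in line.split():
            lo, _, hi = item.partition("-")
            # an item without "-" is a single cell, so hi falls back to lo
            points.update(range(base + int(lo, 16), base + int(hi or lo, 16) + 1))
    return points


mes1 = expand(MES_1)
print(len(mes1))       # 335
print(0x0259 in mes1)  # False: ə (U+0259) is not in collection 281
```

    Counted this way, collection 281 has 335 characters, and the check
    confirms, for example, that ə (U+0259) is not part of it.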

    In my opinion, this character collection is not suited as base
    for a standardized keyboard layout, for the following reasons:
    - Collection 281 is based on ISO/IEC 6937, which was developed in the
      1970s for "telematic services", i.e. for communication purposes like
      the long-forgotten Telex successor "Teletex". Serving as a well-thought-out
      set for an international keyboard was not its primary goal.
    - Moreover, some of its characters are obsolete legacy today and
      should not burden a current keyboard design.
    - The last 30 years have created a need for additional characters (e.g.,
      the introduction of the Latin alphabet in Azerbaijan revived the Jaŋalif
      character Ə/ə).
    - Additionally, the set is incomplete (e.g., it contains the characters
      Ŋ/ŋ, Ŧ/ŧ and Đ/đ for Sámi, but not Ʒ/ʒ and Ǯ/ǯ).
    - As the name "MES-1" ("Multilingual European Subset 1") suggests, most
      of the world is not covered (notably Vietnamese, but also most
      "minority languages", even those written in Latin script).

    Thus, I propose to use the more complete collection 282 (MES-2) of
    ISO/IEC 10646, with the modifications enumerated below.
    Fitting these characters into a concise keyboard design is of course a
    somewhat complicated task, but the "Europatastatur" (European Keyboard
    as shown on http://www.europatastatur.de , in German; an English
    presentation is found on http://www.europatastatur.de/presentation1/ )
    shows that such things can be done.

    The collection 282 (MES-2) is:
    U+00..: 20-7E A0-FF
    U+01..: 00-7F 8F 92 B7 DE-EF FA-FF
    U+02..: 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
    U+03..: 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
    U+04..: 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
    U+1E..: 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
    U+1F..: 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB
            DD-EF F2-F4 F6-FE
    U+20..: 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
    U+21..: 05 16 22 26 5B-5E 90-95 A8
    U+22..: 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95
    U+23..: 02 10 20-21 29-2A
    U+25..: 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC
            C4 CA-CB D8-D9
    U+26..: 3A-3C 40 42 60 63 65-66 6A-6B
    U+FB..: 01-02
    U+FF..: FD
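    Expanding both listings the same way makes the comparison between the two
    collections concrete. A minimal sketch, again with a hypothetical helper
    `expand` and with the row data copied from the two lists above (so any
    transcription error in those lists carries over into the result):

```python
# Sketch: compare collections 281 (MES-1) and 282 (MES-2) as code-point sets.
# expand() is a hypothetical helper; the row data are copied from the lists above.

MES_1 = """
U+00..: 20-7E A0-FF
U+01..: 00-13 16-2B 2E-4D 50-7E
U+02..: C7 D8-DB DD
U+20..: 15 18-19 1C-1D AC
U+21..: 22 26 5B-5E 90-93
U+26..: 6A
"""

MES_2 = """
U+00..: 20-7E A0-FF
U+01..: 00-7F 8F 92 B7 DE-EF FA-FF
U+02..: 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
U+03..: 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
U+04..: 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
U+1E..: 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
U+1F..: 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB
        DD-EF F2-F4 F6-FE
U+20..: 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
U+21..: 05 16 22 26 5B-5E 90-95 A8
U+22..: 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95
U+23..: 02 10 20-21 29-2A
U+25..: 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC
        C4 CA-CB D8-D9
U+26..: 3A-3C 40 42 60 63 65-66 6A-6B
U+FB..: 01-02
U+FF..: FD
"""


def expand(listing: str) -> set[int]:
    """Expand 'U+xx..: lo-hi ...' rows; unprefixed lines continue the last row."""
    points: set[int] = set()
    base = 0
    for line in listing.splitlines():
        if line.strip().startswith("U+"):
            head, _, line = line.partition(":")
            base = int(head.strip()[2:4], 16) << 8  # "U+1F.." -> 0x1F00
        for item in line.split():
            lo, _, hi = item.partition("-")
            points.update(range(base + int(lo, 16), base + int(hi or lo, 16) + 1))
    return points


mes1, mes2 = expand(MES_1), expand(MES_2)
print(mes1 <= mes2)  # True: collection 281 is a subset of collection 282

# Letters cited above as missing from 281: Ə ə Ʒ ʒ Ǯ ǯ
for cp in (0x018F, 0x0259, 0x01B7, 0x0292, 0x01EE, 0x01EF):
    print(f"U+{cp:04X}", cp in mes1, cp in mes2)  # each line ends "False True"
```

    With these data, every character of collection 281 is also in collection
    282, and the Azerbaijani and Sámi letters cited above are present only in
    282. Most of the Vietnamese range U+1EA0...U+1EF9, however, is still
    absent from 282.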

    I propose to use a set based on collection 282, without Greek and
    Cyrillic (i.e. Latin script only), without block graphics (as their use
    is very limited nowadays), and without any legacy ballast, namely
      U+0132/U+0133 LATIN CAPITAL/SMALL LIGATURE IJ (obsolete; ISO 6937 legacy),
      U+013F/U+0140 LATIN CAPITAL/SMALL LETTER L WITH MIDDLE DOT (ISO 6937 legacy),
      U+0370...U+03FF (Greek), except U+03A9 Ω, which Unicode prefers as the ohm sign,
      U+0400...U+04FF (Cyrillic),
      U+1F00...U+1FFF (Greek Extended),
      U+20AF DRACHMA SIGN (obsolete Greek currency sign),
      U+2126 OHM SIGN (use U+03A9 Ω instead, as preferred by Unicode),
      U+2320...U+2321 (half integrals; a kind of block graphics),
      U+2329...U+232A LEFT/RIGHT-POINTING ANGLE BRACKET (these code points are
        canonically equivalent to the CJK characters U+3008/U+3009 and are
        therefore designed to be displayed in a Chinese character square cell;
        use U+27E8/U+27E9 instead),
      U+2500...U+257F (Box Drawing),
      U+2580...U+259F (Block Elements),
      U+25D8...U+25D9 (inverse bullets, a kind of block graphics),
      U+FB01...U+FB02 (precomposed ligatures, apparently Mac charset legacy).
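    The exclusion rule can be stated as a simple filter over code points. The
    sketch below is an illustration only: `EXCLUDED`, `KEEP` and
    `reduce_collection` are hypothetical names, and the ranges are transcribed
    from the list above. It also uses Python's `unicodedata.normalize` to show
    the canonical equivalences behind two of the exclusions: NFC replaces
    U+2126 by U+03A9, and U+2329/U+232A by U+3008/U+3009.

```python
import unicodedata

# Hypothetical transcription of the exclusion list above (inclusive ranges).
EXCLUDED = [
    (0x0132, 0x0133),  # ligature IJ
    (0x013F, 0x0140),  # L with middle dot
    (0x0370, 0x03FF),  # Greek
    (0x0400, 0x04FF),  # Cyrillic
    (0x1F00, 0x1FFF),  # Greek Extended
    (0x20AF, 0x20AF),  # DRACHMA SIGN
    (0x2126, 0x2126),  # OHM SIGN
    (0x2320, 0x2321),  # half integrals
    (0x2329, 0x232A),  # CJK-equivalent angle brackets
    (0x2500, 0x257F),  # Box Drawing
    (0x2580, 0x259F),  # Block Elements
    (0x25D8, 0x25D9),  # inverse bullets
    (0xFB01, 0xFB02),  # fi/fl ligatures
]
KEEP = {0x03A9}  # GREEK CAPITAL LETTER OMEGA, kept for use as the ohm sign


def is_excluded(cp: int) -> bool:
    """True if cp falls in an excluded range and is not explicitly kept."""
    return cp not in KEEP and any(lo <= cp <= hi for lo, hi in EXCLUDED)


def reduce_collection(points: set[int]) -> set[int]:
    """Drop excluded code points from a collection given as a set of ints."""
    return {cp for cp in points if not is_excluded(cp)}


sample = {0x0041, 0x0132, 0x03A9, 0x03B1, 0x2126, 0x2500, 0x20AC}
print(sorted(f"U+{cp:04X}" for cp in reduce_collection(sample)))
# ['U+0041', 'U+03A9', 'U+20AC']

# NFC maps the ohm sign to Greek capital omega and the angle brackets
# to their CJK counterparts (singleton canonical decompositions).
print(unicodedata.normalize("NFC", "\u2126") == "\u03a9")  # True
print(unicodedata.normalize("NFC", "\u2329\u232a") == "\u3008\u3009")  # True
```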

    On the other hand, I propose to add the following to this set:

    - Substitutes for discouraged code points (see above):
        U+27E8...U+27E9 MATHEMATICAL left/right ANGLE BRACKET
        (substitutes for U+2329...U+232A, see above)

    - All non-spacing (combining) versions of the contained diacritical marks
      (from U+0300...U+036F), so that every letter resulting from combining a
      base Latin letter with one or more of these diacritical marks is
      implicitly contained in this set. This covers, e.g., all letters for the
      Pinyin transcription of Chinese.

    - All Latin letters required by languages written in Latin script that are
      the official main language of any country:
        Vietnamese (U+01A0/U+01A1, U+01AF/U+01B0, U+0309, U+031B, U+1EA0...U+1EF9)

    - All Latin letters required by other languages written in Latin script
      that have official status and a standardized orthography in any country
      ("minority languages"). This set is open, as the information may not be
      collected completely in time, and as further languages may meet the
      requirements in the future. (Thus, an initial selection of languages can
      be made without excluding or discriminating against other languages,
      which are to be considered in a future version of the standard.)
      This list may e.g. contain:
        Algonquin (U+0222/U+0223), Lakota (U+0220/U+019E), ...
        African (several letters from Latin Extended-B U+0180...U+024F)

    - All letters required for standardized transliterations of:
        Cyrillic (ISO 9), Hebrew, Arabic (ISO 233, DIN 31635),
        Indic scripts (ISO 15919), ...
      (in fact, this results in a very moderate set of diacritical marks and
       special letters).

    - ZWNJ (U+200C ZERO WIDTH NON-JOINER), as required to prevent ligatures
        (e.g. in German "Schilfinsel", no "fi" ligature is allowed)

        (e.g. for use within abbreviations or for punctuation spacing).

    - Currency symbols compatible with Latin script (U+20A1, U+20A2, U+20A6,
        U+20A9...U+20AB, U+20AD, U+20AE, U+20B0...U+20B2, U+20B4, U+20B5,
        U+0E3F, U+2133)

    - Symbols often found in running text, but missing in collection 282:
        U+2053 SWUNG DASH
        U+2197 NORTH EAST ARROW (trend symbol; external reference symbol)
        U+2198 SOUTH EAST ARROW (trend symbol)
        U+21B5 DOWNWARDS ARROW WITH CORNER LEFTWARDS (symbol for pressing
          the "Enter" key in user guides)
        U+2205 EMPTY SET
        U+226A MUCH LESS-THAN
        U+2300 DIAMETER SIGN
        U+2423 OPEN BOX (symbol for the "space" key in user guides;
          space marker in forms and pedagogical texts)
        U+2639 WHITE FROWNING FACE (complementary to the smilies contained in
          collection 282)
        U+266F MUSIC SHARP SIGN (for writing music titles like "Symphony in F♯")
        U+266D MUSIC FLAT SIGN
        U+26A0 WARNING SIGN
        U+2713 CHECK MARK
        U+2717 BALLOT X

    - Letters and symbols whose use or standardization has developed only recently:
        U+203B REFERENCE MARK (e.g. used in the printed Unicode standard)
        U+203D INTERROBANG (used e.g. in several newer Microsoft fonts)
        U+2E18 INVERTED INTERROBANG (complementary to Interrobang for Spanish)
        U+2120 SERVICE MARK
        U+214D AKTIESELSKAB (common in Norwegian company addresses)

    - Typographically well-defined alternatives to ASCII symbols (whose design
      is usually a compromise between their different applications):
        U+2215 DIVISION SLASH
        U+2217 ASTERISK OPERATOR (asterisk alternative with well-defined x-height)
        U+223C TILDE OPERATOR (design corresponding to U+2248 ALMOST EQUAL TO)
        U+2036 REVERSED DOUBLE PRIME (complementary to U+2033 DOUBLE PRIME, for
          typographically satisfying representation of ``such vernacular quotes´´)
        U+2035 REVERSED PRIME (for a typographically satisfying representation of
          the vernacular use of U+0060 as a "false apostrophe" marking a word-suffix
          boundary, which is often seen in Germany although it is by no means
          supported by the official orthography)

    - Some bullets to complement the set contained in collection 282 orthogonally:
        U+25A1 WHITE SQUARE
          (typographically "decent" alternative to the full black square)
        U+25C6 BLACK DIAMOND
        U+25C7 WHITE DIAMOND
        U+25C9 FISHEYE (white circle containing black small circle)
        U+25E6 WHITE BULLET (complementary to U+2022 BULLET)
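    The claim that adding the combining marks makes precomposed letters
    "implicitly contained" can be checked with Unicode normalization: NFC turns
    a base letter followed by combining marks into a single precomposed code
    point where one exists, and otherwise keeps the combining sequence. A short
    sketch with Python's standard `unicodedata` module; the sample strings are
    illustrative choices, not part of the proposal. The last lines show the
    ZWNJ case from the "Schilfinsel" example.

```python
import unicodedata

# Base letter + combining marks; NFC composes them where a precomposed
# code point exists and otherwise keeps the combining sequence.
samples = {
    "Pinyin": "u\u0308\u0304",         # u + combining diaeresis + macron
    "Vietnamese": "e\u0302\u0301",     # e + combining circumflex + acute
    "no precomposed form": "m\u0304",  # m + combining macron
}
for label, seq in samples.items():
    nfc = unicodedata.normalize("NFC", seq)
    codes = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"{label}: {codes}")
# Pinyin: U+01D6
# Vietnamese: U+1EBF
# no precomposed form: U+006D U+0304

# ZWNJ is an invisible format character; placed between "f" and "i" it is
# intended to keep a renderer from forming the "fi" ligature.
zwnj = "\u200c"
word = "Schilf" + zwnj + "insel"
print(unicodedata.name(zwnj), unicodedata.category(zwnj))  # ZERO WIDTH NON-JOINER Cf
print(word.replace(zwnj, "") == "Schilfinsel")  # True
```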

    - Karl Pentzlin
