Script summary: status of ISO/IEC 10646 BMP, February 1998 - John Clews

From: John Clews (10646er@sesame.demon.co.uk)
Date: Mon Feb 16 1998 - 12:24:50 EST

For information: ISO/IEC JTC1/SC2/WG2 will meet in Seattle from
16-20 March 1998 to discuss progress in developing UCS (ISO/IEC 10646
and Unicode), so I am circulating the attached to provide a simple
overview of which scripts the UCS BMP accommodates. Sections (a) and
(b) provide this as text in tabular form, (c) provides backup
information on decisions taken by the Unicode Consortium and ISO/IEC
JTC1/SC2/WG2, and (d) provides Unicode web site information to enable
readers to follow up this information in more detail. It is being
circulated to about four email lists: my apologies if you see this
more than once.

CONTENTS

(a) Summary of row allocation within ISO/IEC 10646:
(b) Detailed row allocation within ISO/IEC 10646, with Key
(c) Details of proposed additions/amendments to UCS (ISO/IEC 10646
and Unicode) based on details in the Unicode web site.
(d) URL References from this document

Note: Scripts outside the BMP are not commented upon in (a) and (b).
Note: The charts below are best viewed or printed in a fixed pitch
font (if using MS-Windows, Courier New at 12 point or smaller
is suggested), to avoid any problems with misleading wordwrap.
Note: Please notify any errors/queries to 10646er@sesame.demon.co.uk

Scripts in the Basic Multilingual Plane (BMP) of UCS (ISO/IEC 10646
and Unicode), which provides for the needs of most users, are grouped
in a broadly West through East order (European scripts, West Asian
scripts, North African scripts, South Asian scripts, Southeast Asian
scripts and East Asian scripts).

In addition, further scripts from all the above categories, together
with some North American scripts, have been grouped together (see
MISCELLANEOUS SCRIPTS in (a) below), and there are also a large
number of symbols and special use characters.

* * * * * * * *

(a) Summary of row allocation within ISO/IEC 10646:

00-05 EUROPEAN SCRIPTS (see also 10, 1C, 1E-1F and FB)
05-08 WEST ASIAN/NORTH AFRICAN SCRIPTS (see also 12-13; FB-FE)
09-10 SOUTH ASIAN AND SOUTHEAST ASIAN SCRIPTS (See also 17-1C)
10-1F MISCELLANEOUS SCRIPTS
20-27 SYMBOLS
2F-D7 EAST ASIAN SCRIPTS (see also hexadecimal row 11 above)
D8-FF SPECIAL USES (SWAP ZONE, PRIVATE USE, VARIANT FORMS)

TBD; Under investigation; (no coding allocated at time of writing)
Mongolian; Khmer; and Burmese.
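
For readers who want to check where a given character falls, the
summary in (a) can be sketched as a small lookup. This is an
illustrative sketch only, not part of ISO/IEC 10646 or Unicode: the
names SUMMARY_ROWS and groups_for are hypothetical, the row is taken
as the high-order octet of the 16-bit code position, and because the
summary ranges share boundary rows (05 and 10) the function returns
every matching group. Rows 28-2E are not listed in the summary, so
they give an empty list.

```python
# Row ranges copied from summary (a); boundary rows 05 and 10 appear twice.
SUMMARY_ROWS = [
    (0x00, 0x05, "EUROPEAN SCRIPTS"),
    (0x05, 0x08, "WEST ASIAN/NORTH AFRICAN SCRIPTS"),
    (0x09, 0x10, "SOUTH ASIAN AND SOUTHEAST ASIAN SCRIPTS"),
    (0x10, 0x1F, "MISCELLANEOUS SCRIPTS"),
    (0x20, 0x27, "SYMBOLS"),
    (0x2F, 0xD7, "EAST ASIAN SCRIPTS"),
    (0xD8, 0xFF, "SPECIAL USES"),
]


def groups_for(code_position: int) -> list[str]:
    """Return the summary groups whose row range contains this BMP position."""
    if not 0 <= code_position <= 0xFFFF:
        raise ValueError("not a BMP code position")
    row = code_position >> 8  # high-order octet = row number
    return [name for first, last, name in SUMMARY_ROWS if first <= row <= last]


print(groups_for(0x0041))  # row 00
print(groups_for(0x05D0))  # row 05, shared by two groups
print(groups_for(0x20AC))  # row 20
```

A position in row 05 reports both the European and the West
Asian/North African groups, which mirrors the overlap in the summary
itself rather than resolving it.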

* * * * * * * *

(b) Detailed row allocation within ISO/IEC 10646, with Key

.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
00 xxx xxx Basic Latin ... ... ... xxx xxx Latin-1 Supplement ...
01 Latin Extended-A... ... ... ... Latin Extended-B... ... ... ...
02 ... ... ... ... ... IPA Extensions ... ... Spacing modifiers..
03 Combining diacritics .. ... Basic Greek ... ... ... Coptic ...
04 Cyrillic... ... ... ... ... ... ... ... ... ... ... ... ... ...
05 xxx xxx xxx Armenian .. ... ... ... Hebrew (Basic and Extended)
06 xxx xxx Basic Arabic... Extended Arabic ... ... ... ... ... ...
07 [ Maldivian; Syriac; Phoenician; Samaritan; Aramaic ]
08 [ Pahlavi; Tifinagh ]
09 Devanagari ... ... ... ... ... Bengali ... ... ... ... ... ...
0A Gurmukhi .. ... ... ... ... ... Gujarati... ... ... ... ... ...
0B Oriya.. ... ... ... ... ... ... Tamil ... ... ... ... ... ...
0C Telugu ... ... ... ... ... ... Kannada ... ... ... ... ... ...
0D Malayalam.. ... ... ... ... ... Sinhala ... ... ... ... ... ...
0E Thai... ... ... ... ... ... ... Lao ... ... ... ... ... ... ...
0F Basic Tibetan.. ... ... ... ... ... ... ... Extended Tibetan...

.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
10 Mongolian.. ... ... ... ... ... ... ... Georgian[C] Georgian...
11 Hangul Jamo ... ... ... ... ... ... ... ... ... ... ... ... ...
12 Ethiopic script ... ... ... ... ... ... ... ... ... ... ... ...
13 ... ... ... ... ... ... ... ... xxx xxx Cherokee... ... ... ...
14 Canadian Aboriginal Syllabics . ... ... ... ... ... ... ... ...
15 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16 ... ... ... ... ... ... ... ... Ogham.. Runic.. ... ... ... ...
17 [ Burmese Khmer ]
18 [ Dai Tai Lue (Chiang Mai) Tai Nua (Tai Mau) ]
19 [ Cham; 'Phags-pa ]
1A [ Kirat (Limbu); Siddham; Meitei (Manipuri) ]
1B [ Javanese; Batak; Buginese; Lisu; Karenni (Kayah Li) ]
1C [ Glagolitic; Lepcha ]
1D xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx
1E Latin extended additional.. ... ... ... ... ... ... ... ... ...
1F Greek extended ... ... ... ... ... ... ... ... ... ... ... ...

.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
20 General Punctuation ... ... Subscripts Currency .. Comb. Symb.
21 Letter-Symbols ... Number forms... Arrows ... ... ... ... ...
22 Mathematical operators ... ... ... ... ... ... ... ... ... ...
23 Miscellaneous Technical ... ... ... ... ... ... ... ... ... ...
24 Control pics... OCR ... Enclosed alphanumerics ... ... ... ...
25 Box drawing ... ... ... ... ... Blocks. Geometric shapes... ...
26 Miscellaneous symbols ... ... ... ... ... ... ... ... ... ...
27 Dingbats... ... ... ... ... ... ... ... ... ... ... ... ... ...
28 Braille Symbols ... ... ... ... ... ... ... ... ... ... ... ...
29-2E xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx
2F Kang Xi Radicals... ... ... ... ... ... ... ... ... ... xxx xxx

.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
30 CJK Symbols ... Hiragana... ... ... ... Katakana... ... ... ...
31 Bopomofo... Hangul compatible Jamo. CJK xxx xxx xxx xxx xxx xxx
32 Enclosed CJK letters and months ... ... ... ... ... ... ... ...
33 CJK compatibility.. ... ... ... ... ... ... ... ... ... ... ...
34-4D CJK Ideographic Extension A ... ... ... ... ... ... ... ... ...
4E-9F CJK unified ideographs. ... ... ... ... ... ... ... ... ... ...
A0-A3 Yi. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
A4 ... ... ... ... ... ... ... ... ... Yi radicals ... xxx xxx xxx
A5-AB xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx
AC-D7 Hangul extended: replaces 34-4D (Hangul in former edition). ...

.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
D8-DB High-half zone of UTF-16 (Swap zone)... ... ... ... ... ... ...
DC-DF Low-half zone of UTF-16 (Swap zone) ... ... ... ... ... ... ...
E0-F8 PRIVATE USE AREA... ... ... ... ... ... ... ... ... ... ... ...
F9-FF PRESENTATION FORMS
F9-FA CJK compatibility ideographs... ... ... ... ... ... ... ... ...
FB Alpha presentation. Arabic presentation forms - A.. ... ... ...
FC-FD ... ... ... ... ... Arabic presentation forms - A.. ... ... ...
FE xxx xxx 1/2.CJK ... Arabic presentation forms - B.. ... ... ...
FF Halfwidth and fullwidth forms.. ... ... ... ... ... ... ... ***

Key:

xxx = unallocated

     [ ] = Some discussion has taken place in ISO/IEC JTC1/SC2/WG2,
           or in its "Roadmap" document, on placing these scripts in
           these rows, but there is as yet no definite character
           allocation either in the standard, or in resolutions of
           JTC1/SC2/WG2. These provisional allocations could
           thus be changed in the future. See also note TBD at the
           end of section (a) and in section (c).

[C] = Capitals (Extended Georgian, or Georgian "Capitals")

*** = Specials

.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. Indication of columns used
       8.. 9.. A.. B.. C.. D.. E.. F.. Indication of columns used
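
Rows D8-DB and DC-DF in the last chart are reserved for UTF-16, which
pairs one high-half code with one low-half code to address positions
beyond the BMP. A minimal sketch of that arithmetic follows. The
function names are hypothetical, and the sample value 0x10000 is
simply the first position beyond the BMP, not an allocation taken
from this document.

```python
def to_surrogates(scalar: int) -> tuple[int, int]:
    """Split a code position in 0x10000..0x10FFFF into a UTF-16 pair."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only positions beyond the BMP need a surrogate pair")
    offset = scalar - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits land in rows D8-DB
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits land in rows DC-DF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high-half/low-half pair into a single code position."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high-half/low-half pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10000)
print(f"{high:04X} {low:04X}")
```

Each half carries 10 bits, so the 1024 high-half codes combined with
the 1024 low-half codes reach 1,048,576 further positions.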
-----------------------------------------------------------------
        * * * * * * * *

(c) Details of proposed additions/amendments to UCS (ISO/IEC 10646
    and Unicode) based on details in the Unicode web site. Note: As
    permitted by the Unicode web site Copyright Agreement, the
    following text is unmodified (although reformatted for email) as
    found at URL: http://www.unicode.org/unicode/alloc/Pipeline.html

Sections (a) and (b) have been cross-checked with the information
below and with ISO/IEC 10646 tables.

PROPOSED NEW CHARACTERS -- PIPELINE TABLE

   The following is a summary of the characters that the Unicode
   Technical Committee has considered for inclusion in the next
   version of the Unicode Standard (post-Unicode 2.0). It is
   presented here to help implementors to track possible future
   additions to the Standard.

   Information is also available regarding Proposed New Scripts[7].

   This page was last updated 5-Feb-1998.

   Caution: use of proposed characters is at implementers' own risk;
   the composition and allocation of the characters may change before
   they are adopted in the Unicode Standard.

      For more information about this, and the meaning of the ISO
      Status field, see Caution[8].

      For a summary of the remaining space for allocation in
      Unicode 2.0, see Current Allocation[9].

-----------------------------------------------------------------
[KEY TO FOLLOWING ENTRIES]
Proposed [coding] Allocation;
Count; Name;
UTC [Date and] Status;

ISO [Date and] Status;

-----------------------------------------------------------------
01F6
   1 LATIN CAPITAL LETTER HWAIR
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
01F7
   1 LATIN CAPITAL LETTER WYNN
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
01F8,01F9
   2 Pinyin Letters with Tone Marks (small and capital N
          with grave)
   96-Jun-06
        Accepted
   96-Aug-16
        Stage 3

-----------------------------------------------------------------
0218..0219;
021A..021B
   4 LATIN LETTER S WITH COMMA BELOW (CAPITAL and SMALL)
          LATIN LETTER T WITH COMMA BELOW (CAPITAL and SMALL)
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
0400,040D,
0450,045D
   4 Cyrillic, 4 letters with grave (Macedonian: small and
          capital IE and II with grave)
   97-May-29
        Accepted
   96-Aug-16
        Stage 2

-----------------------------------------------------------------
058A
   1 ARMENIAN HYPHEN
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
06FD
06BF
06FA
06FB
06FC
06B8
06B9
06FE
06CF
   9 ARABIC SIGN SINDHI AMPERSAND
        ARABIC LETTER TCHEH WITH DOT ABOVE
        ARABIC LETTER SHEEN WITH DOT BELOW
        ARABIC LETTER DAD WITH DOT BELOW
        ARABIC LETTER GHAIN WITH DOT BELOW
        ARABIC LETTER LAM WITH THREE DOTS BELOW
        ARABIC LETTER NOON WITH DOT BELOW
        ARABIC SIGN SINDHI POSTPOSITION MEN
        ARABIC LETTER WAW WITH DOT ABOVE
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
0D80..0DFF
   80 SINHALESE
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
1200..137F
   346 Ethiopic[10]
   96-Mar-06
        Accepted
   97-Jul-04
        Stage 5

-----------------------------------------------------------------
13A0..13FF
   85 Cherokee[11]
   96-Jun-06
        Accepted
   97-Jul-04
        Stage 4

-----------------------------------------------------------------
1400..167F
   623 Canadian Syllabics
   96-Mar-06
        Accepted
   97-Jul-04
        Stage 4

-----------------------------------------------------------------
1400..167F
   8 Additional Canadian Syllabics
   97-Dec-05
        Accepted
   N/A

-----------------------------------------------------------------
1680..169F
   29 Ogham[12] (archaic Irish script)
   96-Mar-97
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
16A0..16FF
   81 Runic (archaic Nordic script)
   96-Jun-06
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
20AC
   1 EURO SIGN
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
237B
   1 APL FUNCTIONAL SYMBOL QUAD
   96-Mar-06
        Accepted
   96-Aug-16
        Stage 3

-----------------------------------------------------------------
In 23xx block
   12 Electrotechnical Symbols
   96-Mar-97
        Accepted
   96-Apr-26
        Stage 4

-----------------------------------------------------------------
2800..28FF
   256 Braille Pattern Symbols[13]
   96-Jun-06
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
2F00..2FD5
   214 KangXi radicals
   97-May-29
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
3038..303A
   3 Hangzhou numerals
   96-Sep-06
        Accepted
   N/A

-----------------------------------------------------------------
3400..4DFF
   6585 CJK Unified Ideograph, Extension A
   96-Sep-06
        Accepted
   96-Aug-16
        Stage 3

-----------------------------------------------------------------
A000..A4C8
   1165 Yi
   96-Dec-06
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
A490..A4C8
   57 Yi radicals
   96-Dec-06
        Accepted
   97-Jul-04
        Stage 3

-----------------------------------------------------------------
FB1D
   1 HEBREW YOD WITH HIRIQ
   96-Jun-06
        Accepted
   96-Aug-16
        Stage 1

-----------------------------------------------------------------
FFFC
   1 OBJECT REPLACEMENT CHARACTER
   96-Mar-06
        Accepted
   96-Apr-26
        Stage 4

-----------------------------------------------------------------
TBD (surrogates)
   102 Linear B[14]
   97-May-29
        Accepted
   N/A

-----------------------------------------------------------------
TBD (surrogates)
   55 Cypriot Syllabary[15]
   97-May-29
        Accepted
   N/A

-----------------------------------------------------------------
TBD (surrogates)
   29 Etruscan[16]
   97-May-29
        Accepted
   N/A

-----------------------------------------------------------------
TBD (surrogates)
   27 Gothic[17]
   97-May-29
        Accepted
   N/A

-----------------------------------------------------------------
TBD (surrogates)
   220 Greek Byzantine Musical Notation
   96-Aug-05
        Accepted
   97-Jul-04
        Stage 2

-----------------------------------------------------------------
TBD (Surrogates)
   76 Deseret Alphabet (phonetic English script)[18]
   96-Dec-06
        Accepted
   Stage 1

-----------------------------------------------------------------
TBD (surrogates)
   48 Shavian (phonetic English script)[19]
   97-May-29
        Accepted
   N/A

-----------------------------------------------------------------
TBD (surrogates)
   223 Western Musical Symbols[20]
   97-Dec-05
        Accepted
   N/A

-----------------------------------------------------------------
Surrogates
                        Plane 14 tags
   97-Dec-05
        Accepted
   N/A

-----------------------------------------------------------------
TBD
   2 Combining Enclosing Screen and Combining Enclosing Keycap
   97-Dec-05
        Accepted
   N/A

-----------------------------------------------------------------
TBD Mongolian
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   14 Yoruba precomposed
   96-Sep-07
        Rejected
   Stage 1

-----------------------------------------------------------------
N/A
   15 Armenian Punctuation (one, ARMENIAN HYPHEN, accepted by WG2)
   96-Mar-06
        Rejected
   97-Jul-04
        Stage 2

-----------------------------------------------------------------
N/A
   N/A Supplemental Arabic for Uighur, Kazakh, and Kirghiz
   96-Dec-06
        Rejected
   N/A

-----------------------------------------------------------------
N/A
   45 Phaistos Disk Script[21]
   97-May-29
        Not accepted
   N/A

-----------------------------------------------------------------
N/A
   95 Pollard[22]
   97-May-29
        Comments requested
   N/A

-----------------------------------------------------------------
N/A
   1 Mid-level hamzah
   97-Jul-22
        Withdrawn
   N/A

-----------------------------------------------------------------
N/A
   1 MODIFIER LETTER MIDDLE DOT
   97-Dec-05
        Withdrawn
   N/A

-----------------------------------------------------------------
N/A
   N/A Khmer
        Under investigation
   N/A


-----------------------------------------------------------------
N/A
   N/A Burmese[23]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Klingon[24]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Cirth[25]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Tengwar[26]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Ugaritic Cuneiform[27]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Old Persian Cuneiform[28]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Meroitic[29]
        Under investigation
   N/A

-----------------------------------------------------------------
N/A
   N/A Basic Egyptian Hieroglyphics[30]
        Under investigation
   N/A

-----------------------------------------------------------------
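
The "Proposed Allocation" and "Count" fields in the entries above can
be compared mechanically: a range written as 1680..169F reserves a
fixed number of code positions, of which only some are filled. A
rough sketch, assuming the FIRST..LAST notation used in those
entries; the helper name range_size is hypothetical, and the sample
figures are copied from the Ogham, Cherokee and Ethiopic entries.

```python
def range_size(allocation: str) -> int:
    """Number of code positions in a 'FIRST..LAST' hexadecimal range."""
    first, last = (int(part, 16) for part in allocation.split(".."))
    if last < first:
        raise ValueError("range end precedes range start")
    return last - first + 1


# (allocation, count) pairs taken from the pipeline entries above
entries = [("1680..169F", 29), ("13A0..13FF", 85), ("1200..137F", 346)]
for allocation, count in entries:
    size = range_size(allocation)
    print(f"{allocation}: {count} of {size} positions filled, {size - count} unfilled")
```

An unfilled position is not necessarily free for other use; the
sketch only shows how much of each proposed range the listed count
accounts for.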

* * * * * * * *

(d) URL References from this document

END

--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
Email: 10646er@sesame.demon.co.uk;  tel: +44 (0) 1423 888 432
Chairman of ISO/TC46/SC2: Conversion of Written Languages;
Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Member of CEN/TC304: Character Set Technology;
Member of ISO/IEC/JTC1/SC2: Character Sets.
