Script summary: status of ISO/IEC 10646 BMP, February 1998 - John Clews
For information, as ISO/IEC JTC1/SC2/WG2 will meet in Seattle from
16-20 March 1998, to discuss progress in developing UCS (ISO/IEC 10646
and Unicode), I am circulating the attached to provide a simple
overview of which scripts the UCS BMP accommodates. Sections (a) and
(b) provide this as text in tabular form, (c) provides backup
information on decisions taken by the Unicode Consortium and ISO/IEC
JTC1/SC2/WG2, and (d) provides Unicode web site information to enable
readers to follow up this information in more detail. It is being
circulated to about four email lists: my apologies if you see this
more than once.
CONTENTS
(a) Summary of row allocation within ISO/IEC 10646:
(b) Detailed row allocation within ISO/IEC 10646, with Key
(c) Details of proposed additions/amendments to UCS (ISO/IEC 10646
and Unicode) based on details in the Unicode web site.
(d) URL References from this document
Note: Scripts outside the BMP are not commented upon in (a) and (b).
Note: The charts below are best viewed or printed in a fixed pitch
font (if using MS-Windows, Courier New at 12 point or smaller
is suggested), to avoid any problems with misleading wordwrap.
Note: Please send any errors or queries to 10646er@sesame.demon.co.uk
Scripts in the Basic Multilingual Plane (BMP) of UCS (ISO/IEC 10646
and Unicode), which provides for the needs of most users, are grouped
in a broadly West through East order (European scripts, West Asian
scripts, North African scripts, South Asian scripts, Southeast Asian
scripts and East Asian scripts).
In addition, various further scripts from all the above
categories, together with some North American scripts, have been
grouped together (see MISCELLANEOUS SCRIPTS listed in (a) below), and
there are also a large number of symbols and special use characters.
* * * * * * * *
(a) Summary of row allocation within ISO/IEC 10646:
00-05 EUROPEAN SCRIPTS (see also 10, 1C, 1E-1F and FB)
05-08 WEST ASIAN/NORTH AFRICAN SCRIPTS (see also 12-13; FB-FE)
09-10 SOUTH ASIAN AND SOUTHEAST ASIAN SCRIPTS (see also 17-1C)
10-1F MISCELLANEOUS SCRIPTS
20-27 SYMBOLS
2F-D7 EAST ASIAN SCRIPTS (see also hexadecimal row 11 above)
D8-FF SPECIAL USES (SWAP ZONE, PRIVATE USE, VARIANT FORMS)
TBD; Under investigation; (no coding allocated at time of writing)
Mongolian; Khmer; and Burmese.
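As a rough illustration of the summary above: the row of a BMP code
position is its high-order octet, so a code position can be looked up
against the row ranges listed in (a). The Python sketch below is not part
of ISO/IEC 10646 or the Unicode Standard; the names ROW_GROUPS and
groups_for are invented for this example. The ranges are copied exactly as
listed, so rows 05 and 10 each appear under two groups, and rows 28-2E
appear under none.

```python
# Row ranges copied from summary (a); both ends inclusive, rows in hexadecimal.
ROW_GROUPS = [
    (0x00, 0x05, "EUROPEAN SCRIPTS"),
    (0x05, 0x08, "WEST ASIAN/NORTH AFRICAN SCRIPTS"),
    (0x09, 0x10, "SOUTH ASIAN AND SOUTHEAST ASIAN SCRIPTS"),
    (0x10, 0x1F, "MISCELLANEOUS SCRIPTS"),
    (0x20, 0x27, "SYMBOLS"),
    (0x2F, 0xD7, "EAST ASIAN SCRIPTS"),
    (0xD8, 0xFF, "SPECIAL USES"),
]


def groups_for(code_point):
    """Return the summary groups whose row range contains a BMP code position."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a BMP code position")
    row = code_point >> 8  # the high-order octet is the row number
    return [name for first, last, name in ROW_GROUPS if first <= row <= last]


print(groups_for(0x0416))  # a position in row 04
print(groups_for(0x0591))  # a position in row 05, which two groups share
```

Returning a list rather than a single name keeps the overlaps in the
summary visible instead of silently choosing one group.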
* * * * * * * *
(b) Detailed row allocation within ISO/IEC 10646, with Key
.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
00 xxx xxx Basic Latin ... ... ... xxx xxx Latin-1 Supplement ...
01 Latin Extended-A... ... ... ... Latin Extended-B... ... ... ...
02 ... ... ... ... ... IPA Extensions ... ... Spacing modifiers..
03 Combining diacritics .. ... Basic Greek ... ... ... Coptic ...
04 Cyrillic... ... ... ... ... ... ... ... ... ... ... ... ... ...
05 xxx xxx xxx Armenian .. ... ... ... Hebrew (Basic and Extended)
06 xxx xxx Basic Arabic... Extended Arabic ... ... ... ... ... ...
07 [ Maldivian; Syriac; Phoenician; Samaritan; Aramaic ]
08 [ Pahlavi; Tifinagh ]
09 Devanagari ... ... ... ... ... Bengali ... ... ... ... ... ...
0A Gurmukhi .. ... ... ... ... ... Gujarati... ... ... ... ... ...
0B Oriya.. ... ... ... ... ... ... Tamil ... ... ... ... ... ...
0C Telugu ... ... ... ... ... ... Kannada ... ... ... ... ... ...
0D Malayalam.. ... ... ... ... ... Sinhala ... ... ... ... ... ...
0E Thai... ... ... ... ... ... ... Lao ... ... ... ... ... ... ...
0F Basic Tibetan.. ... ... ... ... ... ... ... Extended Tibetan...
.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
10 Mongolian.. ... ... ... ... ... ... ... Georgian[C] Georgian...
11 Hangul Jamo ... ... ... ... ... ... ... ... ... ... ... ... ...
12 Ethiopic script ... ... ... ... ... ... ... ... ... ... ... ...
13 ... ... ... ... ... ... ... ... xxx xxx Cherokee... ... ... ...
14 Canadian Aboriginal Syllabics . ... ... ... ... ... ... ... ...
15 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16 ... ... ... ... ... ... ... ... Ogham.. Runic.. ... ... ... ...
17 [ Burmese Khmer ]
18 [ Dai Tai Lue (Chiang Mai) Tai Nua (Tai Mau) ]
19 [ Cham; 'Phags-pa ]
1A [ Kirat (Limbu); Siddham; Meitei (Manipuri) ]
1B [ Javanese; Batak; Buginese; Lisu; Karenni (Kayah Li) ]
1C [ Glagolitic; Lepcha ]
1D xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx
1E Latin extended additional.. ... ... ... ... ... ... ... ... ...
1F Greek extended ... ... ... ... ... ... ... ... ... ... ... ...
.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
20 General Punctuation ... ... Subscripts Currency .. Comb. Symb.
21 Letter-Symbols ... Number forms... Arrows ... ... ... ... ...
22 Mathematical operators ... ... ... ... ... ... ... ... ... ...
23 Miscellaneous Technical ... ... ... ... ... ... ... ... ... ...
24 Control pics... OCR ... Enclosed alphanumerics ... ... ... ...
25 Box drawing ... ... ... ... ... Blocks. Geometric shapes... ...
26 Miscellaneous symbols ... ... ... ... ... ... ... ... ... ...
27 Dingbats... ... ... ... ... ... ... ... ... ... ... ... ... ...
28 Braille Symbols ... ... ... ... ... ... ... ... ... ... ... ...
29-2E xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx
2F Kang Xi Radicals... ... ... ... ... ... ... ... ... ... xxx xxx
.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
30 CJK Symbols ... Hiragana... ... ... ... Katakana... ... ... ...
31 Bopomofo... Hangul compatible Jamo. CJK xxx xxx xxx xxx xxx xxx
32 Enclosed CJK letters and months ... ... ... ... ... ... ... ...
33 CJK compatibility.. ... ... ... ... ... ... ... ... ... ... ...
34-4D CJK Ideographic Extension A ... ... ... ... ... ... ... ... ...
4E-9F CJK unified ideographs. ... ... ... ... ... ... ... ... ... ...
A0-A3 Yi. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
A4 ... ... ... ... ... ... ... ... ... Yi radicals ... xxx xxx xxx
A5-AB xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx
AC-D7 Hangul extended: replaces 34-4D (Hangul in former edition). ...
.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. 8.. 9.. A.. B.. C.. D.. E.. F..
D8-DB High-half zone of UTF-16 (Swap zone)... ... ... ... ... ... ...
DC-DF Low-half zone of UTF-16 (Swap zone) ... ... ... ... ... ... ...
E0-F8 PRIVATE USE AREA... ... ... ... ... ... ... ... ... ... ... ...
F9-FF PRESENTATION FORMS
F9-FA CJK compatibility ideographs... ... ... ... ... ... ... ... ...
FB Alpha presentation. Arabic presentation forms - A.. ... ... ...
FC-FD ... ... ... ... ... Arabic presentation forms - A.. ... ... ...
FE xxx xxx 1/2.CJK ... Arabic presentation forms - B.. ... ... ...
FF Halfwidth and fullwidth forms.. ... ... ... ... ... ... ... ***
Key:
xxx = unallocated
[ ] = Some discussion has taken place in ISO/IEC JTC1/SC2/WG2,
or in its "Roadmap" document, on placing these scripts in
these rows, but there is as yet no definite character
allocation either in the standard, or in resolutions of
JTC1/SC2/WG2. These provisional allocations could
thus be changed in the future. See also note TBD at the
end of section (a) and in section (c).
[C] = Capitals (Extended Georgian, or Georgian "Capitals")
*** = Specials
.. ... 0.. 1.. 2.. 3.. 4.. 5.. 6.. 7.. Indication of columns used
8.. 9.. A.. B.. C.. D.. E.. F.. Indication of columns used
-----------------------------------------------------------------
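Rows D8-DB and DC-DF in the chart above are reserved so that UTF-16 can
address characters outside the BMP by pairing one code from the high-half
zone with one from the low-half zone. The Python sketch below shows the
usual arithmetic for that pairing; the function names are invented for
this example and are not taken from either standard.

```python
def to_surrogate_pair(code_point):
    """Split a code point beyond the BMP into (high-half, low-half) codes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points 10000..10FFFF need a pair")
    offset = code_point - 0x10000        # 20 bits remain after removing the BMP
    high = 0xD800 + (offset >> 10)       # top 10 bits  -> rows D8-DB
    low = 0xDC00 + (offset & 0x3FF)      # low 10 bits  -> rows DC-DF
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high-half and a low-half code into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high-half/low-half pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10000)
print(f"{high:04X} {low:04X}")
```

Each zone holds 1024 codes, so the pairs cover 1024 x 1024 positions,
that is 16 further planes of 65536 positions each.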
* * * * * * * *
(c) Details of proposed additions/amendments to UCS (ISO/IEC 10646
and Unicode) based on details in the Unicode web site. Note: As
permitted by the Unicode web site Copyright Agreement, the
following text is unmodified (although reformatted for email) as
found at URL: http://www.unicode.org/unicode/alloc/Pipeline.html
Sections (a) and (b) have been cross-checked with the information
below and with ISO/IEC 10646 tables.
PROPOSED NEW CHARACTERS -- PIPELINE TABLE
The following is a summary of the characters that the Unicode
Technical Committee has considered for inclusion in the next
version of the Unicode Standard (post-Unicode 2.0). It is
presented here to help implementors to track possible future
additions to the Standard.
Information is also available regarding Proposed New Scripts[7].
This page was last updated 5-Feb-1998.
Caution: use of proposed characters is at implementers' own risk;
the composition and allocation of the characters may change before
they are adopted in the Unicode Standard.
For more information about this, and the meaning of the ISO
Status field, see Caution[8].
For a summary of the remaining space for allocation in
Unicode 2.0, see Current Allocation[9].
-----------------------------------------------------------------
[KEY TO FOLLOWING ENTRIES]
Proposed [coding] Allocation;
Count; Name;
UTC [Date and] Status;
ISO [Date and] Status;
-----------------------------------------------------------------
01F6
1 LATIN CAPITAL LETTER HWAIR
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
01F7
1 LATIN CAPITAL LETTER WYNN
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
01F8,01F9
2 Pinyin Letters with Tone Marks (small and capital N
with grave)
96-Jun-06
Accepted
96-Aug-16
Stage 3
-----------------------------------------------------------------
0218..0219;
021A..021B
4 LATIN LETTER S WITH COMMA BELOW (CAPITAL and SMALL)
LATIN LETTER T WITH COMMA BELOW (CAPITAL and SMALL)
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
0400,040D,
0450,045D
4 Cyrillic, 4 letters with grave (Macedonian: small and
capital IE and II with grave)
97-May-29
Accepted
96-Aug-16
Stage 2
-----------------------------------------------------------------
058A
1 ARMENIAN HYPHEN
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
06FD
06BF
06FA
06FB
06FC
06B8
06B9
06FE
06CF
9 ARABIC SIGN SINDHI AMPERSAND
ARABIC LETTER TCHEH WITH DOT ABOVE
ARABIC LETTER SHEEN WITH DOT BELOW
ARABIC LETTER DAD WITH DOT BELOW
ARABIC LETTER GHAIN WITH DOT BELOW
ARABIC LETTER LAM WITH THREE DOTS BELOW
ARABIC LETTER NOON WITH DOT BELOW
ARABIC SIGN SINDHI POSTPOSITION MEN
ARABIC LETTER WAW WITH DOT ABOVE
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
0D80..0DFF
80 SINHALESE
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
1200..137F
346 Ethiopic[10]
96-Mar-06
Accepted
97-Jul-04
Stage 5
-----------------------------------------------------------------
13A0..13FF
85 Cherokee[11]
96-Jun-06
Accepted
97-Jul-04
Stage 4
-----------------------------------------------------------------
1400..167F
623 Canadian Syllabics
96-Mar-06
Accepted
97-Jul-04
Stage 4
-----------------------------------------------------------------
1400..167F
8 Additional Canadian Syllabics
97-Dec-05
Accepted
N/A
-----------------------------------------------------------------
1680..169F
29 Ogham[12] (archaic Irish script)
96-Mar-97
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
16A0..16FF
81 Runic (archaic Nordic script)
96-Jun-06
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
20AC
1 EURO SIGN
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
237B
1 APL FUNCTIONAL SYMBOL QUAD
96-Mar-06
Accepted
96-Aug-16
Stage 3
-----------------------------------------------------------------
In 23xx block
12 Electrotechnical Symbols
96-Mar-97
Accepted
96-Apr-26
Stage 4
-----------------------------------------------------------------
2800..28FF
256 Braille Pattern Symbols[13]
96-Jun-06
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
2F00..2FD5
214 KangXi radicals
97-May-29
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
3038..303A
3 Hangzhou numerals
96-Sep-06
Accepted
N/A
-----------------------------------------------------------------
3400..4DFF
6585 CJK Unified Ideograph, Extension A
96-Sep-06
Accepted
96-Aug-16
Stage 3
-----------------------------------------------------------------
A000..A4C8
1165 Yi
96-Dec-06
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
A490..A4C8
57 Yi radicals
96-Dec-06
Accepted
97-Jul-04
Stage 3
-----------------------------------------------------------------
FB1D
1 HEBREW YOD WITH HIRIQ
96-Jun-06
Accepted
96-Aug-16
Stage 1
-----------------------------------------------------------------
FFFC
1 OBJECT REPLACEMENT CHARACTER
96-Mar-06
Accepted
96-Apr-26
Stage 4
-----------------------------------------------------------------
TBD (surrogates)
102 Linear B[14]
97-May-29
Accepted
N/A
-----------------------------------------------------------------
TBD (surrogates)
55 Cypriot Syllabary[15]
97-May-29
Accepted
N/A
-----------------------------------------------------------------
TBD (surrogates)
29 Etruscan[16]
97-May-29
Accepted
N/A
-----------------------------------------------------------------
TBD (surrogates)
27 Gothic[17]
97-May-29
Accepted
N/A
-----------------------------------------------------------------
TBD (surrogates)
220 Greek Byzantine Musical Notation
96-Aug-05
Accepted
97-Jul-04
Stage 2
-----------------------------------------------------------------
TBD (Surrogates)
76 Deseret Alphabet (phonetic English script)[18]
96-Dec-06
Accepted
Stage 1
-----------------------------------------------------------------
TBD (surrogates)
48 Shavian (phonetic English script)[19]
97-May-29
Accepted
N/A
-----------------------------------------------------------------
TBD (surrogates)
223 Western Musical Symbols[20]
97-Dec-05
Accepted
N/A
-----------------------------------------------------------------
Surrogates
Plane 14 tags
97-Dec-05
Accepted
N/A
-----------------------------------------------------------------
TBD
2 Combining Enclosing Screen and Combining Enclosing Keycap
97-Dec-05
Accepted
N/A
-----------------------------------------------------------------
TBD Mongolian
Under investigation
N/A
-----------------------------------------------------------------
N/A
14 Yoruba precomposed
96-Sep-07
Rejected
Stage 1
-----------------------------------------------------------------
N/A
15 Armenian Punctuation (one, ARMENIAN HYPHEN, accepted by WG2)
96-Mar-06
Rejected
97-Jul-04
Stage 2
-----------------------------------------------------------------
N/A
N/A Supplemental Arabic for Uighur, Kazakh, and Kirghiz
96-Dec-06
Rejected
N/A
-----------------------------------------------------------------
N/A
45 Phaistos Disk Script[21]
97-May-29
Not accepted
N/A
-----------------------------------------------------------------
N/A
95 Pollard[22]
97-May-29
Comments requested
N/A
-----------------------------------------------------------------
N/A
1 Mid-level hamzah
97-Jul-22
Withdrawn
N/A
-----------------------------------------------------------------
N/A
1 MODIFIER LETTER MIDDLE DOT
97-Dec-05
Withdrawn
N/A
-----------------------------------------------------------------
N/A
N/A Khmer
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Burmese[23]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Klingon[24]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Cirth[25]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Tengwar[26]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Ugaritic Cuneiform[27]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Old Persian Cuneiform[28]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Meroitic[29]
Under investigation
N/A
-----------------------------------------------------------------
N/A
N/A Basic Egyptian Hieroglyphics[30]
Under investigation
N/A
-----------------------------------------------------------------
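The "Proposed Allocation" field in the entries above uses two notations:
"first..last" for a range of code positions, and commas or semicolons
between separate items. As a reading aid, the Python sketch below counts
the code positions that such a field covers, which can then be compared
with the Count field of the same entry. The function name allocation_size
is invented for this example, and the parsing rules are inferred only from
the entries shown above.

```python
import re


def allocation_size(spec):
    """Count the code positions covered by an allocation field.

    Accepts items such as "058A" or "1200..137F", separated by commas,
    semicolons or whitespace (a notation inferred from the table above).
    """
    total = 0
    for item in re.split(r"[,;\s]+", spec.strip()):
        if not item:
            continue  # skip the empty piece left by a trailing separator
        first, _, last = item.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end < start:
            raise ValueError(f"range runs backwards: {item}")
        total += end - start + 1  # both ends are inclusive
    return total


# Ethiopic: positions reserved by 1200..137F, against a listed count of 346.
print(allocation_size("1200..137F"))
# Comma-below letters: two short ranges separated by a semicolon.
print(allocation_size("0218..0219; 021A..021B"))
```

A reserved range larger than the listed count simply means that some
positions in the block are left unassigned.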
* * * * * * * *
(d) URL References from this document
[orig] http://www.unicode.org/unicode/alloc/Pipeline.html
[1] http://www.unicode.org/unicode/contents.html
[2] http://www.unicode.org/unicode/standard/standard.html
[3] http://www.unicode.org/unicode/uni2errata/UnicodeErrata.html
[4] http://www.unicode.org/unicode/techwork.html
[5] http://www.unicode.org/unicode/onlinedat/online.html
[6] http://www.unicode.org/unicode/conf.html
[7] http://www.unicode.org/pending/pending.html
[8] http://www.unicode.org/unicode/alloc/Caution.html
[9] http://www.unicode.org/unicode/alloc/CurrentAllocation.html
[10] http://www.unicode.org/pending/ethiopic/ethiopic.html
[11] http://www.unicode.org/pending/cherokee/cherokee.html
[12] http://www.unicode.org/pending/ogham/ogham.html
[13] http://www.unicode.org/pending/braille/braille.html
[14] http://www.unicode.org/pending/linearb/LinearB.pdf
[15] http://www.unicode.org/pending/cypriot/Cypriot.pdf
[16] http://www.unicode.org/pending/etruscan/Etruscan.pdf
[17] http://www.unicode.org/pending/gothic/Gothic.pdf
[18] http://www.unicode.org/pending/deseret/Deseret.html
[19] http://www.unicode.org/pending/shavian/shavian.html
[20] http://www.lib.virginia.edu/dmmc/Music/UnicodeMusic/
[21] http://www.unicode.org/pending/phaistos/Phaistos.pdf
[22] http://www.unicode.org/pending/pollard/Pollard.pdf
[23] http://www.unicode.org/pending/burmese/Burmese.html
[24] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1643/n1643.htm
[25] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1642/n1642.htm
[26] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1641/n1641.htm
[27] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1640/n1640.htm
[28] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1639/n1639.htm
[29] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1638/n1638.htm
[30] http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1637/n1637.htm
[31] http://www.unicode.org/index.html
[32] http://www.unicode.org/unicode/copyright.html
[33] mailto:info@unicode.org
END
-- John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
Email: 10646er@sesame.demon.co.uk; tel: +44 (0) 1423 888 432
Chairman of ISO/TC46/SC2: Conversion of Written Languages;
Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Member of CEN/TC304: Character Set Technology;
Member of ISO/IEC/JTC1/SC2: Character Sets.