ISO/IEC 10646 subsets

From: John Cowan (cowan@locke.ccil.org)
Date: Mon Dec 14 1998 - 10:29:29 EST


Erik van der Poel wrote:

> *-iso10646-subsets:300,301,330,640,641
>
> (Just an example. I don't know if these are real ISO 10646 subset IDs.)

Actually, there are such IDs. Here's a list drawn from
http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/n1785.doc :

1 BASIC LATIN 0020 - 007E
2 LATIN-1 SUPPLEMENT 00A0 - 00FF
3 LATIN EXTENDED-A 0100 - 017F
4 LATIN EXTENDED-B 0180 - 024F
5 IPA EXTENSIONS 0250 - 02AF
6 SPACING MODIFIER LETTERS 02B0 - 02FF
7 COMBINING DIACRITICAL MARKS 0300 - 036F
8 BASIC GREEK 0370 - 03CF
9 GREEK SYMBOLS AND COPTIC 03D0 - 03FF
10 CYRILLIC 0400 - 04FF
11 ARMENIAN 0530 - 058F
12 BASIC HEBREW 05D0 - 05EA
13 HEBREW EXTENDED 0590 - 05CF 05EB - 05FF
14 BASIC ARABIC 0600 - 065F
15 ARABIC EXTENDED 0660 - 06FF
16 DEVANAGARI 0900 - 097F 200C, 200D
17 BENGALI 0980 - 09FF 200C, 200D
18 GURMUKHI 0A00 - 0A7F 200C, 200D
19 GUJARATI 0A80 - 0AFF 200C, 200D
20 ORIYA 0B00 - 0B7F 200C, 200D
21 TAMIL 0B80 - 0BFF 200C, 200D
22 TELUGU 0C00 - 0C7F 200C, 200D
23 KANNADA 0C80 - 0CFF 200C, 200D
24 MALAYALAM 0D00 - 0D7F 200C, 200D
25 THAI 0E00 - 0E7F
26 LAO 0E80 - 0EFF
27 BASIC GEORGIAN 10D0 - 10FF
28 GEORGIAN EXTENDED 10A0 - 10CF
29 HANGUL JAMO 1100 - 11FF
30 LATIN EXTENDED ADDITIONAL 1E00 - 1EFF
31 GREEK EXTENDED 1F00 - 1FFF
32 GENERAL PUNCTUATION 2000 - 206F
33 SUPERSCRIPTS AND SUBSCRIPTS 2070 - 209F
34 CURRENCY SYMBOLS 20A0 - 20CF
35 COMBINING DIACRITICAL MARKS FOR SYMBOLS 20D0 - 20FF
36 LETTERLIKE SYMBOLS 2100 - 214F
37 NUMBER FORMS 2150 - 218F
38 ARROWS 2190 - 21FF
39 MATHEMATICAL OPERATORS 2200 - 22FF
40 MISCELLANEOUS TECHNICAL 2300 - 23FF
41 CONTROL PICTURES 2400 - 243F
42 OPTICAL CHARACTER RECOGNITION 2440 - 245F
43 ENCLOSED ALPHANUMERICS 2460 - 24FF
44 BOX DRAWING 2500 - 257F
45 BLOCK ELEMENTS 2580 - 259F
46 GEOMETRIC SHAPES 25A0 - 25FF
47 MISCELLANEOUS SYMBOLS 2600 - 26FF
48 DINGBATS 2700 - 27BF
49 CJK SYMBOLS AND PUNCTUATION 3000 - 303F
50 HIRAGANA 3040 - 309F
51 KATAKANA 30A0 - 30FF
52 BOPOMOFO 3100 - 312F
53 HANGUL COMPATIBILITY JAMO 3130 - 318F
54 CJK MISCELLANEOUS 3190 - 319F
55 ENCLOSED CJK LETTERS AND MONTHS 3200 - 32FF
56 CJK COMPATIBILITY 3300 - 33FF
60 CJK UNIFIED IDEOGRAPHS 4E00 - 9FFF
61 PRIVATE USE AREA E000 - F8FF
62 CJK COMPATIBILITY IDEOGRAPHS F900 - FAFF
63 ALPHABETIC PRESENTATION FORMS FB00 - FB4F
64 ARABIC PRESENTATION FORMS-A FB50 - FDFF
65 COMBINING HALF MARKS FE20 - FE2F
66 CJK COMPATIBILITY FORMS FE30 - FE4F
67 SMALL FORM VARIANTS FE50 - FE6F
68 ARABIC PRESENTATION FORMS-B FE70 - FEFE
69 HALFWIDTH AND FULLWIDTH FORMS FF00 - FFEF
70 SPECIALS FFF0 - FFFD
71 HANGUL SYLLABLES AC00 - D7A3
72 BASIC TIBETAN 0F00 - 0FBF
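
As a rough illustration of how a list like this could be used, here is a minimal Python sketch that tests whether a code position belongs to a collection. Only a handful of entries are copied from the list above, and the names COLLECTIONS, in_collection and collections_of are invented for this example; they do not come from the standard or from any library.

```python
# Hypothetical lookup table: collection number -> (name, inclusive ranges).
# Only a few entries from the list above are copied here.
COLLECTIONS = {
    1: ("BASIC LATIN", [(0x0020, 0x007E)]),
    13: ("HEBREW EXTENDED", [(0x0590, 0x05CF), (0x05EB, 0x05FF)]),
    16: ("DEVANAGARI", [(0x0900, 0x097F), (0x200C, 0x200D)]),
    71: ("HANGUL SYLLABLES", [(0xAC00, 0xD7A3)]),
}


def in_collection(code_point, number):
    """Return True if code_point lies in any range of the given collection."""
    _name, ranges = COLLECTIONS[number]
    return any(lo <= code_point <= hi for lo, hi in ranges)


def collections_of(code_point):
    """Return the sorted collection numbers (from the table) containing code_point."""
    return sorted(n for n in COLLECTIONS if in_collection(code_point, n))
```

Note that a collection may have more than one range (13 and 16 above), so a membership test has to check every range rather than a single block.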

The following collections specify characters used for alternate formats and script-specific formats. See annex D for more information.

200 ZERO-WIDTH BOUNDARY INDICATORS 200B - 200D FEFF
201 FORMAT SEPARATORS 2028 - 2029
202 BI-DIRECTIONAL FORMAT MARKS 200E - 200F
203 BI-DIRECTIONAL FORMAT EMBEDDINGS 202A - 202E
204 HANGUL FILL CHARACTERS 3164, FFA0
205 CHARACTER SHAPING SELECTORS 206A - 206D
206 NUMERIC SHAPE SELECTORS 206E - 206F

The following specify collections which are the union of particular collections defined above.

250 GENERAL FORMAT CHARACTERS Collections 200 - 203
251 SCRIPT-SPECIFIC FORMAT CHARACTERS Collections 204 - 206
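
To make the "union" idea concrete, here is a small Python sketch that builds collections 250 and 251 from the ranges of collections 200 - 206 given above. The names FORMAT_COLLECTIONS and union are made up for illustration, and merging adjacent ranges is just one convenient way to represent the result.

```python
# Ranges copied from collections 200-206 in the list above.
FORMAT_COLLECTIONS = {
    200: [(0x200B, 0x200D), (0xFEFF, 0xFEFF)],
    201: [(0x2028, 0x2029)],
    202: [(0x200E, 0x200F)],
    203: [(0x202A, 0x202E)],
    204: [(0x3164, 0x3164), (0xFFA0, 0xFFA0)],
    205: [(0x206A, 0x206D)],
    206: [(0x206E, 0x206F)],
}


def union(first, last):
    """Merge the ranges of collections first..last into sorted, coalesced ranges."""
    ranges = sorted(r for n in range(first, last + 1) for r in FORMAT_COLLECTIONS[n])
    merged = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


GENERAL_FORMAT = union(200, 203)          # collection 250
SCRIPT_SPECIFIC_FORMAT = union(204, 206)  # collection 251
```

With the ranges as listed, collection 250 comes out as 200B - 200F, 2028 - 202E and FEFF, and collection 251 as 206A - 206F, 3164 and FFA0.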

The following specify other collections.

270 COMBINING CHARACTERS [combining characters]
271 COMBINING CHARACTERS B-2 [combining chars, except Indic vowels]
299 BMP FIRST EDITION [Unicode 1.1 exactly]
300 BMP 0000 - D7FF E000 - FFFD
301 BMP-AMD.7 [Unicode 2.0 exactly]
400 PRIVATE USE PLANES G=00, P=0F, 10 & E0 - FF
500 PRIVATE USE GROUPS G=60 - 7F
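
Collections 400 and 500 are given as group (G) and plane (P) numbers rather than as code position ranges. The Python sketch below expands them into UCS-4 ranges, assuming the usual four-octet layout of group, plane, row and cell; the function and variable names are invented for the example.

```python
def plane_range(group, plane):
    """Inclusive UCS-4 range covered by one plane of one group."""
    first = (group << 24) | (plane << 16)
    return first, first | 0xFFFF


def group_range(group):
    """Inclusive UCS-4 range covered by one whole group."""
    first = group << 24
    return first, first | 0xFFFFFF


# Collection 400: group 00, planes 0F, 10 and E0 - FF.
PRIVATE_USE_PLANES = [plane_range(0x00, p) for p in [0x0F, 0x10, *range(0xE0, 0x100)]]

# Collection 500: groups 60 - 7F.
PRIVATE_USE_GROUPS = [group_range(g) for g in range(0x60, 0x80)]
```

Under that assumption, plane 0F of group 00 covers 000F0000 - 000FFFFF, and group 60 covers 60000000 - 60FFFFFF.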

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


