L2/03-407R

Source: Ken Lunde, Adobe Systems Incorporated
Status: Expert Contribution
Date: 2003-10-28
Action: For consideration by UTC and IRG

In response to L2/03-342, "UTC Consensus on a Basic International CJK Subset," attached is my proposal.

Put simply, a basic set of 7,772 ideographs can be compiled by combining the basic character sets from the primary locales in which they are used, specifically GB 2312-80 Level 1 (Mainland China), Big Five Level 1 (Taiwan), JIS X 0208:1997 Level 1 (Japan), and KS X 1001:1992 (Korea). Note that CNS 11643-1992 Plane 1 is functionally identical to Big Five Level 1.

The basic "level" of each of these character sets was defined to include the most frequently used ideographs in its locale. In other words, the work was done years ago and simply needs to be consolidated. For example, consider the following information:

1) The 3,755 hanzi in GB 2312-80 Level 1 include 2,495 of the 2,500 Changyong Hanzi. Four of the five missing hanzi are included in this subset definition because Big Five Level 1 includes them. Only one, U+7B5D, is missing from all four sets, and it has been added for completeness.

2) The 5,401 hanzi in Big Five Level 1, by definition, include the 4,808 Changyong Hanzi.

3) The 2,965 kanji in JIS X 0208:1997 Level 1, by definition, include the 1,945 Joyo Kanji.

4) The 4,888 hanja in KS X 1001:1992, by definition, include the 1,800 Sangyong Hanja.

Note that the character codes in the GB, Big5, JIS, and KS columns represent only the basic levels in the respective standards.

These four sets do not address HKSAR. Increasing the total to 8,000 characters would leave room for adding high-frequency HKSAR characters. However, such an addition should be contemplated only if it can be made in a timely fashion.
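
The consolidation described above amounts to a set union of the four basic levels plus the single added character U+7B5D. The following is a minimal Python sketch of that idea, not the procedure actually used to prepare this proposal. The function names euc_rows and consolidate are hypothetical. The sketch assumes that GB 2312-80 Level 1 occupies rows 16 through 55 and relies on the gb2312 codec in Python's standard library. The other three sets are represented by small placeholder sets, because enumerating them correctly requires decisions (for example, how to treat the KS X 1001 hanja that map to CJK compatibility ideographs) that this sketch does not attempt.

```python
from typing import Iterable, Set


def euc_rows(codec: str, first_row: int, last_row: int) -> Set[int]:
    """Collect Unicode code points for rows first_row..last_row (1-94)
    of a 94x94 character set in its EUC form, skipping unassigned cells."""
    found: Set[int] = set()
    for row in range(first_row, last_row + 1):
        for cell in range(1, 95):
            raw = bytes([0xA0 + row, 0xA0 + cell])
            try:
                text = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # unassigned cell in this row
            if len(text) == 1:
                found.add(ord(text))
    return found


def consolidate(levels: Iterable[Set[int]], extras: Iterable[int] = ()) -> Set[int]:
    """Union the per-locale basic levels, then add individually chosen extras."""
    subset: Set[int] = set()
    for level in levels:
        subset |= level
    subset.update(extras)
    return subset


# Assumption: rows 16-55 of GB 2312-80 hold the Level 1 hanzi.
gb_l1 = euc_rows("gb2312", 16, 55)

# Placeholders only: NOT the real Big Five, JIS, or KS repertoires.
big5_l1_placeholder = {0x4E00, 0x4E8C}
jis_l1_placeholder = {0x4E00, 0x4E09}
ks_placeholder = {0x4E00, 0x56DB}

subset = consolidate(
    [gb_l1, big5_l1_placeholder, jis_l1_placeholder, ks_placeholder],
    extras=[0x7B5D],  # the one character added for completeness
)
```

With real repertoires substituted for the three placeholders, the size of the resulting set could be compared against the 7,772 figure given in this proposal.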