L2/07-164

Source: Toshiya Suzuki
Contact: mpsuzuki@hiroshima-u.ac.jp
Date: March 13, 2007

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Subject: Comment on PRI 98: IVD Adobe-Japan1 (1/3)

Dear Sirs,

I have several questions and comments on Unicode Public Review Issue
#98: the submission of "Adobe-Japan1" to the Ideographic Variation
Database. I have split my comments into 3 parts.

First, 3 small questions and requests.

* If there is an intended design behind the mapping between CIDs and
  IVD sequences for a given Unicode codepoint, please describe it. In
  most cases, the lower-numbered CID is assigned to the lower-numbered
  IVS, but there are several exceptions. For example, in the case of
  U+5272, the IVSes are ordered as: VS17-13684, VS18-1474 and
  VS19-20086.

* CID+19071 is used twice in sequences.txt: U+29FCE, U+29FD7. Is this
  duplication intended?

* Some URLs in TechNote #5078 that refer to other TechNotes are
  obsolete (e.g. direct hrefs to #5014 (p. 1), #5094 (p. 5)), and some
  TechNotes are no longer available (#5031 (p. 5)). I wish Adobe could
  re-upload #5031 to the archived TechNotes collection.

Regards,
mpsuzuki

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Subject: Comment on PRI 98: IVD Adobe-Japan1 (2/3)

Dear Sirs,

Next, my comments 1 & 2.

Comment 1: codepoint coverage of Adobe-Japan1
=============================================

I think it would be helpful if the codepoints in IVD Adobe-Japan1 fit
into the coverage defined by ISO/IEC 10646 JTC1/SC2/WG2 N3091, "Request
For Collection Identifiers For Japanese Subsets of ISO/IEC 10646"
(proposed by the Japan National Body; it was accepted and will be
included in a future ISO 10646 amendment). In it, 5 subsets are defined
based on JIS and vendor character sets.

http://www.dkuug.dk/jtc1/sc2/wg2/docs/n3091.doc
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n3091-A.zip
http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3104.doc

If the coverage of Adobe-Japan1 can be expressed by a simple
combination of the subsets, it will be very helpful for information
interchange in Japan.
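A rough sketch of how such a coverage check could be automated is given
below. It is an illustration only: the line format assumed for
sequences.txt ("base selector; collection; CID+n"), the function names,
the toy subset and the selector shown for U+8346 are all assumptions,
and the real N3091 subset contents would have to be loaded from the
documents above.

```python
import re

# Assumed line format of sequences.txt (an assumption, not confirmed here):
#   "<base hex> <selector hex>; <collection>; CID+<number>"
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6}) ([0-9A-Fa-f]{4,6});\s*[^;]+;\s*CID\+(\d+)"
)


def parse_sequences(lines):
    """Yield (base codepoint, variation selector, CID) for each IVS line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # comment lines and blank lines simply do not match
            yield int(m.group(1), 16), int(m.group(2), 16), int(m.group(3))


def uncovered(lines, subsets):
    """Return sorted (CID, base) pairs whose base is in none of the subsets."""
    covered = set().union(*subsets)
    return sorted(
        (cid, base)
        for base, _selector, cid in parse_sequences(lines)
        if base not in covered
    )


# Hypothetical excerpt; the selector on the U+8346 line is illustrative.
sample = [
    "# hypothetical excerpt",
    "5272 E0100; Adobe-Japan1; CID+13684",
    "5272 E0101; Adobe-Japan1; CID+1474",
    "8346 E0100; Adobe-Japan1; CID+7672",
]
toy_subset = {0x5272}  # stand-in for one N3091 subset

for cid, base in uncovered(sample, [toy_subset]):
    print(f"CID+{cid} U+{base:04X}")
```

With the real subset tables in place of toy_subset, the same loop would
list every CID whose base codepoint falls outside all of the subsets.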
Adobe-Japan1 IVS includes the following 10 characters whose codepoints
are not included in the N3091 subsets.

CID+7641  U+28CDD
CID+7655  U+3D4E
CID+7670  U+25874
CID+7672  U+8346
CID+7673  U+28EF6
CID+7687  U+6805
CID+7825  U+21A1A
CID+7834  U+67FA
CID+7836  U+688E
CID+7838  U+243D0

I wish these CIDs could be reconsidered to fit the N3091 subsets, for
easier information interchange in Japanese markets. A detailed analysis
is given in Comment 7.

Comment 2: codepoints in CJK Compatibility Ideographs
=====================================================

The codepoints of the current IVD Adobe-Japan1 cover the whole of
JIS X 0208 ("Basic Japanese" of N3091) and the whole of JIS X 0212
("Japanese Ideographics Supplement" of N3091). But JIS X 0213 ("JIS2004
Ideographics Extension" of N3091) and common Japanese vendor extensions
("Common Japanese" of N3091) are only partially covered. I think
partial coverage is slightly confusing. Being covered completely or not
covered at all is clearer and might be easier to understand.

It seems that the uncovered codepoints are those of CJK Compatibility
Ideographs. According to TechNote #5078, the Adobe CMaps and
cid2code.txt, Adobe-Japan1-6 is designed to cover JIS X 0213 and most
vendor-extended Shift-JIS variants (e.g. Microsoft codepage 932), but
most of the codepoints of CJK Compatibility Ideographs for JIS X 0213
compatibility and common Japanese vendor extensions are not used in the
proposal (in the following, I call them "avoided codepoints").

I guess the avoided codepoints are simply out of the scope of IVD
Adobe-Japan1 (in fact, Unicode Technical Report #37 is written for CJK
Unified Ideographs and makes no mention of CJK Compatibility
Ideographs), and IVD Adobe-Japan1 is not concerned with the
availability of ideographs at the avoided codepoints. However, if we
use IVD Adobe-Japan1 in ToUnicode mapping tables in a PDF using an
Adobe-Japan1 CID font, it can cause a round-trip issue.
For example, if I make a PDF from JIS X 0213 text with an Adobe CID
font, and insert ToUnicode mapping tables including IVSes of IVD
Adobe-Japan1, the receiver of the PDF file can retrieve JIS X 0208
(and/or 0212) text from the PDF, but cannot retrieve the original
JIS X 0213 text.

If IVD for CJK Compatibility Ideographs should NOT be defined, I wish
the IVD Adobe-Japan1 proposal would include NFD from JIS X 0213 to
JIS X 0208 + 0212, to prevent the round-trip issue caused by wrong
utilization of IVD Adobe-Japan1.

I would like JIS and IRG experts to comment on the IVD for CJK
Compatibility Ideographs.

Regards,
mpsuzuki

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Subject: Comment on PRI 98: IVD Adobe-Japan1 (3/3)

Dear Sirs,

This is the last post, with my comments 3-9. Except for comment 3
(interoperability with OpenType fonts), most comments in this post are
for Adobe TechNote #5078 itself rather than for PRI 98. I think the
primal form definitions for each CID should be clarified, even if the
answer is simply "KozMinProVI-Light is the primal definition, it is
provided as is" (such an assertion would be sufficiently helpful).

Regards,
mpsuzuki

--

Adobe TechNote #5078 includes very helpful information about the
references for glyph design (which standard the glyph should be
compliant with, and which document we should refer to). However, there
are several mismatches between the roots of CIDs and the codepoints in
IVD. In the following, I describe them in detail.

I think there are 3 groups of CIDs in Adobe-Japan1.

1. CIDs whose glyphs should be designed with reference to Japanese
Industrial Standards (JIS).
2. CIDs whose glyphs are designed to display existing "proprietary"
systems outside Adobe.
2-a. CIDs whose glyphs are shared by legacy PostScript systems, like
Morisawa OCF.
ex. Adobe-Japan1-0.
2-b. CIDs for "vendor character sets".
ex. Microsoft Windows 3.1 J, Apple Kanji Talk 6, Fujitsu FM-R.
2-c. CIDs for character sets not by JIS or software vendors (so no
specific implementations or platforms are assumed).
ex. National Language Council, K-JIS, U-PRESS, etc.
3. CIDs whose glyphs do not refer to any existing documents.

About group 1 (CID for JIS)
===========================

Comment 3: OT-tag interoperability?
-----------------------------------

Unfortunately, the current TrueType font format specification cannot
include a cmap table including IVS (>= 32bit). So, OpenType layout
features will be quite important to support IVS. In OpenType, Adobe
has already introduced feature tags to specify the ideographic variant
by JIS revision: "jp78", "jp83", etc:

http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp78
http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp83
http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp90
http://partners.adobe.com/public/developer/opentype/index_tag6.html#hojo
http://partners.adobe.com/public/developer/opentype/index_tag7.html#nlck
http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp04

For convenient interoperability between text whose ideographic variants
are specified by a "jpXX" OT-tag and text whose variants are specified
by IVS, I wish Adobe would define a mapping table showing which
Adobe-Japan1 IVS should be used for each of these OT-tags. ("nlck"
might be a slightly different category, sorry.)

About group 2 (CID shared by legacy PS systems)
===============================================

About 2-a. CIDs shared by legacy PS systems
===========================================

Comment 4: Requirement of JIS90 compliancy
------------------------------------------

Now, Adobe TechNote #5078 notes about JIS90-compliancy, as:

"In order for Adobe-Japan1-4 CID-keyed fonts to be useful and
meaningful, the glyphs of all JIS X 0208:1997 kanji must be
JIS90-compliant. This affects CIDs 1125 through 7477 (6,353 CIDs) in
Supplement 0, and CIDs 8284 and 8285 in Supplement 1. Some subtle glyph
variations in Supplement 4 (see Section 7) make this necessary." (p. 5)

"In order to ensure glyph consistency across fonts of different
manufactures, the JIS X 0208:1997 kanji (CIDs 1125-7477 and 8284-8285
of Supplements 0 and 1, respectively) must become JIS90-compliant. This
is due to the fact that some of the JIS X 0208:1997 kanji variants in
the Adobe-Japan1-4 are sometimes subtle in their difference with their
JIS90 (standard) forms". (p. 95)

In both paragraphs, it seems that JIS90-compliancy is requested as a
part of Adobe-Japan1-4. Does it mean that the request was introduced
when Adobe-Japan1-4 was defined, and the legacy Adobe products before
Adobe-Japan1-3 are not (guaranteed to be) JIS90-compliant?

According to TechNote #5078 p. 221, the Adobe-Japan1-1 & -2 (1994)
specifications were printed with Morisawa's RyuminPro-Light. As I
remember, OCF Ryumin-Light was designed before JIS90 (1988?), so it is
possible that the original Ryumin-Light forms are not compliant with
JIS90. Unfortunately, Morisawa removed the glyph difference data
between their OCF and their 1st CID-keyed font, so I could not check
concrete examples.

About 2-b. CIDs for vendor defined charset
==========================================

Compared with group 1 and group 2-a, it is slightly unclear what we
should refer to as the primal definition.

CID 7633-7886: The TechNote notes nothing in detail; used for various
charsets: 78-XXX (CMap for legacy JIS C 6226-1978), UniJIS-XXX
(intersection of JIS X 0208 & UCS2), Ext-XXX (NEC), NWP-XXX (NEC word
processor Bungo), Add-XXX (Fujitsu FM-R), 83pv-XXX (KanjiTalk 6),
90pv-XXX (KanjiTalk 7), 90ms-XXX (Windows 3.1 J).

CID 7958-8004: The TechNote notes nothing in detail, but used for a
single charset: used only by the Add-XXX CMap (Fujitsu FM-R).

CID 8359-8717: The TechNote notes "to support the Microsoft Windows
3.1 J character set". However, CIDs 8561 & 8592 are later classified as
glyphs for JIS X 0212:1990 (p. 196).

Going from easier to harder, I describe the 2nd, 3rd and 1st regions,
in that order.
Comment 5: primal glyph definition of FM-R characters
-----------------------------------------------------

The 2nd region (CIDs for Fujitsu FM-R) is easy. Referring to the
Fujitsu product (if still available), or using an existing Adobe
product (e.g. KozMinProVI-Light) as the primal definition, is simplest.
CID 7963 (originally FM-R Shift-JIS 0x8952) is the unique CID for
U+5653, which is now a part of the latest Japanese charset
JIS X 0213:2004, so it might be helpful to state whether CID 7963 must
be JIS X 0213:2004 compliant or does not have to be.

# IMHO, FM-R charset was based on legacy JIS78, so we can expect
# the FM-R RKSJ 0x8952 is "traditional"-looking variant of U+5618,
# but we cannot assume the form is compliant to JIS X 0213:2004,
# so introduction of yet-another CID for U+5653 of JIS X 0213:2004
# may be simple for information interchange.

Comment 6: primal glyph definition of MS cp932 characters
---------------------------------------------------------

The 3rd region has a small problem. There is an official specification
of the "Windows 3.1 J charset"

http://www.microsoft.com/globaldev/reference/dbcs/932.mspx

and an official mapping to UCS2

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

Comparing the UCS2 codepoints for the CIDs in this region, there are
several differences.

Microsoft CP932.TXT uses U+9592 as the character for ms-cp932
codepoints 0xEECC & 0xFBE8. For U+9592, Adobe-Japan1-6 provides 2 CIDs,
8685 and 13693. CID 8685 was introduced to support Microsoft Windows
3.1 J (Adobe-Japan1-2); CID 13693 was introduced as a JIS X 0208:1997
kanji variant (Adobe-Japan1-4). I guess CID 13693 was originally
introduced to provide a variant form that JIS X 0208:1997 unifies with
U+9593. But now Adobe uses CID 13693 as a variant of U+9592. I think
the note in the TechNote and the mapping tables should be made
consistent, as:

* possible fix 1: add note "Now CID 13693 is for IBM Selected kanji
variant." to fit ms-cp932.
* possible fix 2: move CID 13693 from a variant of U+9592 to that of
U+9593 to fit JIS X 0208:1997.

CID 8542 (introduced in Adobe-Japan1-2) has another inconsistency
problem. According to the Unicode specification on CJK Compatibility
Ideographs,

"In addition, another 34 ideographs from various regional and industry
standards were encoded in this book, primarily to achieve round-trip
conversion compatibility. Twelve of these 34 ideographs (U+FA0E,
U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27,
U+FA28, and U+FA29) are not encoded in the CJK Unified Ideographs
Areas. These 12 characters are not duplicates and should be treated as
a small extension to the set of unified ideographs" (The Unicode
Standard 4.0, p. 305).

Thus (I guess), sequences.txt moves the following CIDs from their
original codepoints in CJK Compatibility Ideographs to those in CJK
Unified Ideographs.

CID   cid2code  IVS-base  kIRG_JSource_of_IVS-base
8481  U+FA12    U+6674    0-4032 (JIS X 0208-1990)
8542  U+FA15    U+20611   *NONE*
8548  U+FA16    U+732A    0-4376 (JIS X 0208-1990)
8571  U+FA17    U+76CA    0-3157 (JIS X 0208-1990)
8579  U+FA18    U+793C    0-4E69 (JIS X 0208-1990)
8580  U+FA19    U+795E    0-3F90 (JIS X 0208-1990)
8581  U+FA1A    U+7965    0-3E4D (JIS X 0208-1990)
8583  U+FA1B    U+798F    0-4A21 (JIS X 0208-1990)
8587  U+FA1C    U+9756    0-4C77 (JIS X 0208-1990)
8590  U+FA1D    U+7CBE    0-403A (JIS X 0208-1990)
8599  U+FA1E    U+7FBD    0-3129 (JIS X 0208-1990)
8612  U+FA20    U+8612    1-5A29 (JIS X 0212-1990)
8622  U+FA22    U+8AF8    0-3D74 (JIS X 0208-1990)
8633  U+FA25    U+9038    0-306F (JIS X 0208-1990)
8636  U+FA26    U+90FD    0-4554 (JIS X 0208-1990)
8699  U+FA2A    U+98EF    0-4853 (JIS X 0208-1990)
8700  U+FA2B    U+98FC    0-3B74 (JIS X 0208-1990)
8702  U+FA2C    U+9928    0-345B (JIS X 0208-1990)
8715  U+FA2D    U+9DB4    0-4461 (JIS X 0208-1990)

As shown in this list, most base codepoints mapped by IVS are
characters that have JIS sources and are easy to reduce to a union of
JIS charsets for information interchange.
But CID 8542 is exceptional: its IVS base codepoint U+20611 has no JIS
source. According to the legacy CMaps designed for Adobe-Japan1-2
(90ms-RKSJ-H, UniJIS-UCS2-H), CID 8542 seems to have been introduced to
display ms-cp932 codepoints 0xEDF9 & 0xFB58 (CP932.TXT maps them to
U+FA15). Recent CMaps (UniJIS-UTF16-H, UniJIS-UTF32-H,
UniJISX0213-UTF32-H) display U+FA15 with CID 20307. According to
TechNote #5078, CID 20307 was introduced as a glyph for JIS X 0213:2004
compliancy, and sequences.txt defines CID 20307 as one of the variant
forms of U+51DE.

On the other hand, sequences.txt defines CID 8542 as one of the variant
forms of U+20611. According to Unihan.txt, CID 14294 is the canonical
form of U+20611 and CID 8542 is a variant form of U+20611. Anyway,
U+20611 is not included in the legacy JIS charsets (JIS X 0208,
JIS X 0212, JIS X 0213), so using (a variant of) U+20611 for a legacy
ms-cp932 codepoint is slightly confusing. I think taking CID 8542 as a
variant of U+51DE (included in JIS X 0212 and JIS X 0213) is better for
information interchange.

* possible fix 1: redefine CID 8542 from a variant form of U+20611 to
that of U+51DE, for ms-cp932-derived systems' information interchange.

* possible fix 2: add note "Now CID 20307 is used for IBM Selected
kanji, but JIS X 0213:2004 compliant form", to indicate the glyph
difference from CID 8542.

Comment 7: primal glyph definition of CIDs shared by several vendor character sets
----------------------------------------------------------------------------------

The 1st region is difficult. These CIDs were introduced in
Adobe-Japan1-0 for compatibility with legacy PS systems. Although the
legacy 78-XXX CMaps used them as CIDs for the JIS78 charset, the
current TechNote #5078 does not state whether their glyphs must be
JIS78-compliant or not.

As Ken Lunde's "CJKV" notes (p. 919), there is a large group of
source-dependent (in other words, not clarified by JIS) form
differences between JIS C 6226-1978 and JIS X 0208-1983 (or
JIS X 0212-1990).
Thus it is reasonable to assume that the glyph shapes for JIS78
characters in legacy PS systems (designed before 1990) are not
guaranteed to be JIS78-compliant. Strictly JIS78-compliant glyphs were
introduced in Adobe-Japan1-4 and Adobe-Japan1-6. I think using these
newer CIDs is a more exact way to specify JIS78-compliant glyphs.

Recent CMaps from Unicode codepoints to CID numbers are not appropriate
to refer to as glyph definitions. The remaining CMaps are all for
vendor-defined charsets: it is difficult to determine the priorities
among them, although Windows 3.1 J might be the most popular one (the
others are not charsets registered with IANA). As a result, it might be
acceptable to use an existing Adobe product as the primal glyph
definition.

In addition, some CIDs in this region are defined as a (variant) form
of a non-JIS character. Considering the history that these CIDs were
introduced for legacy PS systems, using a non-JIS character as the
basis of these CIDs is slightly confusing for information interchange.

CIDs that have no Japanese source (no JIS nor Japanese vendor character
set).

CID   IVS_base_codepoint  similar_JIS_character
7641  U+28CDD             U+958F (JIS78 form?)
7670  U+25874             U+7A3D (JIS78 form?)
7672  U+8346              U+834A (JIS78 form?)
7673  U+28EF6             U+9699 (JIS78 form?)
7687  U+6805              U+67F5 (JIS78 form?)
7825  U+21A1A             U+5BC3 (JIS78 form?)
7834  U+67FA              U+62D0 (JIS78 form?)
7836  U+688E              U+688D (JIS78 form?)
7838  U+243D0             U+71D7 (JIS78 form?)

CIDs mapped to a character from "Unified Japanese IT Vendors
Contemporary Ideographs, 1993" (but no JIS source).

7655  U+3D4E              U+6F97 (JIS78 form?)

CIDs mapped to a character from IBM CodePage 932 (but no JIS source).

7680  U+663B              U+6602 (JIS78 form?)

CIDs mapped to Unicode characters included in JIS X 0213

CIDs included by JIS X 0213:2000

7715  U+87EC              U+8749 (trad. kanji)
7727  U+9A52              U+9A28 (trad. kanji)
7739  U+7C1E              U+7BAA (trad. kanji)
7861  U+853E              U+85DC (different?)

CIDs included by JIS X 0213:2004

7774  U+525D              U+5265 (JIS78 form?)
7826  U+5C5B              U+5C4F (JIS78 form?)

2-c. CIDs for non software vendor charset
=========================================

Comment 8: more detailed reference list might be expected
---------------------------------------------------------

Adobe TechNote #5078 only refers to these sources by name (e.g. K-JIS,
U-PRESS). I wish more detailed references were given, as Ken Lunde's
"CJKV" gives, if KozMinProVI-Light is not the primal glyph definition.

3. CID whose glyph does not refer any existing documents
========================================================

Comment 9: relationship with APGS
---------------------------------

For example, Adobe-Japan1-6 introduced 1986 Ideographs in CID
21071-23057. Among these Ideographs, 3 CIDs (21072-21074) refer to
JIS X 0213:2004, and another 17 CIDs (21164, 21371, 21558, 21722,
21791, 21933, 22006, 22010, 22045, 22063, 22186, 22341, 22583, 22788,
22843, 22920, 23006) refer to JIS X 0212-1990. The other 1966 CIDs
refer to nothing. Is KozMinProVI-Light the primal glyph definition?

In Adobe TechNote #5078, it is briefly noted "to support the Apple Mac
OS X Version 10.2 glyph set". Some documents published around Apple say
that the 1966 CIDs were proposed for Adobe-Japan1-5 as a part of the
"Apple Publishing Glyph Set" (APGS), based on a collection of glyphs in
legacy CTP systems. It would be helpful to note whether Apple has
published a glyph set definition of APGS that the 1966 CIDs should
comply with, or whether Adobe defines a compatible glyph set and the
1966 CIDs should comply with Adobe's APGS-compatible glyphs.

# Unfortunately, Dainippon Screen's Hiragino
# OpenTypes are the first (and only) implementation
# of a real APGS font, but it is not a reliable
# source as an Adobe-Japan1 OpenType.

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
(End of Report)