Comment on PRI 98: IVD Adobe-Japan1 (pt.3)

From: mpsuzuki@hiroshima-u.ac.jp
Date: Tue Mar 13 2007 - 18:53:29 CST


    Dear Sirs,

    This is my last post, containing comments 3-9.

    Except for comment 3 (interoperability with OpenType fonts),
    most comments in this post are directed more at Adobe TechNote
    #5078 itself than at PRI 98. I think the primal form definition
    for each CID should be clarified, even if the answer is simply
    "KozMinProVI-Light is the primal definition, it is provided
    as is" (such an assertion would be sufficiently helpful).

    Regards,
    mpsuzuki

    --
    Adobe TechNote #5078 includes very helpful information
    about the references for glyph design (which standard
    a glyph should comply with, and which document
    we should refer to). However, there are several mismatches
    between the origins of the CIDs and the codepoints in the IVD.
    I describe them in detail below.
    I think there are 3 groups of CIDs in Adobe-Japan1.
    1. CIDs whose glyphs should be designed with reference
       to Japanese Industrial Standards (JIS).
    2. CIDs whose glyphs are designed to display text from existing
       "proprietary" systems outside Adobe.
       2-a. CIDs whose glyphs are shared by legacy
            PostScript systems, such as Morisawa OCF.
            ex. Adobe-Japan1-0.
       2-b. CIDs for "vendor character sets".
            ex. Microsoft Windows 3.1 J,
            Apple Kanji Talk 6,
            Fujitsu FM-R.
       2-c. CIDs for character sets defined neither by JIS nor by
            software vendors (so no specific implementation or
            platform is assumed).
            ex. National Language Council, K-JIS, U-PRESS, etc.
    3. CIDs whose glyphs do not refer to any existing document.
    About group 1 (CID for JIS)
    ===========================
    Comment 3: OT-tag interoperability?
    -----------------------------------
    Unfortunately, the current TrueType font format specification
    cannot include a cmap table covering IVS (>= 32bit).
    So OpenType layout features will be quite important for
    supporting IVS.
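    To make the bit-width point concrete: an IVS is a base ideograph
    followed by one of the variation selectors VS17-VS256
    (U+E0100..U+E01EF), which lie outside the BMP, so a 16-bit cmap
    has no room for them. A minimal Python sketch; the helper name
    `make_ivs` and the pairing of U+9592 with the first selector are
    purely illustrative and do not refer to any registered
    Adobe-Japan1 sequence.

```python
# Variation selectors usable in ideographic variation sequences (VS17..VS256).
VS17 = 0xE0100
VS256 = 0xE01EF


def make_ivs(base: int, selector_index: int) -> str:
    """Return base ideograph + VS(17 + selector_index) as a Python str.

    Hypothetical helper for illustration only; it does not check whether
    the sequence is actually registered in the IVD.
    """
    if not 0 <= selector_index <= VS256 - VS17:
        raise ValueError("selector index must be in 0..239")
    return chr(base) + chr(VS17 + selector_index)


seq = make_ivs(0x9592, 0)
print([f"U+{ord(c):04X}" for c in seq])   # ['U+9592', 'U+E0100']
print(ord(seq[1]) > 0xFFFF)                # True: the selector needs more than 16 bits
print(seq.encode("utf-16-be").hex())       # 9592db40dd00 (selector is a surrogate pair)
```

    The surrogate pair in the UTF-16 output is exactly what a
    UCS-2-only cmap cannot express.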
    In OpenType, Adobe has already introduced feature tags
    to specify ideographic variants by JIS revision
    ("jp78", "jp83", etc.):
    http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp78
    http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp83
    http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp90
    http://partners.adobe.com/public/developer/opentype/index_tag6.html#hojo
    http://partners.adobe.com/public/developer/opentype/index_tag7.html#nlck
    http://partners.adobe.com/public/developer/opentype/index_tag6.html#jp04
    For convenient interoperability between text whose ideographic
    variants are specified by "jpXX" OT tags and text specified by IVS,
    I wish Adobe would define a mapping table showing which
    Adobe-Japan1 IVS should be used for each of these OT tags.
    ("nlck" might be a slightly different category, sorry.)
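    A minimal sketch of how such a table might be consumed, in Python.
    Everything here is hypothetical: the table shape (feature tag plus
    base code point mapped to a variation selector), the function name
    `feature_to_ivs`, and the single placeholder entry, which is
    arbitrary and is not Adobe data.

```python
from typing import Dict, Tuple

# HYPOTHETICAL table shape: (OT feature tag, base code point) -> variation selector.
FeatureIvsTable = Dict[Tuple[str, int], int]

# Arbitrary placeholder entry, NOT real Adobe-Japan1 data.
PLACEHOLDER_TABLE: FeatureIvsTable = {
    ("jp78", 0x9592): 0xE0101,
}


def feature_to_ivs(text: str, tag: str, table: FeatureIvsTable) -> str:
    """Rewrite text styled with an OT feature tag into explicit IVS form.

    Characters without an entry for `tag` are copied unchanged.
    """
    out = []
    for ch in text:
        out.append(ch)
        selector = table.get((tag, ord(ch)))
        if selector is not None:
            out.append(chr(selector))
    return "".join(out)


result = feature_to_ivs("\u9592\u9593", "jp78", PLACEHOLDER_TABLE)
print([f"U+{ord(c):04X}" for c in result])  # ['U+9592', 'U+E0101', 'U+9593']
```

    A real converter would also have to decide what to do with
    characters that already carry a variation selector; this sketch
    ignores that case.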
    About group 2 (CID shared by legacy PS systems)
    ===============================================
    About 2-a. CIDs shared by legacy PS systems
    ===========================================
    Comment 4: Requirement of JIS90 compliancy
    ------------------------------------------
    Adobe TechNote #5078 currently says the following about JIS90 compliancy:
    "In order for Adobe-Japan1-4 CID-keyed fonts to be useful and
    meaningful, the glyphs of all JIS X 0208:1997 kanji must be
    JIS90-compliant. This affects CIDs 1125 through 7477 (6,353
    CIDs) in Supplement 0, and CIDs 8284 and 8285 in Supplement 1.
    Some subtle glyph variations in Supplement 4 (see Section 7)
    make this necessary." (p. 5)
    "In order to ensure glyph consistency across fonts of different
    manufactures, the JIS X 0208:1997 kanji (CIDs 1125-7477 and
    8284-8285 of Supplements 0 and 1, respectively) must become
    JIS90-compliant. This is due to the fact that some of the JIS
    X 0208:1997 kanji variants in the Adobe-Japan1-4 are sometimes
    subtle in their difference with their JIS90 (standard) forms".
    (p. 95)
    In both paragraphs, JIS90 compliancy seems to be required
    as part of Adobe-Japan1-4. Does this mean that the requirement
    was introduced when Adobe-Japan1-4 was defined, and that legacy
    Adobe products based on Adobe-Japan1-3 or earlier are not
    (guaranteed to be) JIS90-compliant?
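    The CID ranges in the quoted TechNote text can be checked
    mechanically: counting inclusively reproduces its figure of 6,353
    CIDs for Supplement 0. A small Python sketch; `requires_jis90` and
    `count_cids` are hypothetical helpers, not part of any Adobe tool.

```python
# CID ranges the TechNote (as quoted above) requires to be JIS90-compliant:
# CIDs 1125-7477 (Supplement 0) and 8284-8285 (Supplement 1), both inclusive.
JIS90_RANGES = [(1125, 7477), (8284, 8285)]


def requires_jis90(cid: int) -> bool:
    """True if the CID falls in one of the JIS90-compliance ranges."""
    return any(lo <= cid <= hi for lo, hi in JIS90_RANGES)


def count_cids(ranges) -> int:
    """Inclusive ranges: each contributes hi - lo + 1 CIDs."""
    return sum(hi - lo + 1 for lo, hi in ranges)


print(count_cids(JIS90_RANGES[:1]))  # 6353, matching the TechNote
print(count_cids(JIS90_RANGES))      # 6355 including CIDs 8284-8285
print(requires_jis90(7477), requires_jis90(7478), requires_jis90(8285))  # True False True
```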
    According to TechNote #5078 p. 221, the Adobe-Japan1-1 & -2
    (1994) specifications were printed with Morisawa's
    RyuminPro-Light. As I remember, OCF Ryumin-Light was designed
    before JIS90 (in 1988?), so it is possible that the original
    Ryumin-Light forms are not JIS90-compliant. Unfortunately,
    Morisawa has removed the data on glyph differences between
    their OCF fonts and their first CID-keyed font, so I could not
    check a concrete example.
    About 2-b. CIDs for vendor defined charset
    ==========================================
    Compared with group 1 and group 2-a, it is slightly unclear what
    we should refer to as the primal definition.
    CID 7633-7886: TechNote notes nothing in detail; used for various
                   charsets: 78-XXX (CMap for legacy JIS C 6226-1978),
                   UniJIS-XXX (intersection of JIS X 0208 & UCS2),
                   Ext-XXX (NEC), NWP-XXX (NEC word processor Bungo),
                   Add-XXX (Fujitsu FM-R),
                   83pv-XXX (KanjiTalk 6), 90pv-XXX (KanjiTalk 7),
                   90ms-XXX (Windows 3.1 J).
    CID 7958-8004: TechNote notes nothing in detail; used for a single
                   charset only: the Add-XXX CMap (Fujitsu FM-R).
    CID 8359-8717: TechNote notes "to support the Microsoft Windows 3.1 J
                   character set". However, CID 8561 & 8592 are later 
                   classified as glyphs for JIS X 0212:1990 (p. 196)
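    To make the three ranges easier to cross-reference with the
    comments that follow, here is a small lookup in Python. The region
    labels paraphrase the list above; `region_of` is a hypothetical
    helper, and the lookup deliberately ignores later reclassifications
    such as those of CIDs 8561 and 8592.

```python
from typing import Optional

# The three CID regions of group 2-b, as listed above (inclusive bounds).
REGIONS = {
    "1st (shared by several vendor charsets)": (7633, 7886),
    "2nd (Fujitsu FM-R, Add-XXX only)": (7958, 8004),
    "3rd (Microsoft Windows 3.1 J)": (8359, 8717),
}


def region_of(cid: int) -> Optional[str]:
    """Return the label of the region containing `cid`, or None."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= cid <= hi:
            return name
    return None


for name, (lo, hi) in REGIONS.items():
    print(name, hi - lo + 1)   # region sizes: 254, 47, 359
print(region_of(7963))         # 2nd (Fujitsu FM-R, Add-XXX only)
print(region_of(8542))         # 3rd (Microsoft Windows 3.1 J)
print(region_of(13693))        # None: outside group 2-b
```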
    Going from easier to harder, I describe the 2nd, 3rd and 1st regions.
    Comment 5: primal glyph definition of FM-R characters
    -----------------------------------------------------
    The 2nd region (CIDs for Fujitsu FM-R) is easy. Referring to a Fujitsu
    product (if still available), or using an existing Adobe product
    (e.g. KozMinProVI-Light) as the primal definition, is simplest.
    CID 7963 (originally FM-R Shift-JIS 0x8952) is the only CID for
    U+5653, which is now part of the latest Japanese charset, JIS X 0213:2004,
    so it might be helpful to state whether CID 7963 must be
    JIS X 0213:2004 compliant or not.
    # IMHO, the FM-R charset was based on legacy JIS78, so we can expect
    # FM-R RKSJ 0x8952 to be a "traditional"-looking variant of U+5618,
    # but we cannot assume its form is compliant with JIS X 0213:2004,
    # so introducing yet another CID for U+5653 of JIS X 0213:2004
    # may be simpler for information interchange.
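    The codepoint relationships in this note can be checked with
    CPython's standard codecs, assuming that `shift_jis` covers
    JIS X 0208 and `shift_jis_2004` covers JIS X 0213:2004. This only
    shows which Unicode characters each standard can encode; it says
    nothing about the actual FM-R glyph. `encodable` is a hypothetical
    helper.

```python
def encodable(ch: str, codec: str) -> bool:
    """True if `ch` can be encoded with the given Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# In standard Shift_JIS (JIS X 0208), 0x8952 decodes to U+5618.
print(b"\x89\x52".decode("shift_jis") == "\u5618")  # True

# U+5653 is outside JIS X 0208 but inside JIS X 0213:2004.
for codec in ("shift_jis", "shift_jis_2004"):
    print(codec, encodable("\u5653", codec))
# shift_jis False
# shift_jis_2004 True
```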
    Comment 6: primal glyph definition of MS cp932 characters
    ---------------------------------------------------------
    The 3rd region has a small problem.
    There is an official specification of the "Windows 3.1 J charset"
    http://www.microsoft.com/globaldev/reference/dbcs/932.mspx
    and an official mapping to UCS2
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
    Comparing the UCS2 codepoints for CIDs in this region with
    this mapping, there are several differences.
    Microsoft CP932.TXT uses U+9592 as the character for ms-cp932
    codepoints 0xEECC & 0xFBE8. For U+9592, Adobe-Japan1-6
    provides two CIDs, 8685 and 13693. CID 8685 was introduced
    to support Microsoft Windows 3.1 J (Adobe-Japan1-2), while
    CID 13693 was introduced as a JIS X 0208:1997 kanji variant
    (Adobe-Japan1-4). I guess CID 13693 was originally introduced
    to provide a variant form that JIS X 0208:1997 unifies
    with U+9593. But now Adobe uses CID 13693 as a variant
    of U+9592. I think the note in the TechNote and the mapping
    tables should be consistent, for example:
    * possible fix 1: add note "Now CID 13693 is for IBM
                      Selected kanji variant." to fit ms-cp932.
    * possible fix 2: move CID 13693 from a variant of U+9592
                      to that of U+9593 to fit JIS X 0208:1997.
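    The CP932.TXT mapping for these two codepoints can be reproduced
    with CPython's built-in `cp932` codec, assuming that codec follows
    Microsoft's CP932.TXT:

```python
# Decode the two ms-cp932 codepoints discussed above.
for raw in (b"\xee\xcc", b"\xfb\xe8"):
    ch = raw.decode("cp932")
    print(raw.hex(), f"U+{ord(ch):04X}")
# eecc U+9592
# fbe8 U+9592
```

    Both decode to U+9592, not U+9593, which is why the role of
    CID 13693 matters for ms-cp932 round trips.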
    CID 8542 (introduced in Adobe-Japan1-2) has another
    inconsistency problem. According to Unicode specification
    on CJK Compatibility Ideographs, "In addition, another
    34 ideographs from various regional and industry standards
    were encoded in this block, primarily to achieve round-trip
    conversion compatibility. Twelve of these 34 ideographs
    (U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21,
    U+FA23, U+FA24, U+FA27, U+FA28, and U+FA29) are not encoded
    in the CJK Unified Ideographs Areas. These 12 characters
    are not duplicates and should be treated as a small extension
    to the set of unified ideographs" (The Unicode Standard 4.0,
    p. 305). Thus (I guess) sequence.txt moves the following CIDs
    from their original codepoints in CJK Compatibility Ideographs
    to codepoints in CJK Unified Ideographs.
    CID   cid2code  IVS-base  kIRG_JSource_of_IVS-base
    8481  U+FA12    U+6674    0-4032 (JIS X 0208-1990)
    8542  U+FA15    U+20611   *NONE*
    8548  U+FA16    U+732A    0-4376 (JIS X 0208-1990)
    8571  U+FA17    U+76CA    0-3157 (JIS X 0208-1990)
    8579  U+FA18    U+793C    0-4E69 (JIS X 0208-1990)
    8580  U+FA19    U+795E    0-3F90 (JIS X 0208-1990)
    8581  U+FA1A    U+7965    0-3E4D (JIS X 0208-1990)
    8583  U+FA1B    U+798F    0-4A21 (JIS X 0208-1990)
    8587  U+FA1C    U+9756    0-4C77 (JIS X 0208-1990)
    8590  U+FA1D    U+7CBE    0-403A (JIS X 0208-1990)
    8599  U+FA1E    U+7FBD    0-3129 (JIS X 0208-1990)
    8612  U+FA20    U+8612    1-5A29 (JIS X 0212-1990)
    8622  U+FA22    U+8AF8    0-3D74 (JIS X 0208-1990)
    8633  U+FA25    U+9038    0-306F (JIS X 0208-1990)
    8636  U+FA26    U+90FD    0-4554 (JIS X 0208-1990)
    8699  U+FA2A    U+98EF    0-4853 (JIS X 0208-1990)
    8700  U+FA2B    U+98FC    0-3B74 (JIS X 0208-1990)
    8702  U+FA2C    U+9928    0-345B (JIS X 0208-1990)
    8715  U+FA2D    U+9DB4    0-4461 (JIS X 0208-1990)
    As shown in this list, most IVS base codepoints are
    characters that have JIS sources and can easily be
    reduced to the union of JIS charsets for information
    interchange. But CID 8542 is an exception: its IVS base
    codepoint U+20611 has no JIS source.
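    The reasoning above can be cross-checked against the Unicode
    Character Database. A sketch using Python's `unicodedata` module:
    it checks that the twelve code points from the quotation have no
    canonical decomposition, and that each cid2code value in the table
    normalizes (NFC) to its listed IVS base. The row data is transcribed
    from the table; canonical decompositions are fixed by Unicode's
    normalization stability policy, so the result should not depend on
    the Python version. The helper names are hypothetical.

```python
import unicodedata

# The twelve code points the quoted Unicode 4.0 text calls unified ideographs.
UNIFIED_IN_COMPAT = [0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
                     0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29]

# (CID, cid2code, IVS base) rows transcribed from the table above.
ROWS = [
    (8481, 0xFA12, 0x6674), (8542, 0xFA15, 0x20611), (8548, 0xFA16, 0x732A),
    (8571, 0xFA17, 0x76CA), (8579, 0xFA18, 0x793C), (8580, 0xFA19, 0x795E),
    (8581, 0xFA1A, 0x7965), (8583, 0xFA1B, 0x798F), (8587, 0xFA1C, 0x9756),
    (8590, 0xFA1D, 0x7CBE), (8599, 0xFA1E, 0x7FBD), (8612, 0xFA20, 0x8612),
    (8622, 0xFA22, 0x8AF8), (8633, 0xFA25, 0x9038), (8636, 0xFA26, 0x90FD),
    (8699, 0xFA2A, 0x98EF), (8700, 0xFA2B, 0x98FC), (8702, 0xFA2C, 0x9928),
    (8715, 0xFA2D, 0x9DB4),
]


def has_canonical_decomposition(cp: int) -> bool:
    """True if the UCD gives `cp` a decomposition mapping."""
    return unicodedata.decomposition(chr(cp)) != ""


def nfc_mismatches(rows):
    """Return rows whose IVS base differs from the NFC form of cid2code."""
    result = []
    for cid, compat, base in rows:
        canonical = ord(unicodedata.normalize("NFC", chr(compat)))
        if canonical != base:
            result.append((cid, compat, base, canonical))
    return result


print(any(has_canonical_decomposition(cp) for cp in UNIFIED_IN_COMPAT))  # False
print(all(has_canonical_decomposition(c) for _, c, _ in ROWS))           # True

for cid, compat, base, canonical in nfc_mismatches(ROWS):
    print(f"CID {cid}: U+{compat:04X} normalizes to U+{canonical:04X}, "
          f"but its IVS base is U+{base:04X}")
# CID 8542: U+FA15 normalizes to U+51DE, but its IVS base is U+20611
```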
    According to legacy CMaps designed for Adobe-Japan1-2
    (90ms-RKSJ-H, UniJIS-UCS2-H), CID 8542 seems to have been
    introduced to display ms-cp932 codepoints 0xEDF9 & 0xFB58
    (CP932.TXT maps them to U+FA15). Recent CMaps
    (UniJIS-UTF16-H, UniJIS-UTF32-H, UniJISX0213-UTF32-H)
    display U+FA15 with CID 20307. According to TechNote #5078,
    CID 20307 was introduced as a glyph for JIS X 0213:2004
    compliancy, and sequence.txt defines CID 20307 as one
    of the variant forms of U+51DE.
    On the other hand, sequence.txt defines CID 8542 as
    one of the variant forms of U+20611. According to Unihan.txt,
    CID 14294 is the canonical form of U+20611 and CID 8542
    is a variant form of U+20611. In any case, U+20611 is not
    included in the legacy JIS charsets (JIS X 0208, JIS X 0212,
    JIS X 0213), so using (a variant of) U+20611 for a legacy
    ms-cp932 codepoint is slightly confusing.
    I think treating CID 8542 as a variant of U+51DE (included
    in JIS X 0212 and JIS X 0213) is better for information
    interchange.
    * possible fix 1: redefine CID 8542 from a variant form of
                      U+20611 to that of U+51DE, for ms-cp932-derived
                      systems' information interchange.
    * possible fix 2: add note "Now CID 20307 is used for IBM
                      Selected kanji, but JIS X 0213:2004 compliant
                      form", to indicate glyph difference from CID 8542.
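    Fix 1 can also be motivated with Unicode normalization: the two
    ms-cp932 codepoints decode to U+FA15, whose canonical (NFC)
    equivalent is U+51DE rather than U+20611. A Python sketch, assuming
    CPython's built-in `cp932` codec follows CP932.TXT:

```python
import unicodedata

# ms-cp932 codepoints for which CID 8542 was apparently introduced.
for raw in (b"\xed\xf9", b"\xfb\x58"):
    ch = raw.decode("cp932")                  # CP932.TXT maps these to U+FA15
    nfc = unicodedata.normalize("NFC", ch)    # canonical equivalent
    print(raw.hex(), f"U+{ord(ch):04X}", "-> NFC", f"U+{ord(nfc):04X}")
# edf9 U+FA15 -> NFC U+51DE
# fb58 U+FA15 -> NFC U+51DE
```

    Any process that normalizes text (NFC or NFD) therefore lands on
    U+51DE, so a variant of U+51DE is the base that survives such
    processing.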
    Comment 7: primal glyph definition of CIDs shared by several vendor character sets
    ----------------------------------------------------------------------------------
    The 1st region is difficult.
    These CIDs were introduced in Adobe-Japan1-0
    for compatibility with legacy PS systems. Although
    legacy 78-XXX CMaps used them as CIDs for the
    JIS78 charset, the current TechNote #5078 does not
    state whether their glyphs must be JIS78-compliant
    or not. As Ken Lunde's "CJKV" notes (p. 919), there
    is a large group of source-dependent (in other words,
    not clarified by JIS) form differences between
    JIS C 6226-1978 and JIS X 0208-1983 (or JIS X 0212-1990).
    Thus it is reasonable to assume that the glyph shapes
    for JIS78 characters in legacy PS systems (designed
    before 1990) are not guaranteed to be JIS78-compliant.
    Strictly JIS78-compliant glyphs were introduced
    in Adobe-Japan1-4 and Adobe-Japan1-6. I think using
    these newer CIDs is a more exact way to specify
    JIS78-compliant glyphs.
    Recent CMaps from Unicode codepoints to CID numbers
    are not appropriate to refer to as glyph definitions.
    The remaining CMaps are all for vendor-defined charsets:
    it is difficult to determine their priorities,
    although Windows 3.1 J might be the most popular one
    (the others are not registered charsets in IANA).
    As a result, it might be acceptable to use an existing
    Adobe product as the primal glyph definition.
    In addition, some CIDs in this region are defined
    as (variant) forms of non-JIS characters. Considering
    that these CIDs were introduced for legacy PS systems,
    using non-JIS characters as the basis of these CIDs is
    slightly confusing for information interchange.
    CIDs with no Japanese source (neither JIS nor a Japanese
    vendor character set):
    CID	IVS_base_codepoint	similar_JIS_character
    7641	U+28CDD			U+958F (JIS78 form?)
    7670	U+25874			U+7A3D (JIS78 form?)
    7672	U+8346			U+834A (JIS78 form?)
    7673	U+28EF6			U+9699 (JIS78 form?)
    7687	U+6805			U+67F5 (JIS78 form?)
    7825	U+21A1A			U+5BC3 (JIS78 form?)
    7834	U+67FA			U+62D0 (JIS78 form?)
    7836	U+688E			U+688D (JIS78 form?)
    7838	U+243D0			U+71D7 (JIS78 form?)
    CIDs mapped to a character from "Unified Japanese IT
    Vendors Contemporary Ideographs, 1993" (but with no JIS
    source):
    7655	U+3D4E			U+6F97 (JIS78 form?)
    CIDs mapped to a character from IBM CodePage 932
    (but with no JIS source):
    7680	U+663B			U+6602 (JIS78 form?)
    CIDs mapped to Unicode characters included in JIS X 0213:
    CIDs included in JIS X 0213:2000:
    7715	U+87EC			U+8749 (trad. kanji)
    7727	U+9A52			U+9A28 (trad. kanji)
    7739	U+7C1E			U+7BAA (trad. kanji)
    7861	U+853E			U+85DC (different?)
    CIDs included in JIS X 0213:2004:
    7774	U+525D			U+5265 (JIS78 form?)
    7826	U+5C5B			U+5C4F (JIS78 form?)
    About 2-c. CIDs for non-software-vendor charsets
    ================================================
    Comment 8: a more detailed reference list might be expected
    -----------------------------------------------------------
    Adobe TechNote #5078 just refers to the names
    (e.g. K-JIS, U-PRESS). If KozMinProVI-Light is not the
    primal glyph definition, I wish more detailed references
    were given, as Ken Lunde's "CJKV" gives.
    About group 3 (CIDs whose glyphs do not refer to any existing document)
    =======================================================================
    Comment 9: relationship with APGS
    ---------------------------------
    For example, Adobe-Japan1-6 introduced 1986
    ideographs in CIDs 21071-23057. Among these
    ideographs, 3 CIDs (21072-21074) refer to JIS
    X 0213:2004, and 17 other CIDs (21164, 21371,
    21558, 21722, 21791, 21933, 22006, 22010,
    22045, 22063, 22186, 22341, 22583, 22788,
    22843, 22920, 23006) refer to JIS X 0212-1990.
    The other 1966 CIDs refer to nothing. Is
    KozMinProVI-Light the primal glyph definition for them?
    Adobe TechNote #5078 only briefly notes that these are
    "to support the Apple Mac OS X Version 10.2
    glyph set". Some documents published around
    Apple say that the 1966 CIDs were proposed for
    Adobe-Japan1-5 as part of the "Apple Publishing
    Glyph Set" (APGS), based on a collection of
    glyphs from legacy CTP systems.
    It would help to note whether Apple has published
    a glyph set definition of APGS that the 1966 CIDs
    should comply with, or whether Adobe defines some
    compatible glyph set and the 1966 CIDs should comply
    with Adobe's APGS-compatible glyphs.
    # Unfortunately, Dainippon Screen's Hiragino
    # OpenType fonts are the first (and only) implementation
    # of a real APGS font, but they are not a reliable
    # reference for Adobe-Japan1 OpenType.
    


    This archive was generated by hypermail 2.1.5 : Tue Mar 13 2007 - 21:36:08 CST