L2/06-399
Subject: Script property for U+3200..U+33FF
Source: Eric Muller, Adobe Systems
Date: November 24, 2006

The Script property values of the characters U+3200..U+33FF, i.e. the blocks "Enclosed CJK Letters and Months" and "CJK Compatibility", appear to be inconsistent. This document proposes a fix for that inconsistency.

First, consider:

3251..325F  Common  Circled numbers
32B1..32BF  Common  Circled numbers

Since the circled numbers are fundamentally based on the Latin digits, it is appropriate to give them the same script as the Latin digits, i.e. Common.

327F        Common  Symbol (KOREAN STANDARD SYMBOL)

This is really just a symbol (it has no compatibility decomposition), so the script Common is appropriate.

The inconsistency lies among the remaining characters. They all share a common pattern: each is fundamentally formed from script-specific characters (Latin, Katakana, Han, or Hangul) that are arranged or decorated in some way (parenthesized, circled, or squared). Furthermore, each has a compatibility decomposition involving those script-specific characters.
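The observation that these characters decompose into script-specific constituents can be checked mechanically. The following is a minimal Python sketch, not a reference implementation. Python's standard `unicodedata` module exposes decompositions but not the Script property, so the helper `rough_script` (hypothetical) approximates the script of a code point from a prefix of its character name. The function `constituent_script` (also hypothetical) reports the non-Common script of a character's decomposition components, or Common if there is none.

```python
import unicodedata

# Hypothetical approximation of the Script property. The standard library
# has no Script lookup, so guess from the character name; anything not
# matched here (digits, parentheses, superscripts, ...) is treated as Common.
_NAME_PREFIXES = (
    ("LATIN ", "Latin"),
    ("GREEK ", "Greek"),
    ("KATAKANA", "Katakana"),
    ("HANGUL ", "Hangul"),
    ("CJK UNIFIED IDEOGRAPH", "Han"),
)


def rough_script(cp: int) -> str:
    name = unicodedata.name(chr(cp), "")
    for prefix, script in _NAME_PREFIXES:
        if name.startswith(prefix):
            return script
    return "Common"


def constituent_script(ch: str) -> str:
    """Script(s) of the non-Common components of ch's decomposition."""
    decomposition = unicodedata.decomposition(ch)
    # Fields look like "<square> 30A2 30D1 30FC 30C8"; drop the <tag>.
    components = [f for f in decomposition.split() if not f.startswith("<")]
    scripts = {rough_script(int(f, 16)) for f in components} - {"Common"}
    if not scripts:
        return "Common"  # e.g. circled numbers, or no decomposition at all
    # More than one script (e.g. Greek mu plus Latin F) is reported joined.
    return "+".join(sorted(scripts))


for cp in (0x3251, 0x327F, 0x3200, 0x3280, 0x32C0, 0x3300, 0x3250, 0x338C):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {constituent_script(ch)}")
```

With this heuristic, the circled numbers and U+327F come out as Common, and each of the other samples maps to the script of its constituents. A few squared abbreviations, such as U+338C SQUARE MU F, mix Greek and Latin components, which a rule based only on constituents cannot reduce to a single script.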
The inconsistency is that some are given the same script as their constituent characters, while others are given the script Common:

Hangul parts:
3200..320D  Hangul  Parenthesized Hangul elements
320E..321C  Hangul  Parenthesized Hangul syllables
321D..321E  Hangul  Parenthesized Korean words
3260..326D  Hangul  Circled Hangul elements
326E..327B  Hangul  Circled Hangul syllables
327C..327D  Hangul  Circled Korean words
327E        Common  Circled Hangul syllable

Han parts:
3220..3243  Common  Parenthesized ideographs
3280..32B0  Common  Circled ideographs
337B..337E  Common  Japanese era names
337F        Common  Japanese corporation
32C0..32CB  Common  Telegraph symbols for months
3358..3370  Common  Telegraph symbols for hours
33E0..33FE  Common  Telegraph symbols for days

Katakana parts:
32D0..32FE  Common  Circled Katakana
3300..3357  Common  Squared Katakana words

Latin parts:
3250        Common  Squared Latin abbreviation
32CC..32CF  Common  Squared Latin abbreviations
3371..337A  Common  Squared Latin abbreviations
3380..33DF  Common  Squared Latin abbreviations
33FF        Common  Squared Latin abbreviation

The only distinction that could be drawn among these characters is that the telegraph symbols incorporate Latin digits in addition to Han characters, but I do not view this as significant.

One can view these characters primarily as symbols, or primarily as ordinary text with stylistic constraints. Accordingly, there are two ways of resolving the inconsistency:

Proposal A: the characters that currently have the Hangul script should be changed to have the Common script.

Proposal B: the characters that currently have the Common script and "contain" script-specific parts should be changed to have the script of their parts.
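The two proposals can be stated as transformations over rows of (first code point, last code point, script) like those in the table above. The Python sketch below is illustrative only: it uses a hypothetical `Row` type with an extra `parts` column recording the script of each row's constituent characters (taken from the group headings of the table), and it covers only a subset of the rows.

```python
from typing import NamedTuple


class Row(NamedTuple):
    first: int
    last: int
    script: str   # current Script value
    parts: str    # script of the constituent characters (hypothetical column)
    label: str


# A subset of the table; "parts" is Common where the constituents are
# not script-specific (circled numbers, KOREAN STANDARD SYMBOL).
ROWS = [
    Row(0x3200, 0x320D, "Hangul", "Hangul", "Parenthesized Hangul elements"),
    Row(0x3251, 0x325F, "Common", "Common", "Circled numbers"),
    Row(0x327E, 0x327E, "Common", "Hangul", "Circled Hangul syllable"),
    Row(0x327F, 0x327F, "Common", "Common", "KOREAN STANDARD SYMBOL"),
    Row(0x3280, 0x32B0, "Common", "Han", "Circled ideographs"),
    Row(0x3300, 0x3357, "Common", "Katakana", "Squared Katakana words"),
    Row(0x3380, 0x33DF, "Common", "Latin", "Squared Latin abbreviations"),
]


def proposal_a(rows):
    """Proposal A: every row currently Hangul becomes Common."""
    return [r._replace(script="Common") if r.script == "Hangul" else r
            for r in rows]


def proposal_b(rows):
    """Proposal B: Common rows with script-specific parts take the script
    of those parts; rows whose parts are Common are left unchanged."""
    return [r._replace(script=r.parts)
            if r.script == "Common" and r.parts != "Common" else r
            for r in rows]


for name, result in (("A", proposal_a(ROWS)), ("B", proposal_b(ROWS))):
    print(f"Proposal {name}:")
    for r in result:
        print(f"  {r.first:04X}..{r.last:04X}  {r.script:<8}  {r.label}")
```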
Proposal B excludes the circled numbers and the KOREAN STANDARD SYMBOL, which remain Common.

I personally think that both points of view are equally valid, and that implementation considerations must decide the matter:

- In rendering systems that process runs of different scripts separately (with Common resolved to some "ambient script", much as the bidi algorithm resolves neutral characters), there is virtually no possibility of typographic interaction, such as ligatures or kerning, across run boundaries. Thus there would be no possibility of kerning between, say, a squared Latin abbreviation and a following non-Latin, script-specific character.

- The Unicode data can be represented more compactly when long runs of successive code points share the same property value. This tends to matter most on small devices such as mobile phones.

Neither consideration is very strong, but they are enough to tip my choice toward Proposal B.
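The run-boundary consideration can be illustrated with a simplified script itemizer. This Python sketch assumes one simple resolution rule, in which a Common character inherits the script of the nearest preceding non-Common character; real rendering systems may use more elaborate heuristics. The function `itemize` and the per-character tables `scripts_now` and `scripts_b` are hypothetical.

```python
from itertools import groupby


def itemize(text, script_of, ambient="Common"):
    """Split text into script runs.

    Simplified assumption: a Common character takes the script of the
    nearest preceding non-Common character (or `ambient` at the start).
    """
    resolved = []
    current = ambient
    for ch in text:
        script = script_of(ch)
        if script != "Common":
            current = script
        resolved.append((ch, current))
    return [("".join(c for c, _ in group), script)
            for script, group in groupby(resolved, key=lambda pair: pair[1])]


KATAKANA_A, SQUARE_KG = "\u30A2", "\u338F"
text = KATAKANA_A + SQUARE_KG + KATAKANA_A

# Hypothetical lookup tables: U+338F SQUARE KG as it is now (Common) and as
# it would be under Proposal B (Latin, the script of its parts "kg").
scripts_now = {KATAKANA_A: "Katakana", SQUARE_KG: "Common"}
scripts_b = {KATAKANA_A: "Katakana", SQUARE_KG: "Latin"}

print(itemize(text, scripts_now.get))  # one Katakana run
print(itemize(text, scripts_b.get))    # three runs: Katakana, Latin, Katakana
```

With the current assignment, U+338F falls into the surrounding Katakana run; with the Latin script it forms a run of its own, so a shaping engine that works run by run cannot kern it against the following Katakana character.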