L2/04-124

From: Mark Davis
Subject: Script Values
Date: April 6, 2004

1. In 4.0.1 we added a new script value:

3031..3035 ; Katakana_Or_Hiragana # Lm [5] VERTICAL KANA REPEAT MARK..VERTICAL KANA REPEAT MARK LOWER HALF
309B..309C ; Katakana_Or_Hiragana # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
FF70       ; Katakana_Or_Hiragana # Lm     HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF9E..FF9F ; Katakana_Or_Hiragana # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

A. It appears that we missed some other characters that should have an explicit value:

Proposed:

30FC       ; Katakana_Or_Hiragana # Lm     KATAKANA-HIRAGANA PROLONGED SOUND MARK

Maybe also:

30A0       ; Katakana_Or_Hiragana # Pd     KATAKANA-HIRAGANA DOUBLE HYPHEN
30FB       ; Katakana             # Pc     KATAKANA MIDDLE DOT
FF65       ; Katakana             # Pc     HALFWIDTH KATAKANA MIDDLE DOT

B. We should have also updated the list in TR #29, which currently reads:

Katakana: Script = KATAKANA, or
Any of the following:
  U+30FC (ー) KATAKANA-HIRAGANA PROLONGED SOUND MARK
  U+FF70 (ー) HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
  U+FF9E (゙) HALFWIDTH KATAKANA VOICED SOUND MARK
  U+FF9F (゚) HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

If the above Script changes are made, then it would be changed to simply:

Katakana: Script = Katakana, or Script = Katakana_Or_Hiragana

(Since Katakana_Or_Hiragana characters typically follow a base, it is no problem that the rule will cause no breakage between them.)

2. Right now, people have to dig the definition of the properties out of the TR. It would be better both for them and for our maintenance if they were treated like Line Break, as enumerated properties, with the values as given by TR #29 (as amended by the above). Here are suggested names.

Default_Grapheme_Cluster_Type (DGCT)
Default_Word_Type (DWT)
Default_Sentence_Type (DST)

The "Default" is explicit in the name, so that people are clear that these are expected to be overridden.

Mark
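
For illustration only, the effect of the simplification in 1.B can be sketched in Python. This is a rough sketch, not part of the proposal: the function and table names are hypothetical, the Katakana_Or_Hiragana ranges are copied from the data lines above (including the proposed 30FC), and the single Katakana range is an assumed excerpt rather than the full Scripts.txt list.

```python
# Katakana_Or_Hiragana assignments copied from the memo:
# the 4.0.1 data lines plus the 30FC line proposed in 1.A.
KATAKANA_OR_HIRAGANA = [
    (0x3031, 0x3035),
    (0x309B, 0x309C),
    (0x30FC, 0x30FC),  # proposed
    (0xFF70, 0xFF70),
    (0xFF9E, 0xFF9F),
]

# Assumption: one illustrative Katakana range, not the complete script data.
KATAKANA = [(0x30A1, 0x30FA)]

# The explicit exception list quoted from TR #29 in 1.B.
TR29_EXPLICIT = {0x30FC, 0xFF70, 0xFF9E, 0xFF9F}


def in_ranges(cp, ranges):
    """Return True if code point cp falls inside any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def word_katakana_current(cp):
    """Current wording: Script = Katakana, or one of the listed code points."""
    return in_ranges(cp, KATAKANA) or cp in TR29_EXPLICIT


def word_katakana_proposed(cp):
    """Proposed wording: Script = Katakana, or Script = Katakana_Or_Hiragana."""
    return in_ranges(cp, KATAKANA) or in_ranges(cp, KATAKANA_OR_HIRAGANA)


def newly_covered():
    """Code points matched by the proposed rule but not by the current one."""
    candidates = [cp for lo, hi in KATAKANA_OR_HIRAGANA for cp in range(lo, hi + 1)]
    return [cp for cp in candidates
            if word_katakana_proposed(cp) and not word_katakana_current(cp)]


if __name__ == "__main__":
    print(["%04X" % cp for cp in newly_covered()])
```

Under these assumptions, the four code points in the current explicit list stay covered, and the proposed rule additionally picks up 3031..3035 and 309B..309C through their script value.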