L2/04-152

Source: Mark Davis
Subject: Katakana_Or_Hiragana
Date: Thu, 6 May 2004

When we added the new script value Katakana_Or_Hiragana, we didn't adjust Table 2 in

http://www.unicode.org/reports/tr29/tr29-6.html#Word_Boundaries

Here are the relevant pieces of #29:

Katakana    Script = KATAKANA, or any of the following:
            U+30FC (ー) KATAKANA-HIRAGANA PROLONGED SOUND MARK
            U+FF70 (ｰ) HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
            U+FF9E (ﾞ) HALFWIDTH KATAKANA VOICED SOUND MARK
            U+FF9F (ﾟ) HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

and the rule:

Do not break between Katakana.
    Katakana × Katakana    (13)

Here are the contents of the script value:

3031..3035    ; Katakana_Or_Hiragana # Lm   [5] VERTICAL KANA REPEAT MARK..~ ~ ~ MARK LOWER HALF
309B..309C    ; Katakana_Or_Hiragana # Sk   [2] KATAKANA-HIRAGANA VOICED SOUND MARK..~-~ SEMI-~ ~ ~
FF70          ; Katakana_Or_Hiragana # Lm       HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF9E..FF9F    ; Katakana_Or_Hiragana # Lm   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..~-~ SEMI-~ ~ ~

I'm guessing that what really should be done is to add new rules that keep the new characters together and join them to the old ones. That is, remove from Katakana:

            U+FF70 (ｰ) HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
            U+FF9E (ﾞ) HALFWIDTH KATAKANA VOICED SOUND MARK
            U+FF9F (ﾟ) HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

and add the new rules:

    Katakana × Katakana_Or_Hiragana
    Katakana_Or_Hiragana × Katakana_Or_Hiragana

Note: as someone here noted, there was really very little value in adding this script value; all of its characters should simply inherit their status from the previous character, which is what Script=Inherited is for.

However, once the value was added, it seems surprising that

30FC          ; Common # Lm       KATAKANA-HIRAGANA PROLONGED SOUND MARK

doesn't have the value Katakana_Or_Hiragana. I'm guessing we just missed this in:

http://www.unicode.org/L2/L2003/03427-script-codes.txt
http://www.unicode.org/L2/L2004/04083-muller-scripts.html
http://www.unicode.org/L2/L2004/04096-script-changes.txt
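A minimal Python sketch of how the proposed pair rules would behave, assuming a deliberately simplified segmenter that applies only rule (13) and the two proposed rules, and breaks at every other position. The names word_break_class, breaks_between, and segment are hypothetical, and the Katakana ranges below are a partial approximation of Script=Katakana chosen for illustration, not the full property. Only the Katakana_Or_Hiragana ranges are taken directly from the data quoted above.

```python
# Hypothetical sketch: only the Katakana-related word-boundary rules are
# modeled; every other adjacent pair is treated as a break.

KATAKANA = "Katakana"
KANA_EITHER = "Katakana_Or_Hiragana"
OTHER = "Other"

# Assumption: a partial approximation of Script=Katakana, plus U+30FC,
# which the Katakana class lists explicitly. Under the proposal,
# U+FF70, U+FF9E and U+FF9F are no longer in this class.
_KATAKANA_RANGES = [(0x30A1, 0x30FA), (0x30FC, 0x30FC),
                    (0xFF66, 0xFF6F), (0xFF71, 0xFF9D)]

# The Katakana_Or_Hiragana ranges quoted in the memo.
_KANA_EITHER_RANGES = [(0x3031, 0x3035), (0x309B, 0x309C),
                       (0xFF70, 0xFF70), (0xFF9E, 0xFF9F)]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def word_break_class(ch):
    """Classify one character for the simplified rules below."""
    cp = ord(ch)
    if _in_ranges(cp, _KANA_EITHER_RANGES):
        return KANA_EITHER
    if _in_ranges(cp, _KATAKANA_RANGES):
        return KATAKANA
    return OTHER


# Pairs (left class, right class) that must not be separated.
# Any pair not listed here breaks, including
# Katakana_Or_Hiragana followed by Katakana.
_NO_BREAK_PAIRS = {
    (KATAKANA, KATAKANA),        # Katakana × Katakana (13)
    (KATAKANA, KANA_EITHER),     # proposed
    (KANA_EITHER, KANA_EITHER),  # proposed
}


def breaks_between(prev, cur):
    """True if the simplified rules allow a break between prev and cur."""
    return (word_break_class(prev), word_break_class(cur)) not in _NO_BREAK_PAIRS


def segment(text):
    """Split text at every position where these rules allow a break."""
    if not text:
        return []
    pieces = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if breaks_between(prev, cur):
            pieces.append(cur)
        else:
            pieces[-1] += cur
    return pieces
```

For example, segment("\uFF76\uFF9E") (halfwidth KA followed by the halfwidth voiced sound mark) returns a single piece, because Katakana × Katakana_Or_Hiragana now holds even though U+FF9E has been taken out of the Katakana class.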