Re: Katakana Extended-A?

From: Benjamin M Scarborough
Date: Mon Dec 03 2007 - 13:35:48 CST

    First of all, I would like to say that I just realized that there was a
    second volume of the dictionary, available at

    Andrew West wrote:
    >It may be that some of the tone marks can use existing combining
    >characters, but it looks like at least some of them need encoding.

    I have tried to find suitable combining marks already in Unicode, but
    nothing seems to represent these tone marks appropriately.

    >> Lastly, there are two combining marks visible on individual characters:
    >> a combining line above and a combining dot below. These could be
    >> unified with U+0305 COMBINING OVERLINE and U+0323 COMBINING DOT BELOW
    >> respectively.
    >We need to be sure of exactly which letters these marks modify. If
    >they only modify certain letters it may be simpler to encode the
    >modified letter as a single non-decomposable character (letters with
    >diacritic marks do not always need to be decomposable -- cf. Yi and

    The overline mark appears only on U, SMALL U, O, SMALL O, SA, SE, SO,
    TI, TU. Instances of all of these can be found in the indices. In many
    cases the bar connects with parts of the base character. They are
    collated separately from their unmarked counterparts; however, this
    dictionary also separates characters with (han)dakuten from their
    unmarked counterparts, thus SE < ZE < SE WITH TOPBAR < SO < ZO < SO
    WITH TOPBAR < TA. It is unknown, then, whether this bar is intended to
    be an integral part of the character or a combining mark similar to the
    dakuten and handakuten. It would be useful to know the meaning of the
    mark. It is worth noting, however, that O < O WITH TOPBAR < U < U WITH
    TOPBAR < WO < KA at the beginning of a syllable, but N < WO < O WITH
    TOPBAR < U WITH TOPBAR at the middle/end.

    The dot below mark appears on KA, KI, KU, KE, KO, SA WITH TOPBAR, SE
    WITH TOPBAR, TE, TO, PA, PI, PU, PE, PO. Unlike the dakuten, handakuten, and
    topbar, the dot below is ignored for collation; characters with dot
    below are freely intermixed with unmarked characters and the
    dictionary's headers show both varieties. Furthermore, the dot below
    appears to be used with entire columns of consonants (Kx, xx WITH
    TOPBAR, Tx, Px). However, because it is used with PA, PI, PU, PE, PO
    and -not- HA, HI, HU, HE, HO, it would seem inappropriate to encode the
    combinations with dot below as nondecomposable characters.
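
    To show what "ignored for collation" could look like in practice, here
    is a minimal Python sketch. It assumes the dot below is represented as
    U+0323 COMBINING DOT BELOW, as suggested earlier in this thread, and it
    uses plain code point order as a stand-in for the dictionary's real
    ordering. The function name collation_key is hypothetical, and the
    sample strings are made up for illustration; they are not entries from
    the dictionary.

```python
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW (assumed representation)


def collation_key(word):
    """Drop the dot below so marked and unmarked letters compare equal."""
    return word.replace(DOT_BELOW, "")


# Made-up katakana strings; two of them carry a dot below on the first letter.
words = ["カ\u0323ナ", "カサ", "カ\u0323タ"]

# Marked and unmarked forms end up intermixed, ordered only by their
# unmarked letters.
ordered = sorted(words, key=collation_key)
```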

    >> At the above site is evidence of KATAKANA LETTER YI, KATAKANA LETTER
    >> They apparently were introduced in the Meiji era but never entered
    >> common usage. However, I have not been able to find instances of any of
    >> these five characters in use.
    >The table from "中學教程/日本文典" is good, but a proposal would need a little
    >more evidence of their use.

    I'm well aware of this and am still trying to find proper evidence for

    >> If any of these characters are indeed potential additions to Unicode, I
    >> propose making a new Katakana Extended-A block at U+AAE0..U+AAFF.
    >Best to keep them in the same region as the other kana and bopomofo
    >blocks, etc. 2FE0..2FEF is free if sixteen characters are sufficient.

    I considered this at first, but then chose AAE0..AAFF for two reasons:

    1. there do appear to be more than sixteen characters to encode, and
    2. describes
    2FE0..2FEF as being in the "Symbols Area" rather than the "General
    Scripts Area."
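
    The first reason comes down to range arithmetic, which can be sketched
    in a few lines of Python. The helper name block_size is hypothetical;
    the two ranges are the ones discussed above.

```python
def block_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


symbols_gap = block_size(0x2FE0, 0x2FEF)      # the free range Andrew suggested
proposed_block = block_size(0xAAE0, 0xAAFF)   # the range proposed here

print(f"2FE0..2FEF holds {symbols_gap} code points")
print(f"AAE0..AAFF holds {proposed_block} code points")
```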
