L2/15-334

Nicolas Tranter
Date: Wed, Dec 9, 2015 at 7:55 AM
Subject: Hentaigana proposal

I comment as a Western Japanologist who teaches and researches using hentaigana. I have published with hentaigana using image files (resulting in two publisher errors) and will publish next year with hentaigana using the Koin Hentaigana font (Koin変体仮名外字明朝.tte), and I anticipate typesetting problems. I refer to the 2015 proposal L2/15-239 to include hentaigana, including the appended paper by Takada Tomokazu, Yada Tsutomu and Saito Tatsuya ('The past, present and future of Hentaigana Standardization for Information Interchange'). I also refer to Yada Tsutomu's support of the proposal ('About the inclusion of standardized codepoints for Hentaigana', L2/15-318). As the names and numbering of the proposed characters are an issue I deal with below, I also refer to individual hentaigana in the proposal by their MJ codes as used on the proposers' own websites (e.g. http://mojikiban.ipa.go.jp/xb164/).

SELECTION: The selection is good, consisting of 286 forms, although these would be realised as 299 characters. The earlier 2009 proposal referred to was based on the Mojikyo M113.ttf font, which has 213 hentaigana characters and has a few major gaps among the basic forms. The Koin Hentaigana font has 549 characters, which, excluding separate forms with voicing and 'half-voicing' diacritics, amount to 330 hentaigana; however, it also contains some very rare forms, including ones that do not occur in late-period texts.

The selection of 'academic' hentaigana is appropriate and has no major gaps. The Ministry of Justice hentaigana, on the other hand, were decided by the Ministry of Justice in 2004 for name registration purposes. Although one could easily argue with that 2004 decision (and I would), the fact that these forms are already official means it is pointless to argue against their inclusion in Unicode.

It has been noted that a few hentaigana are almost identical to normal hiragana, especially e HENTAIGANA LETTER E VARIANT 4 = MJ090017 (cf. ), shi HENTAIGANA LETTER SI VARIANT 2 = MJ090072 (cf. HIRAGANA LETTER SI ) and nu HENTAIGANA LETTER NU VARIANT 2 = MJ090149 (cf. HIRAGANA LETTER NU ). They differ only in that the brush leaves the paper with a downward rather than a rightward flourish, reflecting vertical handwriting. Ordinarily I would argue against including them, but since the MoJ has recognised them as official variants, they need to be included.

The decision to propose, in most cases, one codepoint for the hentaigana derived from a single Chinese character is sensible. So is the decision to allow multiple codepoints in certain cases where manuscripts use significantly distinct forms side by side that derive from the same Chinese character and have the same value. An example of the latter is HENTAIGANA LETTER KA VARIANT 3 = MJ090025 and KA VARIANT 4 = MJ090026. Both are pronounced ka and both are derived from the Chinese character , yet from the Heian to the Meiji period they are routinely found together in the same manuscript, written by the same hand, as if they were separate graphemes.

POLYPHONY: Several hentaigana are truly polyphonous (e.g. the -derived hentaigana = ne MJ090151 or MJ090059 ko, or the -derived hentaigana = me MJ090222 or ma MJ090205). In particular, those hentaigana derived from and associated with n (MJ090298, MJ090299) (also the source of HIRAGANA LETTER N ) have historically also been used for mu (MJ090214, MJ090215) and mo (MJ090224, MJ090223). Diachronically, n in native Japanese words usually derives from an earlier mu. Takada et al. include a list of 10 kanji sources in the proposed repertoire to which this applies. (Strictly, this affects 11 hentaigana, because the proposal has two forms for -derived characters.) The proposal's solution is to assign different identifiers. For example, = HENTAIGANA LETTER NE VARIANT 1 and HENTAIGANA LETTER KO VARIANT 2; = HENTAIGANA LETTER ME VARIANT 3 and HENTAIGANA LETTER MA VARIANT 7; and the two derived from = HENTAIGANA LETTER N VARIANT 1, N VARIANT 2, MU VARIANT 1, MU VARIANT 2, MO VARIANT 1 and MO VARIANT 2. As a result, characters that are formally and etymologically identical would each receive more than one codepoint and identifier, adding 13 unnecessary duplicates to the character set. I would favour Yada's naming system, in which each polyphonous character is given a single codepoint and identifier, e.g. = HENTAIGANA LETTER NE-KO, = HENTAIGANA LETTER ME-MA, and the two -derived forms = HENTAIGANA LETTER N-MU-MO 1 and N-MU-MO 2.

STANDARD VARIATION: The suggestion that hentaigana be encoded as standardized variants means that, in the absence of appropriate font support, they would be rendered as the hiragana with the same value. (This appears to underlie the decision to propose different codepoints and names for the polyphonous hentaigana.) I do not support this. The two main uses of hentaigana are academic and by the MoJ. Academics will only use hentaigana when they specifically need them to be rendered as hentaigana rather than as hiragana. Because the hentaigana proposed for inclusion in Unicode and the already encoded hiragana together constitute a single pre-1900 script, proofreading a text to spot incorrect renderings would be very difficult. It would be easier for academics if, without font support, hentaigana were simply rendered as blanks. Similarly, MoJ name registration normally records the name both in its registered spelling and in hiragana transcription, so hentaigana showing up as blanks would not cause a problem.