Hentaigana proposal

From: Nicolas Tranter <n.tranter_at_sheffield.ac.uk>
Date: Wed, 9 Dec 2015 15:55:59 +0000

I comment as a western Japanologist who teaches and researches using
hentaigana. I have published with hentaigana using image files (resulting
in two publisher errors) and will publish next year with hentaigana using
the Koin Hentaigana font (Koin変体仮名外字明朝.tte), and anticipate typesetting
problems. I refer to the 2015 proposal L2/15-239 to include hentaigana,
including the appended paper by Takada Tomokazu, Yada Tsutomu and Saito
Tatsuya ('The past, present and future of Hentaigana Standardization for
Information Interchange'). I also refer to Yada Tsutomu's support of the
proposal ('About the inclusion of standardized codepoints for Hentaigana',
L2/15-318). As the names and numbering of proposed characters are an issue I
deal with below, I also refer to individual hentaigana in the proposal by
their MJ-codes as used in the proposers' own websites (e.g.
http://mojikiban.ipa.go.jp/xb164/).

SELECTION: The selection is good, consisting of 286 forms, although this
would be realised as 299 characters. The earlier 2009 proposal referred to
was based on the Mojikyo M113.ttf font, which has 213 hentaigana characters
and includes a few major basic gaps. The Koin Hentaigana font has 549
characters, which excluding separate forms with voicing and 'half-voicing'
diacritics consists of 330 hentaigana, but includes some very rare forms,
including ones that do not occur in late period texts.

The selection of 'academic' hentaigana is appropriate and lacks major gaps.
On the other hand, the Ministry of Justice (MoJ) hentaigana requirements
were decided by the Ministry in 2004 for name registration purposes, and so,
although one could easily argue with that decision (and I would), the fact
that they are already official means it is pointless to argue with their
inclusion in Unicode.

It's been noted that a few hentaigana are almost identical to normal
hiragana, especially *e* HENTAIGANA LETTER E VARIANT 4 = MJ090017 (cf. え),
*shi* HENTAIGANA LETTER SI VARIANT 2 = MJ090072 (cf. HIRAGANA LETTER SI し)
and *nu* HENTAIGANA LETTER NU VARIANT 2 = MJ090149 (cf. HIRAGANA LETTER NU ぬ):
their differences are solely that the 'brush' is removed from the paper on
a downward rather than a rightward flourish, reflecting vertical
handwriting. Ordinarily I would argue against including them, but since the
MoJ has recognised them as official variants they need to be included.

The decision to propose in most cases one codepoint for the hentaigana
derived from a single Chinese character is sensible, as also is the
decision to allow multiple codepoints in certain cases where manuscripts
use side-by-side significantly distinct forms derived from the same Chinese
character and with the same value. An example of the latter is HENTAIGANA
LETTER KA VARIANT 3 = MJ090025 and KA VARIANT 4 = MJ090026, both pronounced
*ka* and both derived from the Chinese character 可, but which are routinely
both found in the same manuscript by the same hand as if they were separate
graphemes from the Heian to the Meiji periods.

POLYPHONY: Several hentaigana are truly polyphonous (e.g. the 子-derived
hentaigana = *ne* MJ090151 or *ko* MJ090059, or the 馬-derived hentaigana =
*me* MJ090222 or *ma* MJ090205). In particular, the hentaigana derived from
无, which are historically associated with *n* (MJ090298, MJ090299) and are
also the source of HIRAGANA LETTER N ん, are also used for *mu* (MJ090214,
MJ090215) and *mo* (MJ090224, MJ090223). Diachronically, *n* in native
Japanese words is usually derived from an earlier *mu*. Takada et al. include
a list of 10 kanji sources that this applies to in the proposed repertoire.
(Strictly, this affects 11 hentaigana, because the proposal has two forms
for 无-derived characters.) The proposal's solution is to assign different
identifiers, e.g. 子 = HENTAIGANA LETTER NE VARIANT 1 and HENTAIGANA LETTER
KO VARIANT 2, 馬 = HENTAIGANA LETTER ME VARIANT 3 and HENTAIGANA LETTER MA
VARIANT 7, and the two derived from 无 = HENTAIGANA LETTER N VARIANT 1, N
VARIANT 2, MU VARIANT 1, MU VARIANT 2, MO VARIANT 1 and MO VARIANT 2. This
means that there would be characters that are given more than one codepoint
and identifier but are formally and etymologically identical, adding 13
unnecessary repetitions to the character set. I would favour Yada's naming
system, where the polyphonous characters are given a single codepoint and
identifier, e.g. 子 = HENTAIGANA LETTER NE-KO, 馬 = HENTAIGANA LETTER ME-MA,
and the two 无-derived forms = HENTAIGANA LETTER N-MU-MO 1 and N-MU-MO 2.
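
The arithmetic behind the "13 unnecessary repetitions" can be checked with a
short sketch. Only 子, 馬 and the two 无-derived forms are named in this
comment; the other seven kanji sources from Takada et al.'s list of 10 are not
identified here, and the sketch assumes (it is an assumption, not something
stated in the proposal text quoted above) that each of them has exactly two
readings. Under that assumption the figures 11, 13 and 299 all agree.

```python
# Readings assigned to the polyphonous hentaigana named in this comment.
named = {
    "子": ["ne", "ko"],
    "馬": ["me", "ma"],
    "无 (form 1)": ["n", "mu", "mo"],
    "无 (form 2)": ["n", "mu", "mo"],
}

# ASSUMPTION: the remaining 7 of the 10 kanji sources each have one form
# with exactly two readings. They are not named in this comment.
UNNAMED_SOURCES = 7
ASSUMED_READINGS = 2


def redundant_codepoints(reading_counts):
    """A form with n readings needs n code points if encoded per reading,
    but only 1 if encoded per form, so n - 1 of them are repetitions."""
    return sum(n - 1 for n in reading_counts)


reading_counts = [len(r) for r in named.values()]
reading_counts += [ASSUMED_READINGS] * UNNAMED_SOURCES

affected_forms = len(reading_counts)
extra = redundant_codepoints(reading_counts)

print(affected_forms)  # polyphonous hentaigana forms
print(extra)           # repeated code points under per-reading encoding
print(286 + extra)     # characters needed to realise 286 forms
```

With one codepoint per form, as in Yada's naming system, the repertoire would
stay at 286 characters instead of 299.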

STANDARD VARIATION: The suggestion that hentaigana be standard variation
characters means that in the absence of appropriate font support they would
be rendered as hiragana with the same value. (This appears to underlie the
decision to propose different codepoints and names for the polyphonous
hentaigana.) I do not support this. The two main uses of hentaigana are
academic and by the MoJ. Academics will only use hentaigana if they
specifically need them to be rendered as such rather than as hiragana, and
because hentaigana as proposed for inclusion in Unicode and the hiragana that
are already encoded together constitute the same pre-1900 script,
proofreading a text to spot incorrect renderings would be very difficult.
It would be easier for academics if lack of font support rendered
hentaigana simply as blanks. Similarly, MoJ name registration normally
involves recording the name both in registered spelling and in hiragana
transcription, so having hentaigana show up as blanks would not cause a
problem.
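
The proofreading problem can be illustrated with a minimal sketch. It assumes
a hentaigana encoded as a hiragana base followed by a variation selector; the
particular sequence used (か + U+FE00) is purely hypothetical and is not taken
from the proposal. The function `fallback_render` is likewise an illustrative
stand-in for a renderer without font support, which shows only the base
character because variation selectors are ignored by default.

```python
# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F.
VARIATION_SELECTORS = {chr(cp) for cp in range(0xFE00, 0xFE10)}


def fallback_render(text):
    """Simulate display without font support for the variation sequences:
    the selector is ignored and only the base character is shown."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


# HYPOTHETICAL sequence: hiragana KA + VS1 standing in for a hentaigana *ka*.
HYPOTHETICAL_KA_VARIANT = "\u304B\uFE00"

# A transcription that deliberately contrasts the variant with ordinary か.
transcription = "あ" + HYPOTHETICAL_KA_VARIANT + "か"
shown = fallback_render(transcription)

# Number of hentaigana silently replaced by plain hiragana.
substituted = sum(ch in VARIATION_SELECTORS for ch in transcription)

print(shown)
print(substituted)
```

In the simulated output the variant and the ordinary か are indistinguishable,
so a proofreader has no visual cue that a substitution took place. A blank or
missing-glyph box would at least be conspicuous.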
Received on Wed Dec 09 2015 - 10:52:11 CST
