I'd like to ask what the rationale is for including Korean kugyol as a
subset of CJK Ideographs in Unicode, while Chinese bopomofo and Japanese
katakana are treated as distinct from their CJK Ideograph origins?
Some Korean kugyol are marked in the kDefinition field of the UNIHAN.TXT
file by the word "kwukyel", though this is not a comprehensive list. Some
others may be seen on pp. 127-128 of Ho-Min Sohn's _The Korean Language_
(Cambridge: Cambridge University Press, 1999), in chapter six, "Writing
systems" (pp. 121-150).
persephone:~> grep kwukyel unihan.txt
U+4E06 kDefinition kwukyel
U+4E37 kDefinition kwukyel
U+4E4A kDefinition kwukyel
U+4E5B kDefinition kwukyel
U+4E65 kDefinition kwukyel
U+4E87 kDefinition kwukyel hammer
U+4EAA kDefinition kwukyel
U+4EBD kDefinition kwukyel
U+4ED2 kDefinition kwukyel
U+516F kDefinition kwukyel
U+536A kDefinition kwukyel
U+53BC kDefinition kwukyel
U+58ED kDefinition kwukyel
U+6729 kDefinition kwukyel: rank, grade; wait; equal; "etc."
U+6730 kDefinition kwukyel
U+7F56 kDefinition kwukyel
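For anyone without grep at hand, the same search can be sketched in Python. This is only a minimal sketch: it assumes the usual Unihan layout of three tab-separated fields per line (code point, field name, value), and the function name find_kwukyel, the embedded sample, and its last non-matching line are illustrative, not taken from the file itself.

```python
# Three lines from the grep output above, plus one made-up non-matching
# line for contrast. Assumption: fields are separated by tab characters.
SAMPLE = (
    "U+4E06\tkDefinition\tkwukyel\n"
    "U+4E87\tkDefinition\tkwukyel hammer\n"
    'U+6729\tkDefinition\tkwukyel: rank, grade; wait; equal; "etc."\n'
    "U+4E00\tkDefinition\t(illustrative entry that does not match)\n"
)


def find_kwukyel(lines):
    """Return (code, character, definition) for kDefinition entries that mention 'kwukyel'."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue
        code, field, value = parts
        if field == "kDefinition" and "kwukyel" in value:
            # "U+4E06" -> 0x4E06 -> the character itself
            hits.append((code, chr(int(code[2:], 16)), value))
    return hits


for code, char, definition in find_kwukyel(SAMPLE.splitlines()):
    print(code, char, definition)
```

To run it against the real file, pass an open file object instead of the sample, e.g. find_kwukyel(open("unihan.txt", encoding="utf-8")).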
Korean kugyol are derived in much the same manner as Chinese bopomofo and
Japanese katakana (and even some "simplified Chinese" characters), by
taking CJK Ideographs in part or whole, and perhaps modifying some strokes.
For example, U+3105 BOPOMOFO LETTER B is a portion of bao1 U+5305 (and identical
to U+52F9); U+310C BOPOMOFO LETTER L is li4 U+529B with an extra "tick"
added, as is U+3116 BOPOMOFO LETTER R, which is ri4 U+65E5 with the
center stroke stylized; U+30A2 KATAKANA LETTER A is a portion (and
modification) of U+963F; U+30BF KATAKANA LETTER TA is a portion of
U+591A; U+30CC KATAKANA LETTER NU is a portion of U+5974.
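The pairings above can be checked against the character names with Python's standard unicodedata module. This is a rough sketch: the DERIVATIONS table simply restates the letter/source pairs given in the text, and the helper name describe is my own.

```python
import unicodedata

# (derived letter, source ideograph) pairs as listed in the text above.
DERIVATIONS = [
    (0x3105, 0x5305),  # BOPOMOFO LETTER B  <- bao1
    (0x310C, 0x529B),  # BOPOMOFO LETTER L  <- li4
    (0x3116, 0x65E5),  # BOPOMOFO LETTER R  <- ri4
    (0x30A2, 0x963F),  # KATAKANA LETTER A
    (0x30BF, 0x591A),  # KATAKANA LETTER TA
    (0x30CC, 0x5974),  # KATAKANA LETTER NU
]


def describe(cp):
    """Format a code point as 'U+XXXX NAME' using the Unicode character name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


for letter, source in DERIVATIONS:
    print(describe(letter), "<-", describe(source))
```

The letters report their own script-specific names, while the sources report only algorithmic "CJK UNIFIED IDEOGRAPH-xxxx" names, which is the distinction at issue here.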
A hypothetical "KUGYOL LETTER YA" would look like U+4E5B (listed above),
which is a portion of U+4E5F; a "KUGYOL LETTER HO" would look like U+4E37
(listed above), which is a portion (and modification) of U+7232; a
"KUGYOL LETTER NUN" would look like U+536A (listed above), which is a
portion (and modification) of U+96B1.
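To make the contrast concrete, here is a small sketch that reports which block each code point falls in. The ranges are the published block ranges for Katakana, Bopomofo, Kanbun, and CJK Unified Ideographs; the KUGYOL table and its labels are hypothetical, following the hypothetical letter names above, and block_of is my own helper.

```python
# (first, last, block name); only the blocks relevant to this discussion.
BLOCKS = [
    (0x30A0, 0x30FF, "Katakana"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x3190, 0x319F, "Kanbun"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(cp):
    """Return the name of the block containing cp, or 'other' if not listed."""
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


# Hypothetical kugyol letter -> (kugyol form, source ideograph), per the text.
KUGYOL = {
    "YA": (0x4E5B, 0x4E5F),
    "HO": (0x4E37, 0x7232),
    "NUN": (0x536A, 0x96B1),
}

for label, (form, source) in KUGYOL.items():
    print(f"{label}: U+{form:04X} in {block_of(form)}; "
          f"source U+{source:04X} in {block_of(source)}")

# For comparison, a bopomofo and a katakana letter and their sources.
for letter, source in [(0x3105, 0x5305), (0x30A2, 0x963F)]:
    print(f"U+{letter:04X} in {block_of(letter)}; "
          f"source U+{source:04X} in {block_of(source)}")
```

Every kugyol form lands in the same block as its source, whereas the bopomofo and katakana letters do not.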
I can think of a few possible reasons why Chinese bopomofo and Japanese
katakana have been treated as distinct from CJK Ideographs: they are 1)
distinguished in source legacy CJK character sets; 2) not included in
Chinese and Japanese character dictionaries; 3) technically capable of
being used in the absence of CJK Ideographs as a complete script; 4) used
solely for phonetic value; 5) in widespread contemporary use, so
ordinary users care about the distinction.
#1 would seem to explain why Bopomofo and Katakana are distinguished in
Unicode, but not why Kugyol are not--has there ever been a legacy CJK
character set that included Kugyol, whether unified with CJK Ideographs
or not? #2 may be an argument for treating Kugyol as a subset of CJK
Ideographs--are kugyol considered "characters" in Korean dictionaries? #3
seems to also explain why Bopomofo and Katakana are distinguished as
separate scripts, but not why Kugyol were treated as a subset of CJK
Ideographs. #4 would also explain why Bopomofo and Katakana are
distinguished from CJK Ideographs while Kugyol are not, since Kugyol have
non-phonetic functions, but this doesn't explain why Kugyol aren't treated
separately like the characters of the Kanbun block, which are no more than
miniaturized CJK Ideographs used for non-phonetic annotation (and which
have no legacy source separation rationale, either, as stated on p. 267 of
TUS 3.0). #5 would also seem to explain the difference, since Kugyol are
not used today.
So, what exactly is Unicode's reason?
Incidentally, if #4 is a valid reason, would it be possible for
Unicode to have hypothetical "MANYOOGANA LETTER KOORUI KI", "MANYOOGANA
LETTER KOORUI GI", "MANYOOGANA LETTER OTURUI KI", etc. that look exactly
like CJK Ideographs U+652F, U+4F0E, and U+5947, respectively, but are used
solely for phonetic value--just like katakana? Or do scholars of Old
Japanese not care for the distinction? (or are not politically
motivated to ask for it?)--see my #5 above. See p. 1529 of OONO Susumu,
et al., eds., _Iwanami Kogo Jiten_ (Tokyo: Iwanami, 1974) for more
examples (there are a few hundred).