RE: Jamo_Short_Name

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Wed, 2 Jan 2013 20:20:01 +0000

André Schappo asked:

> Been looking at http://www.unicode.org/Public/UNIDATA/Jamo.txt
>
> There appears to be 2 different romanizations at play in the file? One for the
> short name and another for the full name
> eg 1100; G # HANGUL CHOSEONG KIYEOK
>
> I have searched unicode.org but cannot find appropriate documentation. Can
> anyone point me to definitive documentation?

Yes, but you're not gonna like the answer. ;-)

This situation for Hangul romanizations derives from the extensive haggling over Hangul syllable encoding which extended through 1992 (the crucial time for the Unicode/10646 "merger" process), and then continued in 1995, when Amd 5 to 10646 resulted in the re-encoding of the Hangul syllables into the U+AC00 block of 11,172 syllables we currently have in the standard.

Unicode 1.0 had no conjoining jamo, but essentially just encoded the entirety of what was then known as KS C 5601 (now KS X 1001-1992) as compatibility characters. The names all used the "G" Romanization conventions. So U+3131 was HANGUL LETTER GIYEOG. The KS C 5601 compatibility Hangul syllable blocks (2350 of them) were encoded in the range U+3400..3D2D. The standard was silent about the naming conventions for those syllables, because it neither listed them explicitly, nor gave a rule for their names. However, from the naming conventions for the *circled* Hangul syllable characters, it is clear what the intent was. The syllables, if they had been spelled out, would all have used the "G" Romanization conventions. So U+3400, in principle, was "HANGUL GA", U+3401 "HANGUL GAG", etc., through the set. So, to summarize, for Unicode 1.0, the situation was:

U+3131 HANGUL LETTER GIYEOG
U+3132 HANGUL LETTER SSANG GIYEOG
...
U+3400 HANGUL GA
U+3401 HANGUL GAG
...
U+3D2D HANGUL HING

All that was completely revised during what turned into seat-of-the-pants, late-night, up-against-the-deadline negotiations (where have we encountered that recently?) during the July, 1992 WG2 meeting in Seoul. The "appropriate documentation" is all contained in the WG2 document register, but it is hard to find nowadays, because 1992 was long before WG2 maintained its document register online. For the record, here are the relevant documents:

N764 Minutes of Unicode Korean Subcommittee meeting, 01-Oct, 1991
N828 Comments of Republic of Korea on ISO/DIS 10646-1.2(1992) [1992.06.28]
N840 Proposal for disposition of Korean Comments to DIS 1.2 [1992.06.27]
N848 Modified Korean Position [1992.07.02]
N852 2nd Proposal for disposition of Korean Comments to DIS 1.2 [1992.07.02]
N860 Details of Korean Jamo Combining Rules [1992.07.02]
N861 China's Position on Hangul in UCS [1992.07.02]
N864 Modification of Korean Position (2) [1992.07.03]
N868 Revisions to Korean [1992.07.03]

The net outcome of this was to introduce the U+1100 block of conjoining jamo letters, extend the block of Hangul syllables (U+3D2E..U+4DFF), and change the names of everything. All names were changed to use the "K" Romanization conventions. The detailed results can be found in ISO/IEC 10646-1:1993 or in Unicode 1.1, but as both of those are *also* difficult to obtain these days, here is a summary of the outcome:

U+1100 HANGUL CHOSEONG KIYEOK
U+1101 HANGUL CHOSEONG SSANGKIYEOK
...
U+3131 HANGUL LETTER KIYEOK
U+3132 HANGUL LETTER SSANGKIYEOK
...
U+3400 HANGUL SYLLABLE KIYEOK A
U+3401 HANGUL SYLLABLE KIYEOK A KIYEOK
...
U+3D2D HANGUL SYLLABLE HIEUH I IEUNG
U+3D2E HANGUL SYLLABLE KIYEOK A SSANGKIYEOK
...
U+4DFF HANGUL SYLLABLE MIEUM WEO RIEUL-THIEUTH

In 1995, there was another complete revolution in the Hangul encoding, as the pressure was on to include the *entire* set of 11,172 syllables, in the same predictable order we now see in the standard, rather than as a compatibility block from KS C 5601, filled out by extensions. The relevant documents in WG2 were:

N1158 Korean National Position for adding Hangul characters [1995.02.27]
N1170 Canadian Position on Korean Proposal in N 1158 for adding Hangul characters [1995.03.10]
N1198 Working Draft for a proposed draft amendment to ISO/IEC 10646-1:1993 [1995.04.05]
N1199 Background on Korean Coding [1995.04.05]
N1209 Proposed text of pDAM 5 to 10646, Hangul Character Collection
N1265 Report on Letter Ballot of PDAM5 to ISO/IEC 10646-1 (Hangul): Proposed Disposition of Comments from National Bodies [1995.09.26]
N1285 Hangul Syllable Character Name Generation Algorithm

This amendment resulted in removal of all the Hangul syllables in the range U+3400..U+4DFF, and replacement by the current block of Hangul syllables at U+AC00..U+D7AF. A key aspect of this change was that the set of 11,172 was an algorithmic ordering of the syllables. There was an argument at the time, but in the end, it was decided that the *names* of the characters shouldn't be maintained as a hand-edited list of 11,172 names, but could be defined algorithmically. N1285 defined that algorithm. It also went back to the principle that the names for syllables (as in other cases in the standard) would better be handled as romanizations of the pronunciation of the syllables, rather than by spelling out sequences of letters. So we ended up with the jamo short names (printed in the U+1100 block in the case of 10646 back then), and names for syllables very reminiscent of Unicode 1.0 names. (Note this did *not* change anything about the then existing, standard names for the U+1100 block of conjoining jamos or the U+3130 block of compatibility jamos.) In summary:

U+AC00 HANGUL SYLLABLE GA
U+AC01 HANGUL SYLLABLE GAG
...
U+D7A3 HANGUL SYLLABLE HIH
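
For anyone who wants to see the algorithmic naming in action, here is a minimal Python sketch. It is an illustration only, not the text of N1285: the function name hangul_syllable_name is made up for this example, the three tables are the Jamo_Short_Name values as I read them from Jamo.txt (in code point order), and the constants follow the Hangul syllable arithmetic given in the Unicode Standard.

```python
import unicodedata

S_BASE = 0xAC00                 # first precomposed Hangul syllable
V_COUNT, T_COUNT = 21, 28       # vowels; trailing consonants (incl. "none")
N_COUNT = V_COUNT * T_COUNT     # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT          # 11,172 syllables in total

# Jamo short names, assumed to match Jamo.txt.
LEADS = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
         "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
VOWELS = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
TAILS = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
         "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
         "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(cp: int) -> str:
    """Build the name of a precomposed syllable from jamo short names."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    lead, rest = divmod(s_index, N_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    return "HANGUL SYLLABLE " + LEADS[lead] + VOWELS[vowel] + TAILS[tail]


for cp in (0xAC00, 0xAC01, 0xD7A3):
    print(f"U+{cp:04X} {hangul_syllable_name(cp)}")
# U+AC00 HANGUL SYLLABLE GA
# U+AC01 HANGUL SYLLABLE GAG
# U+D7A3 HANGUL SYLLABLE HIH

# The conjoining jamo keep their older "K" names:
print(unicodedata.name("\u1100"))
# HANGUL CHOSEONG KIYEOK
```

The last line shows the contrast André asked about: the short name "G" feeds the syllable names, while the full character name of U+1100 still uses KIYEOK.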

So that is how we ended up with one set of romanizations for the jamo characters and another for the Hangul syllables. As for many Unicode "just so" stories, there isn't a convenient documentation of this in the standard itself, but if you want, you can bookmark this summary once it appears in the Unicode mail archives. ;-)

--Ken
Received on Wed Jan 02 2013 - 14:23:17 CST
