Understanding the Hangul mapping tables

From: Tim Greenwood (greenwood@openmarket.com)
Date: Wed Dec 03 1997 - 14:00:54 EST


On the Unicode v2.0 CD and on the ftp site under \UNIX\MAPPINGS\EASTASIA\KSC
there are two files, Ksc5601.txt and Hangul.txt.

The Ksc5601 file starts -

#
# Name: Unified Hangeul(KSC5601-1992) to Unicode table
# Unicode version: 2.0
# Table version: 1.0
# Table format: Format A
# Date: 07/24/95
# Authors: Lori Hoerth <lorih@microsoft.com>
# K.D.Chang <a-kchang@microsoft.com>
# General notes: none
#
# Format: Three tab-separated columns
# Column #1 is the Unified Hangeul code (in hex)
# Column #2 is the Unicode (in hex as 0xXXXX)
# Column #3 is the Unicode name (follows a comment sign, '#')
#
# The entries are in Unified Hangeul order
#
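
For readers who want to load this table, here is a minimal Python sketch of a reader for the three-column layout that the header describes. The function name and the sample line are illustrative assumptions, not taken from the file itself, and the sketch assumes both code columns carry a 0x prefix or are plain hex.

```python
def parse_format_a_line(line):
    """Parse one line of a tab-separated mapping table.

    Returns (source_code, unicode_code_point, name), or None for
    comment lines, blank lines, and lines with fewer than two columns.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split("\t")
    if len(fields) < 2:
        return None
    # int(..., 16) accepts hex with or without a leading "0x".
    source = int(fields[0], 16)
    target = int(fields[1], 16)
    # Column 3 is the character name, written after a '#' sign.
    name = fields[2].lstrip("#").strip() if len(fields) > 2 else ""
    return source, target, name


# Hypothetical sample line in the described layout (not copied from the file).
sample = "0xB0A1\t0xAC00\t# HANGUL SYLLABLE KIYEOK A"
parsed = parse_format_a_line(sample)
print(parsed)
```

Comment lines such as the header quoted above return None, so a caller can simply skip them while reading the file.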

The Hangul file starts -
Korean Hangul Encoding Conversion Table
---------------------------------------------------------------------------
Date : Oct 04, 1995
Author : K.D.Chang <a-kchang@microsoft.com>
         In Sook Choi <ischoi@microsoft.com>
         Jung Ho Kim <junghok@microsoft.com>
---------------------------------------------------------------------------
Column 2 : Wansung (KSC 5601-1987)
               LeadByte = 0xA1 - 0xFE with TrailByte = 0xA1 - 0xFE
Column 3 : Unified Hangul
           include 2,350 characters same as Wansung
             LeadByte = 0xA1 - 0xFE with TrailByte = 0xA1 - 0xFE
           plus 8,822 characters :
             LeadByte = 0x81 - 0xA0 with TrailByte = 0x41 - 0x5A, 0x61 - 0x7A, 0x81 - 0xFE
             LeadByte = 0xA1 - 0xC6 with TrailByte = 0x41 - 0x5A, 0x61 - 0x7A, 0x81 - 0xA0
Column 4 : Johab (KSC 5601-1992)
Column 5 : Unicode 1.0
            2,350 : Hangul U+3400 - U+3D3D
Column 6 : Unicode 1.1
            2,350 : Hangul U+3400 - U+3D3D
            1,930 : Hangul Supplementry-A U+3D2E - U+44B7
            2,376 : Hangul Supplementry-B U+44BE - U+4dFF
Column 7 : Unicode 2.0
           11,172 : Hangul U+AC00 - U+D7A3

----
(End of quote)
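
The byte ranges in the Hangul.txt header can be written down as a small Python check. This is only a sketch of the ranges exactly as quoted above: the names are my own, and it says nothing about which positions inside the ranges are actually assigned. It also checks the character counts given in the header.

```python
# Trail-byte ranges for the two extension blocks, as listed in the header.
EXT_TRAIL_LOW_LEAD = [(0x41, 0x5A), (0x61, 0x7A), (0x81, 0xFE)]   # lead 0x81-0xA0
EXT_TRAIL_HIGH_LEAD = [(0x41, 0x5A), (0x61, 0x7A), (0x81, 0xA0)]  # lead 0xA1-0xC6


def in_ranges(value, ranges):
    """True if value falls inside any inclusive (low, high) range."""
    return any(low <= value <= high for low, high in ranges)


def classify(lead, trail):
    """Classify a two-byte code by the ranges quoted in the header."""
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return "wansung"
    if 0x81 <= lead <= 0xA0 and in_ranges(trail, EXT_TRAIL_LOW_LEAD):
        return "extension"
    if 0xA1 <= lead <= 0xC6 and in_ranges(trail, EXT_TRAIL_HIGH_LEAD):
        return "extension"
    return "outside"


def span(ranges):
    """Number of byte values covered by a list of inclusive ranges."""
    return sum(high - low + 1 for low, high in ranges)


# Code positions available in the extension ranges as written.
capacity = ((0xA0 - 0x81 + 1) * span(EXT_TRAIL_LOW_LEAD)
            + (0xC6 - 0xA1 + 1) * span(EXT_TRAIL_HIGH_LEAD))

# Counts stated in the header: 2,350 + 8,822 should equal the size of
# the Unicode 2.0 block U+AC00 - U+D7A3.
unicode20_count = 0xD7A3 - 0xAC00 + 1

print(classify(0xB0, 0xA1), classify(0x81, 0x41), classify(0xC7, 0x41))
print(capacity, unicode20_count, 2350 + 8822)
```

The ranges as written allow 8,888 positions, slightly more than the 8,822 characters the header mentions, so presumably not every position in those ranges is assigned. The two counts in the header do add up to the 11,172 syllables of the Unicode 2.0 block.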

Column 3 of the Hangul file matches column 1 of the Ksc5601 file, which is reasonable since both are labeled 'Unified Hangul'. How does this relate to column 4, Johab, which is labeled KSC 5601-1992? Ken Lunde's book describes the byte range for KSC 5601-1992 as A1-FE for both bytes, whereas the range in the Unified Hangul tables is 81-FD for byte 1 and 41-FF for byte 2. How does it all fit together? What are the actual codes that a Korean browser will emit?

----------------------------------------------------------------
Tim Greenwood
Open Market Inc.
617-949-7166
Cambridge, MA


