Unicode Technical Committee Meeting #62
                  (Toronto, Canada -- Sept.  30, 1994)
                  Discussion of Korean Hangul Proposal

Composite of notes taken by: Joan Aliprand, Steve Greenfield, Tim
Greenwood, John Jenkins
Composite prepared by: Joan Aliprand

===========================================================================
Attendees:

Corporate Members:
John McConnell, Apple           jmcc@apple.com
Tim Greenwood, Digital          greenwood@r2me2.dec.com
Mike Ksar, HP                   ksar@hpcea.ce.hp.com
Don Carroll, HP                 don_carroll@hpboi1.desk.hp.com
John Gioia, IBM                 gioia@vnet.ibm.com
Fred Bealle, IBM                fbealle@vnet.ibm.com
Alexis Cheng, IBM               alexis@vnet.ibm.com
Hossein Kushki, IBM             kushki@vnet.ibm.com
Marty Marchyshyn, IBM           martian@vnet.ibm.com
Lisa Moore, IBM                 lisam@vnet.ibm.com
Uma Umamaheswaran, IBM          umavs@torolab6.vnet.ibm.com
Ed Batutis, Lotus               ebatutis@lotus.com
Sungi Hong, Microsoft           sghong@microsoft.com
Young Lim, Microsoft            youngl@microsoft.com
Lloyd Honomichl, Novell         lloyd_honomichl@novell.com
Joan Aliprand, RLG              br.jma@rlg.stanford.edu
John Jenkins, Taligent          john_jenkins@taligent.com
Kelsey Bruso, Unisys            bruso@unirsvl.rsvl.unisys.com

Associate Members:
John Bennett, Sybase            jrb@sybase.com

Individual Members:
Tex Texin, Progress Software    texin@bedford.progress.com

Officers:
Joe Becker, Xerox               becker.osbu_north@xerox.com

Liaisons:
T.J. Kang, WG2-Korea Liaison

Unicode Office Manager:
Steve Greenfield                unicode-inc@unicode.org

Guest:
Dirk Vermeulen, CASEC
===========================================================================


Alternatives for Encoding Modern Hangul
---------------------------------------

Jenkins listed the alternative solutions to the problem of encoding all
modern hangul.  This list was compiled at Taligent, by Jenkins, Mark Davis,
and David Goldsmith.  No additional alternatives were proposed at the UTC
meeting.

The alternatives are designated "a" through "f".  Alternative "f" (current
status) has two options (designated "1" and "2").

Although this list was presented in the middle of the UTC's discussion of
the issue, it is put here at the beginning because it is a key element.

a) add the additional 4,516 hangul to the BMP

b) Move all 11,172 hangul to another plane

c) Put the additional 4,516 hangul into another plane

d) Copy all 11,172 hangul to another plane

e) Permanently shrink the user zone in the Unicode standard and put the
   additional 4,516 hangul into the former user space.

f) Do nothing (i.e., maintain current status)

   1) Use conjoining jamos

   2) Encode the additional 4,516 hangul in the user zone


Presentation by T.J. Kang on Unicode in Korea
---------------------------------------------
Kang outlined the events that have occurred since the June 1992 meeting of
ISO/JTC1/SC2/WG2 in Seoul, Korea.

Revision of KSC 5601 was completed after the 1992 WG2 meeting, and
published as KSC 5601-1992.  This adds the 11,172 "Johab" precomposed
syllables.

Scope of the principal standards that include modern hangul:

KSC 5601-1987 has 2,350 precomposed modern hangul
KSC 5657 (a standard that no one implemented) has 1,800 modern and 2,000+
ancient hangul
KSC 5601-1992 introduced a new coding scheme for all 11,172 modern hangul


KSC 5601 - 1992         11,172
ISO 10646                2,350  (from KSC 5601)
                         1,800+ (part of KSC 5657)
                         2,000+ old hangul of KSC 5657 were
                         deleted and replaced by modern
                         hangul

The hangul from KSC 5657 were selected on the basis of frequency.  The set
of 2,000+ modern hangul which replaced the old hangul is in alphabetical
order (not by frequency of occurrence).  The remaining 4,516 hangul from
KSC 5601 - 1992 take up three-quarters of the private use zone.
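
The figures above can be cross-checked with a little arithmetic.  The
following is only a sketch: the 19 x 21 x 28 breakdown of modern hangul and
the size of the private use zone (taken here as U+E000..U+F8FF, as in
ISO 10646-1:1993) are assumptions brought in for illustration and were not
stated in the notes.

```python
# Counts quoted in the notes.
TOTAL_MODERN_HANGUL = 11_172   # full set in KSC 5601-1992
ADDITIONAL_HANGUL = 4_516      # syllables not yet encoded in precomposed form

# Assumption (not from the notes): 19 initial consonants, 21 vowels, and
# 27 final consonants plus "no final" generate the full modern set.
INITIALS, VOWELS, FINALS = 19, 21, 28

# Assumption (not from the notes): private use zone is U+E000..U+F8FF.
PRIVATE_USE_SIZE = 0xF8FF - 0xE000 + 1


def already_encoded() -> int:
    """Precomposed hangul already present, by subtraction."""
    return TOTAL_MODERN_HANGUL - ADDITIONAL_HANGUL


def private_use_share() -> float:
    """Fraction of the private use zone the additional hangul would occupy."""
    return ADDITIONAL_HANGUL / PRIVATE_USE_SIZE


if __name__ == "__main__":
    print(INITIALS * VOWELS * FINALS)      # size of the generated set
    print(already_encoded())               # hangul already encoded
    print(f"{private_use_share():.1%}")    # share of the private use zone
```

Under these assumptions the subtraction gives 6,656 hangul already encoded,
which is consistent with the "2,350 + 1,800+ + 2,000+" tally above, and the
additional 4,516 come to roughly 70% of the zone, in line with the
"three-quarters" figure.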

The Ministry of Commerce and Industry set up a committee to study the future
character set needs of Korea.  The Committee recommended (in its report
published in November 1993) that Korea should go its own way, and not use
the Unicode standard or ISO 10646.  The modern part is finished; the
Committee is now studying ancient hangul.

Korea's WG2 participants were initially excluded from the Committee's
deliberations, but were included after the Committee's report was
published.  The general view was that the Korean delegates at the WG2
meeting in Korea had let the nation down by not getting the full hangul set
into ISO 10646.  There is enough flux within Korea that ISO 10646/Unicode
could be set aside and ignored.

MS Windows is a major platform for most computer users in
Korea.  Currently, it supports only the earlier version of
KSC 5601.  Some Korean companies are doing their own
modifications using the full 11,000 hangul (johab) of KSC
5601-1992.  Angin (spelling?) has all johab and some ancient
hangul; the company uses DOS and wrote its own routines for
its Windows version.  HanSoft is also doing the same thing,
but is using a slightly different coding scheme.  Standards
people in Korea are concerned about the proliferation of
coding schemes.
|Windows is growing in popularity, and people are starting to splice johab
|support onto it
|Proliferation of de facto solutions
#The dominant Korean word processing software (used by between 65% and 85% of
#the population) does its own character handling on DOS and supports all 11,172
#characters. It is now being ported to Windows, without using the Windows text
#API for text display. Another company is writing a driver supporting all 11,172
#characters, but with slightly different extensions to KSC 5601-1992 (in regard
#to old Hangul and Chinese character support).

The Korean national character code committee is wondering
about proposing the addition of a complete set of hangul to
ISO again.  The arguments for encoding hangul are:
* economy of storage (one code, rather than a number of
codes for conjoining jamos); and,
* backwards compatibility (MS is interested for this
reason).
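
The storage argument can be illustrated with a small sketch.  It relies on
the Unicode data shipped with present-day Python, whose code point
assignments postdate this meeting, so it shows the principle only and not
the 1994 encoding.  The function name is made up for this example.

```python
import unicodedata


def storage_comparison(syllable: str) -> dict:
    """Compare one hangul syllable in precomposed and conjoining-jamo form.

    Assumption: NFC/NFD normalization in today's Unicode stands in for the
    precomposed and conjoining-jamo representations discussed in the notes.
    """
    precomposed = unicodedata.normalize("NFC", syllable)
    conjoining = unicodedata.normalize("NFD", syllable)
    return {
        "precomposed_code_points": len(precomposed),
        "conjoining_code_points": len(conjoining),
        # Two bytes per 16-bit code unit; no byte order mark is written.
        "precomposed_bytes_utf16": len(precomposed.encode("utf-16-be")),
        "conjoining_bytes_utf16": len(conjoining.encode("utf-16-be")),
    }


if __name__ == "__main__":
    # U+D55C is a syllable with an initial, a vowel, and a final consonant.
    for key, value in storage_comparison("\uD55C").items():
        print(key, value)
```

A syllable with a final consonant takes one code where the conjoining form
takes three; a syllable without a final takes one code against two.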

Koreans have not seen an implementation of Unicode/ISO
10646.  MS is not sure whether it will have conjoining jamos
in NT.

The Korean national character code committee is responsible
for reviewing national standards as well as WG2
participation.  There is a difference of opinion in the
Committee on the development of national Korean character
sets versus adoption of an international standard (i.e., a
national version of ISO 10646).
#Still some disputes in the Korean
#standards bodies about support or not for 10646.

T.J. Kang requested support from the Consortium if Korea
proposes addition of hangul characters to the BMP.
|Korea's ISO group is hoping Unicode will go in with them on this issue
#Korean standards body does not
#want to propose this to WG2 without Unicode support.

Becker: How stable is the 11,000 hangul set?
A (Kang with Hong): It is not an open set.  The same hangul
repertoire also holds for North Korea.

Q. Is ancient hangul a separate issue?
A. Character coding is a passionate issue in Korea.  There
have been features in the newspapers and on TV.  Scholars
want old hangul, as well as Chinese (i.e., hanja?).  The
repertoire of 11,000 hangul meets only modern needs.  The
need for old hangul is a minority opinion on the character
set committee.  Because of the number and variety of old
hangul, conjoining jamos may be used to encode them.

Greenwood: A Korean colleague said that johab were in the
KSC 5601-1992 standard as an appendix, and were designated
as for internal use only and not for interchange.
A. This statement reflects a compromise.  Government
computers still use the earlier version of KSC 5601, and the
government wanted to enforce its use for communication.
Small-scale LAN users, on the other hand, are using the new
standard.

Q. Use of private user space?
A. An out for the Koreans.  Later, companies and Korean
delegates felt betrayed.

Is the inclusion of the missing hangul in the BMP too good
to be true?

Ksar (in his ISO capacity as Convenor of WG2) pointed out
that the Korean delegates to WG2 had not said anything about
this in over three years, and have not asked for this to be
put on the agenda of WG2.
#Mike criticized Korea for being quiet about this for the past 2 years and
#suddenly springing this on us.

Kang: Koreans feel that they have nothing to lose by asking
the UTC for support.

Presentation by S.G. Hong
-------------------------

The proposal is from Microsoft Corporation as a member
company.

Unless we include the 5,600+ characters in precomposed form,
MS will not be competitive in Korea.  MS wants to comply
with the Unicode standard, and also provide backwards
compatibility.

If a complete set of precomposed hangul are not included in
the main code space, MS would have to support the conjoining
jamos method.  This would mean that a printer driver (for
example) would have to convert from conjoining jamos to KSC
5601-1992 encoding.

Jenkins: Has to do a conversion anyhow if it's coming from
Unicode (i.e., if Unicode data is being directed to the
printer).

Hong: This may be a MS-specific problem.

Honomichl: Are there things we are doing that we are not
aware of and might cause us problems?  If so, it would be
good to know about them.

Hong: MS needs to have 1:1 mappings in driver APIs for
printers, screens, etc.  Looking for a trivial 1:1 mapping,
so can recompile to provide Unicode functionality.
#Why is using conjoining jamos a problem? To use 5601 a printer driver has to
#convert from Unicode to 5601 fonts. Conversion has to be made from either
#precomposed Unicode or conjoining. Answer - all the Microsoft API's are based on
#1-1 mapping from code page to Unicode.
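
To make the 1:1 mapping concern concrete, here is a minimal sketch of the
kind of conversion a driver would need if text arrived as conjoining jamos.
It assumes the jamo code points and the arithmetic syllable ordering of
today's Unicode; whether that ordering matches the KSC 5601-1992 code table
is not established by these notes.  The function name is hypothetical.

```python
# Assumption: code points and counts follow today's Unicode Hangul Jamo block.
L_BASE, L_COUNT = 0x1100, 19   # initial consonants U+1100..U+1112
V_BASE, V_COUNT = 0x1161, 21   # vowels U+1161..U+1175
T_BASE, T_COUNT = 0x11A7, 28   # finals U+11A8..U+11C2; slot 0 means "no final"


def syllable_index(jamos: str) -> int:
    """Map one jamo sequence (initial + vowel [+ final]) to an index 0..11171."""
    if len(jamos) not in (2, 3):
        raise ValueError("expected initial + vowel, optionally a final")
    l = ord(jamos[0]) - L_BASE
    v = ord(jamos[1]) - V_BASE
    t = ord(jamos[2]) - T_BASE if len(jamos) == 3 else 0
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT):
        raise ValueError("not a modern initial + vowel pair")
    if len(jamos) == 3 and not (1 <= t < T_COUNT):
        raise ValueError("not a modern final consonant")
    return (l * V_COUNT + v) * T_COUNT + t


if __name__ == "__main__":
    han = "\u1112\u1161\u11AB"   # three conjoining jamos forming one syllable
    print(len(han), syllable_index(han))
```

Two or three input codes collapse into a single table index, so the
conversion is many-to-one and cannot be a plain per-code lookup of the kind
Hong described.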

Jenkins: Means that this (1:1 mapping need) is true for all
MS Unicode implementations.  The problem just surfaced in
Korea.

|Microsoft's problem
|Double API structure of Windows requires Unicode to be basically the same as
|the "native" code page for any language
|(That is, people would have to actually rewrite their printer drivers and
|what not so that they no longer make casual assumptions about the structure
|of the code set)

Becker: Two different pleas: Why can't the current status
get us somewhere?
A. Does not satisfy the belief of Koreans that the whole set
should be in there.
But why isn't it (current status) OK for MS?
#Joe  - why not just put it in the private use area? Ans. - we want to comply
#with the std and promote Unicode.

Hong: MS has been promoting the Unicode standard to ISVs.
But the Product Development Group found problems: they say
cannot ship product.  There is debate within MS whether to
use Unicode or to use DBCS for Korean.

Jenkins: Because MS is using private user space for Shift-
JIS, there is a conflict with Korean.  There is insufficient
room in the user zone for both Shift-JIS and the 4,516
hangul.  (Option f2 conflicts with MS implementation for
Shift-JIS.)

Jenkins presented the alternatives (see above), and said
that Taligent does not consider alternatives b, c or d to be
viable options.

Kang: Alternative d corresponds to a recommendation in the
study issued by the Korean character set committee.

Hong: MS would prefer Alternative e (because of conflict
between Shift-JIS and hangul).  This may be a viable option
in MS view.

Greenwood: Alternative e means a break with ISO 10646.  This
point was discussed.  The private use zone is not part of
the conformance clause of ISO 10646, so such a change would
not cause the Unicode standard to be non-conformant (at
least, legalistically).  However, Alternative e would mean a
non-compatible upgrade of the Unicode standard.

Hong: MS is looking for consensus on how to use user zone
for hangul.  Cannot agree on Shift-JIS use.

The UTC saw several problems with Alternative e:
* The Unicode user zone is made permanently smaller;
* Breaks with ISO 10646;
* Breaks with existing implementations.

Problem of gaiji characters in Japanese applications.

Ksar: Where do UTF-16 and UCS-4 fit in this?  Has Korea
considered a Korean national variant of ISO 10646?

Hong asked for advice on strategies for the implementation
of hangul.  Initial opinion of the UTC was that the best
long-term strategies are Alternative f1 (conjoining jamos)
or Alternative b (move all 11,000 combined hangul to another
plane), but Ksar pointed out that Alternative b would mean
altering code point assignments of ISO 10646.

When the eventual need to convert data from Alternative f2
(some hangul in the user zone) is taken into account, UTC
opinion was that Alternatives c (4,516 combined hangul in
another plane) or d (all 11,000 hangul in another plane)
would be better.  These alternatives parallel current
combined single byte/double byte systems, with which Korea
has considerable experience.  The alternatives are also
consistent with UTF-16, which has been approved by the UTC.
Alternative c is preferable to Alternative d, as it avoids
duplicate encoding of the same hangul.
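
The link between "another plane" and UTF-16 can be sketched as follows.  A
code point outside the BMP is carried as two 16-bit code units, much as a
double-byte character sits alongside single-byte ones.  The code points in
the example are arbitrary: no plane or position for hangul was proposed at
the meeting, and the function name is made up for this illustration.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units (one or two) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        return [code_point]                 # BMP: a single code unit
    offset = code_point - 0x10000           # 20 bits, split 10 + 10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


if __name__ == "__main__":
    # One BMP code point and one arbitrary code point in another plane.
    for cp in (0x4E00, 0x20000):
        print(hex(cp), [hex(unit) for unit in utf16_code_units(cp)])
```

Text that stays within the BMP is unaffected; only characters placed in
another plane pay the cost of the second code unit.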

McConnell said he is against putting hangul in extra planes,
because they are encoding variants, not presentation
variants.  Does not like the prospect of multiple encodings
(i.e., conjoining jamos and precomposed hangul).

Jenkins: Prefers putting them on another plane to putting
them in the BMP.  The problem with Alternative f2
(additional hangul in user zone) is that users are not
*required* to use particular encoding values.  Because this
is user space, companies could declare their own arrangement
for encoding these characters.

Greenwood argued against changing the principles of the
Unicode standard just because of simplistic implementation
choices.

Bennett asked Hong: What happens when you receive data
encoded with conjoining jamos?

Hong: Are member companies willing to pursue Alternative a
(additional hangul in the BMP proper)?

The consensus of the UTC was that MS needs to present a
detailed cost/benefit analysis of all the
alternatives, to explain why MS prefers Alternative a.

Jenkins pointed out that, in the short term, MS can do
Alternative f2, and would be compliant with the Unicode
standard.

The UTC needs solid reasons explaining why Alternatives c, d
and f are not going to happen in Korea.

Jenkins said that supporting Alternative a is telling us
"you have to choose between Korea and Taiwan" (which has
proposed a collection of additional ideographs).
Alternative a would also mean rejection of the German
proposal for additional characters.

Gioia: This brings in an additional factor: What is
happening at the ISO level?  The Koreans will have to
compete with everyone else for the unassigned code points of
the BMP.

Ksar: Koreans have not even made a proposal to ISO for
addition of the 4,000+ hangul to the BMP.

Greenwood: Korean national standards body should propose the
addition of the hangul to WG2.  It would be improper for the
Unicode Consortium to make such a proposal.  [No UTC member
disagreed with this statement.]
#For WG2 this should be proposed by the Korean national body.

Hong: Will bring alternatives to Product Group, ask them to
evaluate each, and say how hard it would be to do each.

Ksar: It is important for MS to continue to participate in
the Unicode Technical Committee and WG2.  Unicode, Inc. is a
liaison to WG2, and so any resolutions of the UTC have to be
presented to WG2 for a vote.  There are real advantages for
a company to be part of this process.

Korea is not the only country that wants to add characters
to the BMP.  The issues are described in ISO document N884
(copies were distributed at the UTC meeting).

MS needs to provide more information to convince the UTC
that one alternative is better than the others, and that
this alternative is better for other companies as well as
for MS.  But then you need to convince ISO.
#UTC needs much more information from Microsoft on why c, d and e are bad.
#To convince UTC to support some proposal we need solid figures why it is not
#just good for Microsoft to add these characters, but it is good for all the
#member companies.

There is no question that the UTC is firmly committed to
keeping ISO 10646 and the Unicode standard in synch.  Those
present at the meeting expressed vehement support for this.

Action item (Greenwood, Aliprand, Greenfield, others?):
Send copy of personal notes (draft Minutes in the case of
Greenfield) to S.G. Hong and T.J. Kang.
[DONE per this composite]

Action item (S.G. Hong):
Prepare a detailed cost/benefit analysis of all
the alternatives.  State which alternative is preferred by
MS (and why).  Give reasons why this alternative may be
better for other companies too.  This analysis to be
submitted to the UTC.
|The response
|We need from Microsoft
|More information to convince UTC that (a) is a better option than (d) or (f)
|Arguments to take to the rest of the world to convince the rest of the world


=END=