Document: L2/01-310

Title: Khmer Issues on the Horizon

Authors: Rick McGowan and Ken Whistler

Some factions within Cambodia appear to perceive difficulties with the

Khmer encoding. We have had

reports to this effect. The trouble appears to be twofold:

(1) looming political trouble due to a bad feeling among some that

Cambodian interests were neglected or ignored during the development

of the Khmer encoding in 10646/Unicode.

(2) misunderstanding of and/or disagreement with the current model for

encoding Khmer.

Below are some facts and rumors, as we understand them.

* RUMOR: The government of Cambodia has apparently contacted ISO JTC1

directly with complaints regarding the encoding. They may call for

rescinding the 10646 encoding at that level.

* FACT: During the initial development of Khmer encoding, Glenn Adams

cautioned careful procedure, and at one time planned a trip to Cambodia,

which never materialized.

* FACT: Norbert Klein (who is _IN_ Cambodia) claims that in early contact,

circa 1997, he offered to put Unicode people into contact with

government officials, and now reports that nobody from the Unicode side

followed through. For our part, we wonder why, if he was on the ground there,

he did not simply take action to involve more people. He was on the Khmer

mail list at Unicode, and was involved in all the discussions.

* FACT: There is a Cambodian government project underway to define a

national standard character encoding. RUMOR: We have heard that this committee

desires a "one codepoint, one character" approach, and it seems possible that

they do not understand the current model, or understand it but disagree

with it sufficiently to continue with standardizing a different approach.

* RUMOR: Microsoft apparently has a working model of the current encoding

in-house.

* RUMOR: There apparently exists a Japanese-funded philological project

which in one report is urging a different sort of encoding; and in another

report is awaiting a national standard to be handed down. In neither case is

Unicode being considered, apparently. (And it is also apparent that they may

not understand the current model.)

* FACT: Rick twice sent e-mail to Sorasak Pan urging contact between his

committee and Unicode, and has received no response to date. Address:

Sorasak Pan

Under Secretary of State

Royal Government of Cambodia

Russian Federation Blvd.

Tel: (855 23) 426 054

Fax: (855 23) 218 673

e-mail: [email protected]

* FACT: Relevant experts outside of Cambodia who were involved

in the encoding are: Maurice Bauhahn and Paul Nelson. Inside Cambodia

is Norbert Klein.

* ANALYSIS (Ken):

The basic technical issue, as best I understand it,

boils down to the virama model versus the encoding of subscript consonants

(and vowels). The current Unicode model for the Khmer script assumes the

virama model, as for many other Brahmi-derived scripts, including

Myanmar. However, as was apparent in the Japanese NB comments on Amendment

25, there were experts at the time who disagreed with that approach and

favored an explicit subscript encoding for Khmer. While the virama model

was discussed in Cambodia and apparently had some support from some

technologists there, there appear to have been significant political

shifts, resulting now in significant opposition to that approach, apparently

at a ministerial level in the government. I expect that the basic nature

of any new proposals that emerge from Cambodia and/or Japan will be to

encode explicit subscripts for Khmer. And it is quite likely that any

such proposal will once again take the form of

"remove the current encoding and replace it with the xxx national standard

for yyy," rather than an attempt to make delta additions to the current

encoding.
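
To make the contrast between the two models concrete, here is a minimal sketch in Python. It assumes the Unicode character database bundled with Python, in which the virama-like Khmer sign is U+17D2 KHMER SIGN COENG. The "explicit subscript" code point is purely hypothetical: U+E000 is a Private Use Area placeholder chosen for illustration and does not correspond to any real or proposed assignment. The helper name describe is likewise our own.

```python
import unicodedata

KHA = "\u1781"    # KHMER LETTER KHA
COENG = "\u17D2"  # KHMER SIGN COENG (the virama-like sign in the current model)
MO = "\u1798"     # KHMER LETTER MO

# Current model: the subscript form of MO is written as COENG followed by
# the ordinary consonant letter, so the cluster takes three code points.
virama_model = KHA + COENG + MO

# Hypothetical explicit-subscript model: one dedicated code point for
# "subscript MO". U+E000 is a Private Use placeholder, NOT a real assignment.
HYPOTHETICAL_SUBSCRIPT_MO = "\uE000"
explicit_model = KHA + HYPOTHETICAL_SUBSCRIPT_MO


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    for label, text in (("virama model", virama_model),
                        ("explicit subscripts (hypothetical)", explicit_model)):
        print(label)
        for code, name in describe(text):
            print("   ", code, name)
```

Under the current model, the subscript consonant reuses the existing letter and needs no code point of its own; an explicit-subscript encoding would instead require a separate code point for each subscript form, which is why such a proposal tends to be framed as a replacement rather than a delta addition.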

Note that the Khmer script is basically used only in Cambodia, so

that there is a prima facie case for the government or relevant

ministry to be the compelling stakeholder in this case. This is

a much easier case to make for a local script like this than for

a multinational script like Latin, Cyrillic, or Han. It will almost

certainly be argued this way at the JTC1 level, if it comes to that.

I also suspect that all parties here have some well-intentioned,

strong arguments about productivity in information technology in

mind -- particularly for keyboard entry. Different opinions about

what is "right" and more efficient for shifting people over from

existing typewriter keyboarding practice to computerized text entry

may be part of what is driving people to different positions regarding

what is the best encoding for Khmer.

We are bringing these issues to the attention of the UTC, since we

suspect that the Khmer encoding will be raised at the upcoming

Singapore meeting of WG2, and that in that context a very forceful

case will be made to change the Khmer encoding in 10646 (with obvious

implications for the Unicode Standard).