Document: L2/01-310

Title: Khmer Issues on the Horizon

Authors: Rick McGowan and Ken Whistler



There appears to be some perceived difficulty with the Khmer encoding, from 

the point of view of some factions within Cambodia.  We have had 

reports to this effect.  The trouble appears to be twofold:


  (1) looming political trouble due to a bad feeling among some that

      Cambodian interests were neglected or ignored during the development

      of the Khmer encoding in 10646/Unicode.

  (2) misunderstanding of and/or disagreement with the current model for

      encoding Khmer. 


Below are some facts and rumors, as we understand them.


* RUMOR: The government of Cambodia has apparently contacted ISO JTC1 

directly with complaints regarding the encoding.  They may call for 

rescinding the 10646 encoding at that level.


* FACT: During the initial development of Khmer encoding, Glenn Adams 

cautioned careful procedure, and at one time planned a trip to Cambodia, 

which never materialized.


* FACT: Norbert Klein (who is _IN_ Cambodia) claims that in early contact, 

circa 1997, he offered to put Unicode people into contact with 

government officials, and he now reports that nobody from the Unicode side 

followed through. For our part, we wonder why, if he was on the ground there, 

he did not simply take action to involve more people. He was on the Khmer 

mail list at Unicode, and was involved in all the discussions.


* FACT: There is a Cambodian government project underway to define a 

national standard character encoding. RUMOR: We have heard that this committee 

desires a "one codepoint, one character" approach, and it seems possible that 

they do not understand the current model, or understand it but disagree

with it sufficiently to continue with standardizing a different approach.


* RUMOR: Microsoft apparently has a working model of the current encoding.



* RUMOR: There apparently exists a Japanese funded philological project 

which in one report is urging a different sort of encoding; and in another 

report is awaiting a national standard to be handed down. In neither case is 

Unicode being considered, apparently.  (And it is also apparent that they may 

not understand the current model.)


* FACT: Rick twice sent e-mail to Sorasak Pan urging contact between his 

committee and Unicode, and has received no response to date.  Address:


      Sorasak Pan

      Under Secretary of State

      Royal Government of Cambodia

      Russian Federation Blvd.

      Tel: (855 23) 426 054

      Fax: (855 23) 218 673



* FACT: Relevant experts outside of Cambodia who were involved 

in the encoding are: Maurice Bauhahn and Paul Nelson. Inside Cambodia

is Norbert Klein.




The basic technical issue, as best I understand it,

boils down to the virama model versus the encoding of subscript consonants

(and vowels). The current Unicode model for the Khmer script assumes the

virama model, as for many other Brahmi-derived scripts, including

Myanmar. However, as was apparent in the Japanese NB comments on Amendment

25, there were experts at the time who disagreed with that approach and

favored an explicit subscript encoding for Khmer. While the virama model

was discussed in Cambodia and apparently had some support from some

technologists there, there appear to have been significant political

shifts, resulting now in significant opposition to that approach, apparently

at a ministerial level in the government. I expect that the basic nature

of any new proposals that emerge from Cambodia and/or Japan will be to

encode explicit subscripts for Khmer. And it is quite likely that any

such proposal will once again take the form of "remove the current

encoding and replace it with the xxx national standard for yyy,"

rather than an attempt to make delta additions to the current encoding.
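
As a rough illustration of the difference between the two models, the
sketch below takes the word "Khmer" as spelled in the current encoding,
where a subscript consonant is written as U+17D2 KHMER SIGN COENG followed
by the consonant, and rewrites it with one code point per subscript. The
subscript code points used here are hypothetical: they are taken from the
private-use area purely for demonstration and do not correspond to any
actual or proposed assignment. The function and constant names are
likewise our own.

```python
import unicodedata

COENG = "\u17D2"          # KHMER SIGN COENG, the virama-like sign in the current model
FIRST_CONSONANT = 0x1780  # KHMER LETTER KA
LAST_CONSONANT = 0x17A2   # KHMER LETTER QA

# HYPOTHETICAL: private-use code points standing in for explicitly encoded
# subscripts. This is an assumption for illustration, not a real proposal.
HYPOTHETICAL_SUBSCRIPT_BASE = 0xE000


def is_consonant(ch):
    """True if ch is one of the Khmer consonant letters KA..QA."""
    return FIRST_CONSONANT <= ord(ch) <= LAST_CONSONANT


def to_explicit_subscripts(text):
    """Replace each COENG + consonant pair with one hypothetical subscript code point."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == COENG and i + 1 < len(text) and is_consonant(text[i + 1]):
            offset = ord(text[i + 1]) - FIRST_CONSONANT
            out.append(chr(HYPOTHETICAL_SUBSCRIPT_BASE + offset))
            i += 2  # consume both the COENG and the consonant
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# "Khmer" in the current encoding: KHA, COENG, MO, vowel sign AE, RO
khmer = "\u1781\u17D2\u1798\u17C2\u179A"
explicit = to_explicit_subscripts(khmer)

for ch in khmer:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(len(khmer), "code points in the current model;",
      len(explicit), "with hypothetical explicit subscripts")
```

The point of the sketch is only that the two approaches store the same
text differently: the current model spends two code points on each
subscript, while an explicit subscript encoding would spend one but would
need a separate coded character for every subscript form.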



Note that the Khmer script is basically used only in Cambodia, so

that there is a prima facie case for the government or relevant

ministry to be the compelling stakeholder in this case. This is

a much easier case to make for a local script like this than for

a multinational script like Latin, Cyrillic, or Han. It will almost

certainly be argued this way at the JTC1 level, if it comes to that.


I also suspect that all parties here have some well-intentioned,

strong arguments about productivity in information technology in

mind -- particularly for keyboard entry. Different opinions about

what is "right" and more efficient for shifting people over from

existing typewriter keyboarding practice to computerized text entry

may be part of what is driving people to different positions regarding

what is the best encoding for Khmer.


We are bringing these issues to the attention of the UTC, since we

suspect that the Khmer encoding will be raised at the upcoming

Singapore meeting of WG2, and that in that context a very forceful

case will be made to change the Khmer encoding in 10646 (with obvious

implications for the Unicode Standard).