UAX 15 hangul composition

From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Tue Aug 03 2004 - 06:47:36 CDT

Next message: Mustafa Jabbar: "Re: International CALIBER-2005: Call for Papers"

Previous message: Peter Kirk: "Re: Holam (was Errors in TUS Figure 15.2?)"
Next in thread: Marcin 'Qrczak' Kowalczyk: "Re: UAX 15 hangul composition"
Reply: Marcin 'Qrczak' Kowalczyk: "Re: UAX 15 hangul composition"
Reply: Doug Ewell: "Re: UAX 15 hangul composition"

Don't know if this has been asked/reported before, but is the example code
for hangul composition in UAX 15 correct?

The code is:
     public static String composeHangul(String source) {
         int len = source.length();
         if (len == 0) return "";
         StringBuffer result = new StringBuffer();
         char last = source.charAt(0); // copy first char
         result.append(last);

         for (int i = 1; i < len; ++i) {
             char ch = source.charAt(i);

             // 1. check to see if two current characters are L and V
             int LIndex = last - LBase;
             if (0 <= LIndex && LIndex < LCount) {
                 int VIndex = ch - VBase;
                 if (0 <= VIndex && VIndex < VCount) {

                     // make syllable of form LV
                     last = (char)(SBase + (LIndex * VCount + VIndex) * TCount);
                     result.setCharAt(result.length()-1, last); // reset last
                     continue; // discard ch
                 }
             }

             // 2. check to see if two current characters are LV and T
             int SIndex = last - SBase;
             if (0 <= SIndex && SIndex < SCount && (SIndex % TCount) == 0) {
                 int TIndex = ch - TBase;
                 if (0 <= TIndex && TIndex <= TCount) {

                     // make syllable of form LVT
                     last += TIndex;
                     result.setCharAt(result.length()-1, last); // reset last
                     continue; // discard ch
                 }
             }

             // if neither case was true, just add the character
             last = ch;
             result.append(ch);
         }
         return result.toString();
     }

Suppose I feed it 0xAC00 0x11C3. 0xAC00 is an LV syllable, so the
pair goes through step 2:

SIndex = 0xAC00 - 0xAC00 = 0
TIndex = 0x11C3 - 0x11A7 = 28

Since TCount is 28, the test "(0 <= TIndex && TIndex <= TCount)" is
true, and the output is 0xAC00 + 28 = 0xAC1C, which is not an LVT
but the next LV syllable!
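
The arithmetic above can be checked in isolation. The following is only
a small sketch: the class name HangulBoundsDemo and the helpers
quotedTest and isLV are made up for illustration, and the constants
SBase = 0xAC00, TBase = 0x11A7 and TCount = 28 are assumed to match the
ones used by the UAX 15 sample.

```java
// Hypothetical demo class; not part of UAX 15.
final class HangulBoundsDemo {
    // Assumed to match the constants of the UAX 15 sample code.
    static final int SBase = 0xAC00, TBase = 0x11A7, TCount = 28;

    // The range test exactly as quoted, with the inclusive upper bound.
    static boolean quotedTest(int tIndex) {
        return 0 <= tIndex && tIndex <= TCount;
    }

    // For a code point inside the syllable block: true if it has no
    // trailing consonant, i.e. it is an LV syllable.
    static boolean isLV(int syllable) {
        return (syllable - SBase) % TCount == 0;
    }

    public static void main(String[] args) {
        int last = 0xAC00, ch = 0x11C3;
        int tIndex = ch - TBase;
        System.out.printf("TIndex = %d, accepted = %b%n",
                          tIndex, quotedTest(tIndex));
        int composed = last + tIndex;
        System.out.printf("result = U+%04X, isLV = %b%n",
                          composed, isLV(composed));
    }
}
```

Run as is, this should report TIndex = 28 as accepted and the result
U+AC1C as an LV syllable, which is the behaviour described above.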

I think the "TIndex <= TCount" should be "TIndex < TCount". IMO the
example would also be clearer if it used the Hangul_Syllable_Type
property.
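
A sketch of the same loop with the stricter bound is given here, not as
the official fix but as one way to try the change. Two things in it are
assumptions beyond what is proposed in this message: the lower bound is
also made exclusive (TIndex = 0 means ch is TBase itself, U+11A7, which
is not a trailing consonant and would otherwise be silently dropped),
and the constants are taken to be those of the UAX 15 sample. The class
name HangulCompose is made up.

```java
// Hypothetical class wrapping a variant of the quoted composeHangul.
final class HangulCompose {
    // Assumed to match the constants of the UAX 15 sample code.
    static final int SBase = 0xAC00, LBase = 0x1100,
                     VBase = 0x1161, TBase = 0x11A7;
    static final int LCount = 19, VCount = 21, TCount = 28;
    static final int SCount = LCount * VCount * TCount; // 11172

    static String composeHangul(String source) {
        if (source.isEmpty()) return "";
        StringBuilder result = new StringBuilder();
        char last = source.charAt(0); // copy first char
        result.append(last);

        for (int i = 1; i < source.length(); ++i) {
            char ch = source.charAt(i);

            // 1. L followed by V: make syllable of form LV
            int lIndex = last - LBase;
            if (0 <= lIndex && lIndex < LCount) {
                int vIndex = ch - VBase;
                if (0 <= vIndex && vIndex < VCount) {
                    last = (char) (SBase + (lIndex * VCount + vIndex) * TCount);
                    result.setCharAt(result.length() - 1, last);
                    continue; // discard ch
                }
            }

            // 2. LV followed by T: make syllable of form LVT
            int sIndex = last - SBase;
            if (0 <= sIndex && sIndex < SCount && sIndex % TCount == 0) {
                int tIndex = ch - TBase;
                // Changed test: upper bound exclusive as proposed above;
                // the exclusive lower bound is an extra assumption.
                if (0 < tIndex && tIndex < TCount) {
                    last += tIndex;
                    result.setCharAt(result.length() - 1, last);
                    continue; // discard ch
                }
            }

            // Neither case applied: just add the character
            last = ch;
            result.append(ch);
        }
        return result.toString();
    }
}
```

With this variant, composeHangul("\uAC00\u11C3") leaves both characters
untouched, while composeHangul("\uAC00\u11C2") still yields the single
LVT syllable U+AC1B.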

A somewhat related question. I know next to nothing about Hangul
[de]composition, so forgive me for asking silly questions. In the
UnicodeData.txt file there are many more jamos than the 19 L, 21 V,
and 28 T ones. Are the other jamos not used to compose syllables, or
does the syllable block represent an incomplete set of compatibility
characters? Which is it?
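
To make the question concrete: as far as the arithmetic in the sample
goes, only jamos inside the ranges fixed by the base and count
constants can ever match. The sketch below just computes those ranges.
The class name JamoRanges and its helper methods are made up, the
constants are assumed to be those of the UAX 15 sample, and the T range
is taken to start one past TBase because index 0 stands for "no
trailing consonant".

```java
// Hypothetical helper; only restates what the constants imply.
final class JamoRanges {
    // Assumed to match the constants of the UAX 15 sample code.
    static final int LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7;
    static final int LCount = 19, VCount = 21, TCount = 28;

    static boolean isComposingL(int cp) {
        return LBase <= cp && cp < LBase + LCount;
    }

    static boolean isComposingV(int cp) {
        return VBase <= cp && cp < VBase + VCount;
    }

    // TBase itself is excluded: T index 0 means "no trailing consonant".
    static boolean isComposingT(int cp) {
        return TBase < cp && cp < TBase + TCount;
    }

    public static void main(String[] args) {
        System.out.printf("L: U+%04X..U+%04X%n", LBase, LBase + LCount - 1);
        System.out.printf("V: U+%04X..U+%04X%n", VBase, VBase + VCount - 1);
        System.out.printf("T: U+%04X..U+%04X%n", TBase + 1, TBase + TCount - 1);
    }
}
```

Under these assumptions the ranges come out as U+1100..U+1112 for L,
U+1161..U+1175 for V and U+11A8..U+11C2 for T, so a jamo such as
U+11C3 falls outside all three.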

Theo


This archive was generated by hypermail 2.1.5 : Tue Aug 03 2004 - 06:49:11 CDT