Re: Proposed Update of UTS #10: Unicode Collation Algorithm

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri May 16 2003 - 22:36:44 EDT


    > To take the same example as I took in my previous email, I don't see
    > how S1, S2 and S3 could be sorted S1 < S2 < S3 (instead of S1 < S3 < S2)
    > without contracting the sequence of 'U+1169 (ㅗ:HANGUL JUNGSEONG O)
    > U+1163 (ㅑ:HANGUL JUNGSEONG YA)'?
    >
    > S1: U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) U+1169 (ㅗ:HANGUL JUNGSEONG O)
    >     U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK)
    > S2: U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) U+116A (ㅘ:HANGUL JUNGSEONG WA)
    >     U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK)
    > S3: U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) U+1169 (ㅗ:HANGUL JUNGSEONG O)
    >     U+1163 (ㅑ:HANGUL JUNGSEONG YA) U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK)
    >
    > where the primary weights of each Jamo are given as follows,
    >
    > U+1100 (ᄀ:HANGUL CHOSEONG KIYEOK) : 301
    > U+1161 (ㅏ:HANGUL JUNGSEONG A) : 201
    > U+1163 (ㅑ:HANGUL JUNGSEONG YA) : 231
    > U+1169 (ㅗ:HANGUL JUNGSEONG O) : 251
    > U+116A (ㅘ:HANGUL JUNGSEONG WA) : 255
    > U+11A8 (ㄱ:HANGUL JONGSEONG KIYEOK) : 101

    Remember, the weights have to be changed so that T < V < L, so I'll
    add 3000 to the Ls, 2000 to the Vs, and 1000 to the Ts:

    S1 => 3301; 2251; 1101; TERM
    S2 => 3301; 2255; 1101; TERM
    S3 => 3301; 2251; 2231; 1101; TERM
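
The shifted weights above can be reproduced mechanically. The following is a minimal sketch, not anything from UTS #10 itself: the names `BASE`, `class_offset`, and `sort_key` are made up for illustration, the numeric value of TERM is an assumption (the message does not give one), and the L/V/T ranges are taken from the Hangul Jamo block layout. It applies the offsets described above (3000 for L, 2000 for V, 1000 for T) and compares the resulting primary keys element by element, with no contraction and no expansion.

```python
# Primary weights quoted in the thread, keyed by code point.
BASE = {
    0x1100: 301,  # HANGUL CHOSEONG KIYEOK (L)
    0x1161: 201,  # HANGUL JUNGSEONG A (V)
    0x1163: 231,  # HANGUL JUNGSEONG YA (V)
    0x1169: 251,  # HANGUL JUNGSEONG O (V)
    0x116A: 255,  # HANGUL JUNGSEONG WA (V)
    0x11A8: 101,  # HANGUL JONGSEONG KIYEOK (T)
}

# Assumption: the terminator weight is lower than every jamo weight.
TERM = 0


def class_offset(cp: int) -> int:
    """Return the offset that enforces T < V < L for a conjoining jamo."""
    if 0x1100 <= cp <= 0x115F:  # leading consonants (L)
        return 3000
    if 0x1160 <= cp <= 0x11A7:  # vowels (V)
        return 2000
    if 0x11A8 <= cp <= 0x11FF:  # trailing consonants (T)
        return 1000
    raise ValueError(f"not a conjoining jamo: U+{cp:04X}")


def sort_key(s: str) -> tuple:
    """One shifted primary weight per jamo, followed by the terminator."""
    return tuple(BASE[ord(ch)] + class_offset(ord(ch)) for ch in s) + (TERM,)


S1 = "\u1100\u1169\u11A8"
S2 = "\u1100\u116A\u11A8"
S3 = "\u1100\u1169\u1163\u11A8"

strings = {"S1": S1, "S2": S2, "S3": S3}
order = sorted(strings, key=lambda name: sort_key(strings[name]))
print(order)  # ['S1', 'S3', 'S2']
```

With one weight per jamo, S3 sorts before S2 because O (2251) is compared against WA (2255) at the second position, which is the ordering the quoted question is concerned about.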

    >
    >
    > > > enumerating all equivalent sequences but just giving primary weights
    > > > to only 'basic' Jamos and requiring a preprocessing in which cluster
    > > > jamos are decomposed into sequences of basic Jamos.
    > >
    > > Preprocessing (on a string basis) is *deadly* for performance. It is
    > > also not necessary. The weight tables already allow characters to
    > > expand, that is what would be done in this case: it is just 1a above.
    >
    > I see your point. I didn't pay attention to expansion.
    >
    > Jungshik
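
The expansion mentioned in the quoted exchange can be sketched as follows. This is an illustration under stated assumptions, not the actual UCA table: `WEIGHTS` and `sort_key` are hypothetical names, the TERM value is assumed to be lower than any jamo weight, and the entry for U+116A (WA) is assumed to expand to the weights of O followed by A. The weights use the 3000/2000/1000 offsets for L, V, and T described in the message. The point is that the expansion lives in the weight table, so the input string is never rewritten.

```python
# Hypothetical weight table: each character maps to one or more primary weights.
WEIGHTS = {
    "\u1100": (3301,),       # HANGUL CHOSEONG KIYEOK
    "\u1161": (2201,),       # HANGUL JUNGSEONG A
    "\u1163": (2231,),       # HANGUL JUNGSEONG YA
    "\u1169": (2251,),       # HANGUL JUNGSEONG O
    "\u116A": (2251, 2201),  # HANGUL JUNGSEONG WA, assumed to expand to O + A
    "\u11A8": (1101,),       # HANGUL JONGSEONG KIYEOK
}

# Assumption: the terminator weight is lower than every jamo weight.
TERM = 0


def sort_key(s: str) -> tuple:
    """Concatenate the table entries per character; no string preprocessing."""
    key = []
    for ch in s:
        key.extend(WEIGHTS[ch])
    key.append(TERM)
    return tuple(key)


S1 = "\u1100\u1169\u11A8"
S2 = "\u1100\u116A\u11A8"
S3 = "\u1100\u1169\u1163\u11A8"

strings = {"S1": S1, "S2": S2, "S3": S3}
order = sorted(strings, key=lambda name: sort_key(strings[name]))
print(order)  # ['S1', 'S2', 'S3']
```

Under these assumed table entries, S2 and S3 share the weight for O and are then separated by A (2201) versus YA (2231), which yields S1 < S2 < S3 with a single table lookup per character.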


