From: Jungshik Shin (email@example.com)
Date: Sat May 17 2003 - 21:48:51 EDT
On Sun, 18 May 2003, Philippe Verdy wrote:
Before going further, I want to make it clear that the following
discussion is only relevant to the scheme 1b in Mark's message sent
2003-05-16 16:20:55 UTC -0700. With the scheme 1a, all these complexities
and inflexibilities can be avoided, which is why I prefer 1a to 1b.
> > From: "Jungshik Shin" <firstname.lastname@example.org>
> > > > > S1 => 3301; 2251; 1101; TERM
> > > > > S2 => 3301; 2255; 1101; TERM
> > > > > S3 => 3301; 2251; 2231; 1101; TERM
> > >
> > > > So we already have S1 < S2 < S3 appropriately.
> > >
> > > I must be missing something very fundamental and obvious. Could you
> > > kindly explain to me why the above yields S1 < S2 < S3 instead of S1 <
> > > S3 < S2? 3301, 2251, 2255, 1101, 2231 are all primary weights. The first
> > > character has the same primary weight in S1, S2 and S3. However, in the
> > > second character, S2 has a larger primary weight (2255) than S3 (2251)
> > > so that S3 < S2 instead of S2 < S3. Could you tell me where I went wrong
> > > in this reasoning?
> Oh sorry, you're right this time (but it's true that there was an error
> for the third primary weight of S3, which was 1231 instead of 2231 in
> your initial message, probably just a typo).
Yes, it's just a typo, though it was Mark's, not mine. Those weights were
given by him, not by me. I already corrected that typo in my reply to
his message yesterday.
> So you're right here, to have S1<S3<S2 then the following weights are enough:
> > S1 => 3301; 2251; 1101; TERM
> > S2 => 3301; 2255; 1101; TERM
> > S3 => 3301; 2251; 2231; 1101; TERM
> To have S1<S2<S3, you need to tailor the group of medial V+ letters so
> that they collate as a single unit with a custom weight for the composite
> sequence (for example the group 2251;2231 in S3 should be collated as
> a single 2299 weight):
That's what I've been saying from the very beginning of this
thread, but somehow I've been unsuccessful in getting it through to Mark.
Not only L+ but also V+ and T+ have to be contracted and assigned
interleaving weights. So, I began to wonder if I had missed something
very obvious about the collation.
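
To make the two orderings concrete, here is a minimal sketch in Python
that compares the primary keys quoted above, first as given and then with
the V+ pair 2251;2231 contracted to the single weight 2299 that Philippe
suggested. The value of TERM, the CONTRACTIONS table and the helper names
are assumptions made for illustration; the thread does not specify them,
and TERM does not influence either comparison here.

```python
TERM = 0  # placeholder value (assumption); not given in this thread

# Primary weights exactly as quoted in the thread.
KEYS = {
    "S1": (3301, 2251, 1101, TERM),
    "S2": (3301, 2255, 1101, TERM),
    "S3": (3301, 2251, 2231, 1101, TERM),
}

# Hypothetical tailoring: the V+ sequence 2251;2231 collates as one unit.
CONTRACTIONS = {(2251, 2231): 2299}


def contract(key):
    """Replace every contracted pair of weights with its single weight."""
    out = []
    i = 0
    while i < len(key):
        pair = tuple(key[i:i + 2])
        if pair in CONTRACTIONS:
            out.append(CONTRACTIONS[pair])
            i += 2
        else:
            out.append(key[i])
            i += 1
    return tuple(out)


def order(keys):
    """Sort the string names by plain left-to-right weight comparison."""
    return sorted(keys, key=keys.get)


plain = order(KEYS)
tailored = order({name: contract(key) for name, key in KEYS.items()})

print(plain)     # ['S1', 'S3', 'S2']
print(tailored)  # ['S1', 'S2', 'S3']
```

Without the contraction, S3 sorts before S2 because 2251 < 2255 at the
second weight. With it, S3 carries 2299 at that position and sorts last.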
> But is there a way to compute a weight for all L+ combinations in
> that case, or does it apply only to a known repertoire of L+ sequences
> that need a tailoring rule for special ordering ?
> Or is what you really
> need based on the length of L+ sequences, instead of just the individual
> weights of medial L vowels, so that longer L+ sequences are always sorted
> after shorter ones ?
This inflexibility (of having to know the repertoire a priori and to
assign them weights in such a way that they can collate as desired) is
exactly why I don't like the scheme 1b in Mark's message. With the scheme
1a in Mark's message (which I also suggested in my messages from the
beginning and which Mark also prefers), we don't have to worry about this
problem at all. All the assigned precomposed/cluster L's, V's and T's
will be expanded to sequences of collation elements. For instance,
U+111A (Hangul Choseong Rieul-Hieuh) will be expanded to a sequence
of collation elements made of the collation element for U+1105 (Hangul
Choseong Rieul) and that for U+1112 (Hangul Choseong Hieuh). This way,
cluster Jamos, whether they're assigned codepoints or not, will be
treated exactly the same way.
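
A minimal sketch of that expansion, in Python. The BASIC_WEIGHT values
are made up for illustration and are not taken from any real collation
table. The EXPANSIONS entries follow the Unicode character names
(U+1101 is Ssangkiyeok, U+111A is Rieul-Hieuh), and the function name is
hypothetical.

```python
# Hypothetical primary weights for three basic choseong (assumption).
BASIC_WEIGHT = {
    "\u1100": 3001,  # HANGUL CHOSEONG KIYEOK
    "\u1105": 3005,  # HANGUL CHOSEONG RIEUL
    "\u1112": 3012,  # HANGUL CHOSEONG HIEUH
}

# Illustrative expansions of assigned cluster jamos into basic jamos.
EXPANSIONS = {
    "\u1101": "\u1100\u1100",  # SSANGKIYEOK -> KIYEOK, KIYEOK
    "\u111A": "\u1105\u1112",  # RIEUL-HIEUH -> RIEUL, HIEUH
}


def collation_elements(text):
    """Map each jamo to primary weights, expanding cluster jamos on lookup."""
    elements = []
    for ch in text:
        for basic in EXPANSIONS.get(ch, ch):
            elements.append(BASIC_WEIGHT[basic])
    return elements


# The assigned cluster and the equivalent sequence of basic jamos
# produce the same collation elements.
assert collation_elements("\u111A") == collation_elements("\u1105\u1112")
```

Because the expansion happens per character during lookup, a cluster that
has no codepoint of its own needs no table entry at all: it is already a
sequence of basic jamos.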
My original suggestion was to decompose them into sequences of basic
Jamos at the preprocessing stage. Mark pointed out that preprocessing
at the string level is deadly for performance. (I wonder, then, what
difference there is between the normalization of S1.1 in UTS #10 and
what I suggested, aside from the fact that mine is not a part of Unicode
NFD.) Anyway, the result is identical.