From: Peter Constable (petercon@microsoft.com)
Date: Sat Apr 24 2004 - 09:30:31 EDT
> Yes, problems do arise if there is more than one combining character
> between the base character and the VS and they are not in canonical
> order. But this is a marginal case which can be avoided by ensuring
> that canonical order is always used.
If data is always encoded in canonical order, then having a VS within
the combining mark sequence wouldn't create any normalization problems,
that's true. But you well know that people do not want their Hebrew data
in canonical order. Even if they did, it couldn't be guaranteed.
There's a problem not only in cases of the form B M1 M2 VS, but also in
cases of the form B M1 VS M2. Of course, the issues are different. The
first may normalize to B M2 M1 VS; the second perhaps *ought* to
normalize to B M2 M1 VS, but that won't happen.
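The two cases can be checked with a short sketch using Python's standard unicodedata module. The choice of characters is an assumption made purely for illustration: BET stands in for B, DAGESH (combining class 21) for M1, PATAH (combining class 17) for M2, and VARIATION SELECTOR-1 (combining class 0) for VS. No claim is made that any of these is a defined variation sequence.

```python
import unicodedata

# Illustrative stand-ins only; these are not defined variation sequences.
BET = "\u05D1"     # HEBREW LETTER BET, the base B (combining class 0)
DAGESH = "\u05BC"  # HEBREW POINT DAGESH, plays M1 (combining class 21)
PATAH = "\u05B7"   # HEBREW POINT PATAH, plays M2 (combining class 17)
VS1 = "\uFE00"     # VARIATION SELECTOR-1, the VS (combining class 0)


def classes(s):
    """Return the canonical combining class of each character in s."""
    return [unicodedata.combining(c) for c in s]


def nfd(s):
    """Apply canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", s)


# Case 1: B M1 M2 VS. The marks are out of canonical order, so they are
# swapped, and the VS ends up after M1 instead of M2.
case1 = BET + DAGESH + PATAH + VS1

# Case 2: B M1 VS M2. The VS has class 0, so it blocks reordering and the
# marks stay where they are.
case2 = BET + DAGESH + VS1 + PATAH

for label, text in (("B M1 M2 VS", case1), ("B M1 VS M2", case2)):
    print(label, classes(text), "->", classes(nfd(text)))
```

In the first case the VS is separated from the mark it was meant to modify; in the second the marks are never put into canonical order, so the string does not compare equal to its reordered counterpart.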
The only way to accommodate VSs within combining mark sequences would be
to define a set of VSs that pick up their canonical combining class from
the immediately preceding character. But since VSs can only be used in
explicitly-specified combinations, it might be less hassle to simply add
specific variation modifiers for specific combining marks (said
modifiers being combining characters with the same combining class as
the mark they modify). If you get to that point, though, you start to
wonder whether adding a new combining mark would meet the need just as
well without architecting entirely new encoding mechanisms.
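To make the first idea concrete, here is a toy sketch of canonical reordering in which a VS inherits the class of the character immediately before it. This is purely hypothetical: nothing like it exists in the standard, and the names is_vs, effective_classes and reorder are invented for the illustration. It only reorders; it does not decompose.

```python
import unicodedata


def is_vs(ch):
    """True for VARIATION SELECTOR-1..16 (U+FE00..U+FE0F)."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


def effective_classes(s):
    """Combining classes, with the hypothetical rule that a VS inherits
    the effective class of the immediately preceding character."""
    out = []
    for ch in s:
        if is_vs(ch) and out:
            out.append(out[-1])
        else:
            out.append(unicodedata.combining(ch))
    return out


def reorder(s):
    """Stable-sort each run of non-zero effective classes, as canonical
    ordering does, but using the inherited classes computed above."""
    ccc = effective_classes(s)
    result = []
    i = 0
    while i < len(s):
        if ccc[i] == 0:
            result.append(s[i])
            i += 1
            continue
        j = i
        while j < len(s) and ccc[j] != 0:
            j += 1
        # sorted() is stable, so a VS stays directly after its mark.
        order = sorted(range(i, j), key=lambda k: ccc[k])
        result.extend(s[k] for k in order)
        i = j
    return "".join(result)


# Same illustrative stand-ins as before: B, M1 (class 21), M2 (class 17), VS.
B, M1, M2, VS = "\u05D1", "\u05BC", "\u05B7", "\uFE00"

print(reorder(B + M1 + VS + M2) == B + M2 + M1 + VS)
print(reorder(B + M1 + M2 + VS) == B + M2 + VS + M1)
```

Under this assumed rule the VS travels with its mark in both orders, which is the behaviour the current class-0 VSs cannot provide.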
> An alternative of course would be to define a special VS with the same
> combining class as the character it applies to, so that the two will
> always remain together. Thus there would potentially be the need for a
> considerable set of VSs. But I don't think this is really necessary.
I think that would be better than having general VSs used with combining
marks.
Peter Constable