From: Eric Muller (emuller@adobe.com)
Date: Mon Nov 05 2007 - 10:34:40 CST
Kent Karlsson wrote:
> And yet the UTC, as well as WG2, seems to be in the process of adopting 100+
> Hangul Jamo that aren't even ligature-like, but each just represents a sequence
> of Hangul conjoining letters.
>
IMHO, this is the predictable and inevitable result of the canonical
decompositions that were frozen in Unicode 3.1.
Since that day, the standard says that there are three different coded
character sequences to represent 갂:
S    = <AC02>
LVT  = <1100 1161 11A9>
LVTT = <1100 1161 11A8 11A8>
with S and LVT canonically equivalent but not equivalent to LVTT. The
bulk of the data that exists today uses S/LVT; where "bulk" is probably
99%. The idea of LVTT, however sensible and desirable, did not happen in
practice. Because <11A9> is not and cannot be made canonically equivalent
to <11A8 11A8>, I believe that the only sensible course of action is to
admit that the idea of L+, V+, T+ in a syllable did not succeed, and
continue down the path of "complex" jamos (such as 11A9). I would even
recommend deprecating the use of multiple "simplex" jamos in each part
of a syllable, as a way to resolve the problem of multiple
non-equivalent representations and the implementation problems it
causes. (In fact, I am ready to bet that most implementations simply
treat LVTT as different from S/LVT, one more reason for cleaning up the
standard.)
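For anyone who wants to check the equivalences above, here is a minimal
sketch using Python's standard unicodedata module. The helper name
canonically_equivalent is made up for this illustration, and comparing
NFD forms is just one convenient way to test canonical equivalence; it
says nothing about how any particular implementation handles LVTT.

```python
import unicodedata

# The three coded character sequences discussed above.
S = "\uAC02"                      # precomposed syllable
LVT = "\u1100\u1161\u11A9"        # L + V + "complex" trailing jamo
LVTT = "\u1100\u1161\u11A8\u11A8" # L + V + two "simplex" trailing jamos


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD (canonical decomposition) forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(canonically_equivalent(S, LVT))    # True
print(canonically_equivalent(S, LVTT))   # False
print(canonically_equivalent(LVT, LVTT)) # False

# NFC cannot fold LVTT into the precomposed syllable: it composes
# L + V + the first T and leaves the second T as a separate character.
nfc_lvtt = unicodedata.normalize("NFC", LVTT)
print([f"{ord(c):04X}" for c in nfc_lvtt])  # ['AC01', '11A8']
```

So normalization alone never brings LVTT together with S/LVT; any
matching between them has to come from somewhere other than canonical
equivalence.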
I think the alternative you prefer (keep using <11A9>, but do not create
new combinations like that) would not result in a system that is clean
from a model point of view, nor in a system that is clean from an
implementation point of view. So I don't see anything that makes it
desirable.
Eric.