Date: Wed Jan 24 2007 - 16:02:39 CST
Thank you, Philippe, for a very insightful reply; I think I should frame
it and put it on my wall. Then "some day" ...
Quoting Philippe Verdy <email@example.com>:
> From: <firstname.lastname@example.org>
>> Unicode has consistently rejected this approach of putting two
>> Chinese characters together to make a new one, and insists that each
>> new CJKV character must be encoded separately, even though composition
>> would dramatically cut down the number of code points required. Most
>> Chinese characters are in fact made in this way (over 80% if one
>> allows combinations of combinations).
> I must acknowledge that this design choice, where the character model
> was tweaked horribly to match the desires of existing and past
> vendors, is somewhat flawed, and it then becomes difficult to
> understand the position of the UTC and ISO WG2 regarding other scripts
> (Hebrew, Indic scripts) that are far more complicated to implement and
> are disadvantaged because, by contrast, a much stricter character
> model was chosen for them.
> Some choices like this in the character model (Thai visual ordering,
> Hangul syllables...) at UTC (and at ISO WG2) are clearly
> inconsistent: they were guided only by the need to support legacy
> applications without any adaptation, clearly against the encoding
> policy, and they are now perceived as severely limiting or even
> devastating for the evolution of the standard (this is now a severe
> problem for rare scripts that are still not encoded, and that will be
> difficult to get widely supported in implementations).
> Some day this will block the evolution of the standard and put an
> end to it, exposing an entire industry to the risk of a future major
> switch to a new standard, with unavoidable incompatibilities and high
> migration costs.
> Regarding Han, the current desire to keep ideographs encoded only at
> the level of the glyph square will not be maintainable (consistency
> problems have already occurred, with multiple encodings of the same
> square), simply because the composition of these ideograph squares
> was not documented.
> It was said that ideographs do not compose easily into squares. This
> may be true for some well-known blocks, but I do not think this is
> really the rule, so these exceptions could have been handled like
> ligatures. Had Han been encoded consistently, the encoding would have
> favored a decomposed model based on radicals.
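Unicode already has a descriptive (not encoding) notation for this kind of decomposition: Ideographic Description Sequences, which write an ideograph's layout in prefix notation using the Ideographic Description Characters U+2FF0..U+2FFB (for example ⿰木木 describes 林). The sketch below is only an illustration, assuming that subset of IDCs: it parses an IDS into a nested tree, which makes "combinations of combinations" explicit. The function name parse_ids is hypothetical, and the parser ignores any further constraints the standard places on IDSs.

```python
from typing import Tuple, Union

# Ideographic Description Characters U+2FF0..U+2FFB: each takes two
# components, except U+2FF2 (left/middle/right) and U+2FF3
# (above/middle/below), which take three.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3

Node = Union[str, Tuple]


def parse_ids(ids: str) -> Node:
    """Parse a prefix-notation IDS into a nested tuple (hypothetical helper).

    Leaves are the component characters; inner nodes are
    (idc, component, component[, component]). Raises ValueError if the
    sequence is truncated or has trailing characters.
    """

    def parse(i: int) -> Tuple[Node, int]:
        if i >= len(ids):
            raise ValueError("truncated IDS")
        ch = ids[i]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            # Not an IDC: treat it as a leaf component.
            return ch, i + 1
        parts = []
        j = i + 1
        for _ in range(arity):
            node, j = parse(j)
            parts.append(node)
        return (ch, *parts), j

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


if __name__ == "__main__":
    # 森 described as 木 above (木 left of 木): a combination of combinations.
    print(parse_ids("\u2FF1\u6728\u2FF0\u6728\u6728"))
```

Note that an IDS only describes a character's shape; it is not canonically equivalent to the encoded ideograph, which is exactly the gap the message complains about.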
> In the same spirit, it would have been enough to encode Hangul with
> just the basic jamos (as they are learnt at school), using a single
> syllable-break character where needed to distinguish final from
> leading consonants, plus reasonable default rules for where these
> syllable breaks fall. The whole Hangul script could then have been
> encoded like a regular alphabet, which is something that was
> forgotten but is what it really IS. Unicode and ISO have needlessly
> complicated what was really a very simple script, and have spent more
> than eleven thousand positions in the BMP on precomposed Hangul
> syllables alone... instead of documenting a basic composition model
> which, for Hangul, is in fact very simple and extremely regular.
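How regular Hangul is shows in the arithmetic Unicode itself uses to map each precomposed syllable (U+AC00..U+D7A3, 19 x 21 x 28 = 11,172 of them) to and from conjoining jamo: leading consonant, vowel, and optional trailing consonant. The constants below are those of the standard's Hangul syllable algorithm; the function names compose_hangul and decompose_hangul are hypothetical, and this is a minimal sketch rather than a full normalizer.

```python
import unicodedata
from typing import Optional

# Constants of the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 precomposed syllables


def compose_hangul(lead: str, vowel: str, trail: Optional[str] = None) -> str:
    """Combine conjoining jamo L + V (+ T) into one precomposed syllable."""
    l = ord(lead) - L_BASE
    v = ord(vowel) - V_BASE
    t = 0 if trail is None else ord(trail) - T_BASE
    # A real trailing consonant has index 1..27; index 0 means "none".
    t_ok = trail is None or 1 <= t < T_COUNT
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and t_ok):
        raise ValueError("not a composable jamo sequence")
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose_hangul(syllable: str) -> str:
    """Split a precomposed syllable back into its conjoining jamo."""
    s = ord(syllable) - S_BASE
    if not 0 <= s < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(s, N_COUNT)
    v, t = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + l) + chr(V_BASE + v)
    if t:
        jamo += chr(T_BASE + t)
    return jamo


if __name__ == "__main__":
    # ᄒ + ᅡ + ᆫ -> 한 (U+D55C); cross-check against Python's NFC/NFD.
    han = compose_hangul("\u1112", "\u1161", "\u11AB")
    assert han == unicodedata.normalize("NFC", "\u1112\u1161\u11AB")
    assert decompose_hangul(han) == unicodedata.normalize("NFD", han)
    print(hex(ord(han)))
```

Because the mapping is pure arithmetic, no lookup table is needed in either direction; the 11,172 code points exist for compatibility, not because the script requires them.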
This archive was generated by hypermail 2.1.5 : Wed Jan 24 2007 - 16:04:43 CST