From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Dec 11 2004 - 18:45:41 CST
From: "Peter R. Mueller-Roemer" <pmr@informatik.uni-frankfurt.de>
> For a fixed length of combining character sequence (base + 3 combining
> marks is the most I have seen graphically distinguishable) the repertoire
> is still finite.
I do think that you are underestimating the repertoire. Also, Unicode does
NOT define an upper bound on the length of combining sequences, nor on the
length of default grapheme clusters (which can be composed of multiple
combining sequences, for example in the Hangul or Tibetan scripts).
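To illustrate the point that nothing caps the length of a combining sequence, here is a minimal sketch in Python. It is a simplification, not the UAX #29 grapheme cluster algorithm: it assumes a "combining mark" is any character whose General_Category starts with M (Mn, Mc, Me), and it ignores ZWJ/ZWNJ and other details of the Standard's definitions. The function names are hypothetical.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    # Simplification: General_Category Mn, Mc or Me counts as a combining mark.
    # (The Standard also lets ZWJ/ZWNJ continue a sequence; ignored here.)
    return unicodedata.category(ch).startswith("M")

def combining_sequences(text: str) -> list[str]:
    """Split text into base-plus-marks runs; no length limit is imposed."""
    seqs: list[str] = []
    for ch in text:
        if seqs and is_combining_mark(ch):
            seqs[-1] += ch      # attach the mark to the current sequence
        else:
            seqs.append(ch)     # any non-mark starts a new sequence
    return seqs

# One base letter followed by 20 combining acute accents (U+0301), then "b":
stacked = "a" + "\u0301" * 20 + "b"
print([len(s) for s in combining_sequences(stacked)])  # [21, 1]
```

Whether 20 accents are graphically distinguishable is a rendering question; the encoding model itself accepts them as one sequence.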
Your estimate also ignores various layouts found in Asian texts, and the
particular structures of historic texts, which can place many "diacritics" on
top of a single base letter starting a combining sequence. The model of some
of these scripts (for example Hebrew) implies the juxtaposition of up to 13
or 15 levels of diacritics on the same base letter!
In practice, it's impossible to enumerate all existing combinations (and
ensure that each one is assigned a unique code within a reasonably limited
code space), and that's why Unicode uses a simpler model based on more basic
but combinable code points: it frees Unicode from having to encode all of
them. (This is already a difficult task for the Han script, which could have
been encoded with combining sequences if the algorithms needed to create the
necessary layout had not required so many complex rules and so many
exceptions...)
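A rough back-of-the-envelope sketch of why precomposing every combination does not fit: the Unicode code space holds 0x110000 (1,114,112) code points. The inventory below (50 base letters, 100 usable marks) is purely hypothetical and not taken from the Standard; the limit of 3 marks per letter is the "base + 3" bound from the quoted message. Only sets of distinct marks are counted (order and repetition ignored), so the figure is an undercount.

```python
from math import comb

CODE_SPACE = 0x110000  # U+0000..U+10FFFF = 1,114,112 code points

def mark_combinations(num_marks: int, max_marks: int) -> int:
    """Count sets of 0..max_marks distinct marks chosen from num_marks.

    Order and repetition are ignored, so real sequences are undercounted.
    """
    return sum(comb(num_marks, k) for k in range(max_marks + 1))

# Hypothetical inventory: 50 base letters, 100 marks, at most 3 marks per letter.
per_base = mark_combinations(100, 3)
total = 50 * per_base
print(per_base, total, total > CODE_SPACE)  # 166751 8337550 True
```

Even under these conservative assumptions the total is several times the whole code space; raising `max_marks` toward the 13 to 15 levels mentioned above makes the gap far larger.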
This archive was generated by hypermail 2.1.5 : Sat Dec 11 2004 - 18:46:27 CST