Re: infinite combinations, was Re: Nicest UTF

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Dec 11 2004 - 18:45:41 CST


    From: "Peter R. Mueller-Roemer" <pmr@informatik.uni-frankfurt.de>
    > For a fixed length of combining character sequence (base + 3 combining
    > marks is the most I have seen graphically distinguishable) the repertoire
    > is still finite.

    I think you are underestimating the repertoire. Unicode does NOT define an
    upper bound on the length of combining sequences, nor on the length of
    default grapheme clusters (which can be composed of multiple combining
    sequences, for example in the Hangul or Tibetan scripts).
    Your estimate also ignores various layouts found in Asian texts, and the
    particular structures of historic texts, which can stack many "diacritics"
    on a single base letter starting a combining sequence. The model of
    these scripts (for example Hebrew) implies the juxtaposition of up to 13 or
    15 levels of diacritics on the same base letter!
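
    As a rough illustration of the missing upper bound, here is a minimal
    Python sketch. The helper is_combining_sequence is a hypothetical name,
    and its test (one non-mark character followed only by characters of
    general category M*) is a simplification of the definition in the
    standard. It only shows that a base letter with twenty marks is still a
    single combining sequence, and that normalization does not shorten it to
    one code point.

```python
import unicodedata

def is_combining_sequence(text: str) -> bool:
    """Simplified check: one base character followed only by combining marks."""
    if not text:
        return False
    base, marks = text[0], text[1:]
    # The base must not itself be a mark (general categories Mn, Mc, Me).
    if unicodedata.category(base).startswith("M"):
        return False
    return all(unicodedata.category(ch).startswith("M") for ch in marks)

# Illustrative input (an assumption, not real text): "e" followed by
# twenty COMBINING ACUTE ACCENT (U+0301) characters.
stacked = "e" + "\u0301" * 20

print(len(stacked))                    # number of code points in the sequence
print(is_combining_sequence(stacked))  # still one combining sequence

# NFC can compose only the first accent with the base (giving U+00E9);
# the remaining accents have no precomposed form and stay as separate marks.
composed = unicodedata.normalize("NFC", stacked)
print(len(composed))
```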

    In practice, it's impossible to enumerate all existing combinations (and
    ensure that each is assigned a unique code within a reasonably limited
    code space), and that's why Unicode uses a simpler model based on more
    basic but combinable code points: it frees Unicode from having to
    encode all of them. (This is already a difficult task for the Han script,
    which could have been encoded with combining sequences if the algorithms
    needed to create the necessary layout did not require so many
    complex rules and so many exceptions...)
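
    The combinable model can be seen in a small Python sketch using the
    standard unicodedata module. The character chosen here (U+1EAD) is just
    one example of a precomposed letter that is canonically equivalent to a
    base letter plus two combining marks; most combinations have no
    precomposed form at all and exist only as sequences.

```python
import unicodedata

# LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW, chosen as an example.
precomposed = "\u1EAD"

# NFD splits it into the base letter and its combining marks, ordered by
# canonical combining class (dot below, class 220, before circumflex, 230).
decomposed = unicodedata.normalize("NFD", precomposed)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Typing the marks in the other order yields a canonically equivalent
# sequence, so NFC composes it back to the same single code point.
recomposed = unicodedata.normalize("NFC", "a\u0302\u0323")
print(recomposed == precomposed)
```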


