Re: infinite combinations, was Re: Nicest UTF

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Dec 11 2004 - 18:45:41 CST

Next message: D. Starner: "Re: Nicest UTF"

Previous message: Michael Everson: "Re: Please RSVP... (was: US-ASCII)"
In reply to: Peter R. Mueller-Roemer: "infinite combinations, was Re: Nicest UTF"
Next in thread: Peter Kirk: "Re: infinite combinations, was Re: Nicest UTF"

From: "Peter R. Mueller-Roemer" <pmr@informatik.uni-frankfurt.de>
> For a fixed length of combining character sequence (base + 3 combining
> marks is the most I have seen graphically distinguishable) the repertoire
> is still finite.

I think you are underestimating the repertoire. Unicode does NOT define an
upper bound on the length of combining sequences, nor on the length of
default grapheme clusters (which can be composed of multiple combining
sequences, for example in the Hangul or Tibetan scripts).
Your estimate also ignores the various layouts found in Asian texts, and the
particular structures of historic texts, which can stack many "diacritics" on
top of the single base letter that starts a combining sequence. The model of
these scripts (for example Hebrew) implies the juxtaposition of up to 13 or 15
levels of diacritics on the same base letter!
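
To make the "no upper bound" point concrete, here is a minimal Python sketch
using the standard unicodedata module. The helper names build_sequence and
count_marks are mine, purely for illustration; the sketch only shows that
nothing in the character data stops a base letter from carrying 20 marks, and
that normalization cannot fold such a sequence into one code point. It says
nothing about whether a renderer will display the result legibly.

```python
import unicodedata
from typing import Iterable


def build_sequence(base: str, marks: Iterable[str]) -> str:
    """Return base followed by marks, checking each one is a combining mark.

    Hypothetical helper: a character counts as a mark here when its
    General_Category starts with "M" (Mn, Mc or Me).
    """
    marks = list(marks)
    for mark in marks:
        if not unicodedata.category(mark).startswith("M"):
            raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return base + "".join(marks)


def count_marks(text: str) -> int:
    """Count the combining marks (General_Category M*) in text."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


# A base letter followed by 20 copies of U+0301 COMBINING ACUTE ACCENT.
seq = build_sequence("a", ["\u0301"] * 20)
print(len(seq), count_marks(seq))

# NFC can compose at most a few of these; the rest stay separate code points.
print(len(unicodedata.normalize("NFC", seq)) > 1)
```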

In practice, it is impossible to enumerate all existing combinations (and
to ensure that each is assigned a unique code within a reasonably limited
range of code points). That is why Unicode uses a simpler model based on
more basic but combinable code points: it frees Unicode from having to
encode all of them. (This is already a difficult task for the Han script,
which could have been encoded with combining sequences had the algorithms
needed to create the necessary layout not required so many complex rules
and so many exceptions...)
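
A rough back-of-the-envelope count shows how quickly precomposed encoding
runs out of room, even with the fixed length limit proposed in the quoted
message. The sketch below is only an illustration: the figures of 100 base
letters and 50 marks are made-up assumptions, not data about any script, and
the function count_sequences is a hypothetical helper. It counts ordered
sequences, so it over-counts sequences that are canonically equivalent after
reordering; the only hard fact in it is the size of the Unicode codespace
(U+0000..U+10FFFF).

```python
# Size of the Unicode codespace: U+0000 through U+10FFFF inclusive.
UNICODE_CODESPACE = 0x110000  # 1,114,112 code points


def count_sequences(bases: int, marks: int, max_marks: int) -> int:
    """Count ordered sequences of one base plus 0..max_marks combining marks.

    Each base can be followed by marks**k distinct ordered runs of k marks,
    so the total per base is a geometric sum.
    """
    per_base = sum(marks ** k for k in range(max_marks + 1))
    return bases * per_base


# Assumed, illustrative figures: 100 bases, 50 marks, at most 3 marks each.
total = count_sequences(bases=100, marks=50, max_marks=3)
print(total, total > UNICODE_CODESPACE)
```

Raising max_marks by one multiplies the dominant term by the number of marks,
which is why removing the length limit altogether makes enumeration hopeless.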


This archive was generated by hypermail 2.1.5 : Sat Dec 11 2004 - 18:46:27 CST