> There is another side to this, Maurice. A lot of languages use a lot of
> accented characters. The Irish word "éirígí" has six letters in it. In
(sorry if my legacy UNIX-based mailer has ruined the accented letters...)
> Latin 1 it has six characters in it. In decomposed Unicode encoding it has
> nine characters (e´iri´gi´). So one thing that canonical decomposition does
> is increase the size of our files. Considerably. In Europe, we have
> considered this objectionable.
> Naturally it doesn't much affect file size in English. But it does in
> Irish. And Czech, Polish, Icelandic....
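As a rough illustration of the size argument quoted above, the following sketch counts code points and bytes for the same word in composed (NFC) and decomposed (NFD) form. It assumes the word in question is "éirígí" and uses Python's standard unicodedata module; the helper name sizes is made up for this example. Note that NFD uses the combining acute accent U+0301, not the spacing accent that appears in the quoted text.

```python
import unicodedata

# Assumed example word: "éirígí", written with escapes so the source
# survives mailers that are not 8-bit clean.
word = "\u00e9ir\u00edg\u00ed"

composed = unicodedata.normalize("NFC", word)    # precomposed letters
decomposed = unicodedata.normalize("NFD", word)  # base letter + U+0301


def sizes(s):
    """Hypothetical helper: size of a string under a few measures."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_bytes": len(s.encode("utf-16-be")),
    }


# Latin-1 can only represent the composed form.
latin1_bytes = len(composed.encode("latin-1"))

print("Latin-1 bytes:", latin1_bytes)  # 6
print("NFC:", sizes(composed))         # 6 code points
print("NFD:", sizes(decomposed))       # 9 code points
```

For this one word the decomposed form is half again as long in code points, which matches the six-versus-nine count in the quoted message.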
I have often had the feeling that Unicode has a very strong bias towards
the world of GUI word processors and the like, whose business is rendering,
and which are accustomed to composing presentation forms from sequences of
characters, and which do not have realtime, storage, or communications
constraints. In a way, it's sort of a fond tip of the hat to the "legacy" world
of terminals (or emulators of terminals), plain text, and so on.
However, that world is still very much with us and will be for many years to
come. Look: we're using it right now! My primary interest in Unicode is to
find ways to make use of it in that world; for example, as the common
intermediate representation for all character sets "on the wire" or in other
forms of interchange where "legacy" character sets and presentation forms are
still used at the "leaf nodes" of our network -- PCs, terminals, printers.
One important characteristic of the old plain-text world is that it is not
accustomed to composing characters on the fly, e.g. in communications
protocols, terminal emulators, and so on (the one big exception being the ALA
character set and terminals used by IBM mainframe-based bibliographic systems,
but that is a closed world).
For this reason, I don't think that precomposed characters are a bad thing,
to be deprecated -- I'm glad they are there, to the extent they are, and would
be even happier if there were more of them :-)
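To make the "leaf node" point concrete, here is a minimal sketch of a gateway that receives Unicode text "on the wire" and hands it to a device that only understands a legacy character set and cannot compose characters itself. The function name to_leaf_charset and the choice of Latin-1 are assumptions for illustration, not anything described in this thread.

```python
import unicodedata


def to_leaf_charset(text, charset="latin-1"):
    """Compose first, then encode for a device that cannot compose.

    Hypothetical helper: anything the target charset lacks becomes '?'.
    """
    composed = unicodedata.normalize("NFC", text)
    return composed.encode(charset, errors="replace")


# Decomposed input composes cleanly because precomposed forms exist.
print(to_leaf_charset("e\u0301iri\u0301gi\u0301"))  # b'\xe9ir\xedg\xed'

# No precomposed "q with acute" exists, so the combining mark survives
# normalization and is lost at the leaf node.
print(to_leaf_charset("q\u0301"))  # b'q?'
```

The second call shows the failure mode: where no precomposed character exists, the device at the edge of the network has nothing to display for the accent.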