Re: A basic question on encoding Latin characters

From: Rick McGowan (
Date: Thu Sep 23 1999 - 14:26:53 EDT

Quoting myself again, re the NeXT/Apple implementations...

> It is one industrial implementation that would be perfectly happy
> to have not a single precomposed Latin accented combination encoded
> at all.

Let me elaborate on that. The largest and messiest parts of building a Unicode
display system from scratch are dealing with all of the "garbage"
codepoints, normalizing and/or decomposing the precomposed Latin
characters, and then hassling with conversion to/from legacy encodings.
A from-scratch implementation of a text system would be greatly aided by not
having to deal with the precomposed stuff.
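
To make the "multiple spellings" problem concrete, here is a minimal sketch in Python using the standard unicodedata module. It is not taken from the NeXT/Apple code; the helper name same_text is made up for illustration.

```python
import unicodedata

# Two spellings of the same text element "e with acute accent".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (two code points)

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# A raw code point comparison treats the two spellings as different strings.
raw_equal = precomposed == decomposed

# After normalization both spellings reduce to the same sequence.
normalized_equal = same_text(precomposed, decomposed)
```

Every comparison, search, or lookup in the text system has to go through a step like same_text, which is the extra work a purely decomposed encoding would avoid.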

There are *some* benefits of precomposition, mainly in sorting: you can
use dumb algorithms to handle a lot of languages. In any generalized
system, though, that benefit is vastly overshadowed by the complexity of the
more difficult sorting systems. If you're implementing a full system of
sorting, there is no need for the precomposed combinations. They only get in
your way, because you have to normalize even more alternative spellings than
you would if none of these precomposed things existed.
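
As a rough illustration of why a real sort has to normalize anyway, the sketch below builds a two-level sort key in Python. It is a deliberately simplified stand-in, not the Unicode Collation Algorithm or any vendor's sorting code, and the name sort_key is an assumption.

```python
import unicodedata

def sort_key(s: str):
    """Hypothetical two-level key: base letters first, accents as tiebreak."""
    # Canonical decomposition makes both spellings of an accented letter identical.
    d = unicodedata.normalize("NFD", s)
    # Primary level: the string with combining marks removed.
    base = "".join(ch for ch in d if not unicodedata.combining(ch))
    # Secondary level: the full decomposed string, accents included.
    return (base, d)

words = [
    "z\u00e9bra",    # precomposed e-acute
    "zebra",
    "ze\u0301bu",    # decomposed e-acute
    "apple",
]

# A plain code point sort orders the two accented words by how they
# happen to be spelled, not by their letters.
naive = sorted(words)

# The normalized key ignores which spelling was used.
ordered = sorted(words, key=sort_key)
```

With this data the plain code point sort puts the decomposed "zebu" word ahead of the precomposed "zebra" word, while the keyed sort orders them by base letters regardless of spelling.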

And well-written extensible keyboard software ought to be able to mask the
underlying encoding by allowing one to map any arbitrary string of stuff onto
a single keystroke. So for even the naivest of end-users, there is no need
for precomposition.
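
A minimal sketch of such a keystroke mapping, in Python. The key labels and the table contents are invented for illustration and do not describe any actual keyboard software.

```python
# Hypothetical keymap: each key label maps to an arbitrary string of
# code points. The labels are made up for this example.
KEYMAP = {
    "e": "e",
    "e_acute_key": "e\u0301",         # one keystroke, two code points
    "dead_circumflex+o": "o\u0302",   # a dead-key combination, two code points
}

def type_keys(keystrokes):
    """Return the text produced by a sequence of key labels."""
    return "".join(KEYMAP[k] for k in keystrokes)

# The user presses two keys; the stored text holds three code points.
typed = type_keys(["e_acute_key", "e"])
```

The user sees one key producing one accented letter; whether the stored text is one code point or several is invisible at the keyboard.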

