From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Wed Jul 02 2003 - 06:20:54 EDT
> Believe it or not, the IJ and ij digraphs *were* included for
> compatibility with an 8-bit legacy character set (ISO 6937).
ISO 6937 is a multibyte encoding (one or two bytes per character).
It has no combining characters at all, despite the common
misunderstanding that it does; the confusion arises because the
lead bytes of the two-byte characters are (almost) systematically assigned.
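
To make the distinction concrete, here is a minimal decoding sketch in Python. It is only an illustration under stated assumptions: the lead-byte values in LEAD_BYTES are a small excerpt recalled from the ISO 6937 code table and should be checked against the standard, bytes below 0x80 are simply treated as ASCII, and "has a precomposed form in Unicode" is used as a rough stand-in for the actual (smaller) ISO 6937 repertoire. The function name decode_6937 is hypothetical. The point it shows is that a lead byte plus the following byte together select one character from a fixed repertoire; a pair outside that repertoire is an error rather than a letter with an accent stacked on it.

```python
import unicodedata

# Assumed excerpt of the ISO 6937 lead-byte table (verify against the standard).
# Each lead byte is mapped to the Unicode combining mark used here only to
# look up the corresponding precomposed character.
LEAD_BYTES = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
}


def decode_6937(data: bytes) -> str:
    """Decode a byte string using the simplified table above."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in LEAD_BYTES:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            trail = data[i + 1]
            base = chr(trail)
            if trail >= 0x80 or not base.isalpha():
                raise ValueError(f"invalid trail byte 0x{trail:02X}")
            # The pair denotes ONE character: accept it only if a single
            # precomposed code point exists for it.
            composed = unicodedata.normalize("NFC", base + LEAD_BYTES[b])
            if len(composed) != 1:
                raise ValueError(f"0x{b:02X} 0x{trail:02X} is not in the repertoire")
            out.append(composed)
            i += 2
        elif b < 0x80:
            out.append(chr(b))  # simplification: treat as ASCII
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not covered by this sketch")
    return "".join(out)


print(decode_6937(b"caf\xc2e"))
```

With a true combining-character model, 0xC2 followed by "q" would be a perfectly good q with acute; in the sketch above it is rejected, which is the behaviour of a multibyte encoding.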
> Whether
> that automatically means they should have been assigned canonical
> instead of compatibility decompositions, I don't know.
I think it is correct in this case that the decomposition is a compatibility
one. It could equally have had no decomposition at all, as with the oe and ae ligatures.
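
The difference can be inspected with Python's standard unicodedata module. The following is a small sketch, not anything from the standard itself; the helper name decomposition_kind is made up for illustration, and the results reflect whatever version of the Unicode Character Database ships with the Python in use.

```python
import unicodedata


def decomposition_kind(ch: str) -> str:
    """Classify a character's decomposition mapping: none, canonical or compatibility."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings carry a tag such as <compat> in front.
    return "compatibility" if mapping.startswith("<") else "canonical"


# IJ and ij digraphs, then the oe and ae ligatures (capital and small).
SAMPLES = ["\u0132", "\u0133", "\u0152", "\u0153", "\u00c6", "\u00e6"]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_kind(ch)}")

# NFD applies canonical mappings only, so the digraph survives it;
# NFKD also applies compatibility mappings and splits it into two letters.
nfd_ij = unicodedata.normalize("NFD", "\u0132")
nfkd_ij = unicodedata.normalize("NFKD", "\u0132")
print(repr(nfd_ij), repr(nfkd_ij))
```

So the digraphs are folded to two letters only under the compatibility normalization forms, while the oe and ae ligatures are left alone by every normalization form.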
This contrasts with the MICRO SIGN, which ideally should have had
a canonical decomposition; but Latin-1 characters got special treatment
(and ASCII characters get even more special treatment in this regard:
some ASCII spacing accents are not decomposed at all).
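
A short Python sketch of these cases, again using the standard unicodedata module. OHM SIGN is brought in here purely as a comparison point (it is not mentioned above): it is a unit symbol that does have a canonical mapping to a Greek letter, which shows what a canonical decomposition for MICRO SIGN would have meant in practice. The variable names are illustrative only.

```python
import unicodedata

MICRO_SIGN = "\u00b5"
GREEK_SMALL_MU = "\u03bc"
OHM_SIGN = "\u2126"
GREEK_CAPITAL_OMEGA = "\u03a9"

# OHM SIGN has a canonical mapping, so even NFC replaces it with the letter.
ohm_nfc = unicodedata.normalize("NFC", OHM_SIGN)

# MICRO SIGN has only a compatibility mapping: NFC keeps it, NFKC folds it.
micro_nfc = unicodedata.normalize("NFC", MICRO_SIGN)
micro_nfkc = unicodedata.normalize("NFKC", MICRO_SIGN)

print(ohm_nfc == GREEK_CAPITAL_OMEGA, micro_nfc == MICRO_SIGN, micro_nfkc == GREEK_SMALL_MU)

# Spacing accents: the two ASCII ones have no decomposition mapping at all,
# while the Latin-1 ones decompose (compatibility) to SPACE + combining mark.
for ch in ("^", "`", "\u00a8", "\u00b4", "\u00b8"):
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```

The practical consequence is that text normalized to NFC or NFD can still contain both MICRO SIGN and GREEK SMALL LETTER MU, so a comparison that should treat them alike needs one of the compatibility forms (or explicit folding).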
/kent k