Why is Unicode inconsistent?

From: Dan Oscarsson (Dan.Oscarsson@trab.se)
Date: Mon Oct 04 1999 - 02:57:53 EDT


Hi

Looking at the Unicode character data file I see that Unicode is
inconsistent.

If you look at the letter 0xD8, it cannot be decomposed,
but the letter 0xD6 can be decomposed.

This is inconsistent because the glyph 0xD8 can be decomposed
into letter o with a combining slash.

The same inconsistency exists for 0xC6 and 0xC4.
The glyph of letter 0xC6 can be decomposed into letter a with a combining e.
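
The decomposition entries for these four code points can be checked with a short script. This is only a sketch: it reads the character database bundled with Python's standard unicodedata module, which is a later version of the data file than the one discussed here, and the helper name decomposition_mapping is made up for illustration.

```python
import unicodedata

def decomposition_mapping(code_point):
    """Return the decomposition field for a code point as a string of
    hex code points, or an empty string if the character has none."""
    return unicodedata.decomposition(chr(code_point))

if __name__ == "__main__":
    # The two pairs compared in this message.
    for cp in (0xD6, 0xD8, 0xC4, 0xC6):
        mapping = decomposition_mapping(cp) or "(no decomposition)"
        print(f"0x{cp:02X}  {unicodedata.name(chr(cp))}: {mapping}")
```

An empty result means the data file lists no decomposition for that letter, which is the asymmetry described above.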

It gets more inconsistent when you consider that the letters 0xC6 and 0xC4
are the same letter, but one is the Norwegian/Danish version and the other
the Swedish one. Why can one be decomposed and the other not?
The same goes for 0xD8 and 0xD6.
Why does Unicode favor one language over another?

Is it just that somebody thought that the glyph for 0xC4 could be chopped
into pieces but not 0xC6?

It can get worse when a font is created: a letter a with a diaeresis
may be a different glyph than the letter 0xC4 (which has no English name).
I have seen several bad fonts where somebody thinks that the letter
0xC4 is a letter a with a diaeresis and just combined the two instead
of having a true letter 0xC4.
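
For reference, the behaviour being questioned here can be observed through normalization. The following is a minimal sketch using Python's standard unicodedata module; the function name canonically_equivalent is hypothetical, and the comparison via NFD is one common way to test canonical equivalence, not the only one.

```python
import unicodedata

def canonically_equivalent(a, b):
    """Treat two strings as equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00C4"   # the single code point 0xC4
combined = "A\u0308"     # letter A followed by a combining diaeresis

if __name__ == "__main__":
    # 0xC4 and the combining sequence compare as the same text.
    print(canonically_equivalent(precomposed, combined))
    # 0xC6 has no decomposition, so it is not equivalent to "AE".
    print(canonically_equivalent("\u00C6", "AE"))
```

A font or renderer that follows the data file would therefore be entitled to draw both forms of 0xC4 the same way, which is exactly the situation complained about above.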

Unicode needs to understand the difference between precomposed characters
and those that are not (0xC4 is not a precomposed character, it is
a single letter just like 0xC6).

   Dan


