Dan Oscarsson said:
>> If you look at letter 0xD8, it cannot be decomposed,
>> but letter 0xD6 can be decomposed.
>> This is inconsistent, because the glyph 0xD8 can be decomposed
>> into letter o with a combining slash.
John Cowan replied:
>Combining-slash decompositions are considered to be over the top:
>they're impossible to recombine at the glyph level accurately,
>because the position of the slash varies randomly depending on the
>base letter.
This is not very convincing. The same is true of many other
combining characters: even a diaeresis or an acute accent looks different
on different base characters (e.g. small and capital letters).
As in other cases, renderers and font designers can apply any of several valid strategies:
- Using several different <combining slash> glyphs for different contexts;
- Using precomposed glyphs for well-known sequences like <O with slash>;
- Normalizing text to use precomposed characters where possible;
- Accepting an ugly result in very unusual cases.
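The third strategy can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, assuming the Unicode data bundled with a current Python release; the helper name `to_nfc` is hypothetical. NFC composes O + U+0308 COMBINING DIAERESIS into U+00D6, but leaves O + U+0338 COMBINING LONG SOLIDUS OVERLAY as two code points, because U+00D8 has no canonical decomposition to compose back to.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Hypothetical helper: normalize to NFC, so precomposed characters
    are used wherever Unicode defines a canonical composition."""
    return unicodedata.normalize("NFC", text)

# O + COMBINING DIAERESIS has a canonical composite, U+00D6.
print(hex(ord(to_nfc("O\u0308"))))  # 0xd6

# O + COMBINING LONG SOLIDUS OVERLAY has no composite, because U+00D8
# carries no canonical decomposition; the sequence stays two code points.
print([hex(ord(c)) for c in to_nfc("O\u0338")])  # ['0x4f', '0x338']
```

The slash itself is not the obstacle: `=` followed by U+0338 does normalize to U+2260 NOT EQUAL TO, so whether a base-plus-slash sequence composes depends only on which decompositions the character data defines.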
Dan's mail is inaccurate in many details, but his claim that U+00D8
should have a canonical decomposition seems right.
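Whether a character has a canonical decomposition can be checked directly against the Unicode Character Database. A minimal sketch using Python's standard `unicodedata` module; the helper `describe` is hypothetical, and the output reflects whatever Unicode version the interpreter ships with.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: code point, character name, and decomposition
    mapping of a single character ('(none)' if it has no mapping)."""
    decomp = unicodedata.decomposition(ch)  # empty string when no mapping exists
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomp or '(none)'}"

# The letters discussed in this thread: O-diaeresis, O-stroke, A-diaeresis, AE.
for ch in "\u00d6\u00d8\u00c4\u00c6":
    print(describe(ch))
```

It reports 004F 0308 for U+00D6 and 0041 0308 for U+00C4, and (none) for both U+00D8 LATIN CAPITAL LETTER O WITH STROKE and U+00C6 LATIN CAPITAL LETTER AE.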
> -----Original Message-----
> From: John Cowan [SMTP:firstname.lastname@example.org]
> Sent: 1999 October 04, Monday 17.25
> To: Unicode List
> Subject: Re: Why is Unicode inconsistant?
> Dan Oscarsson scripsit:
> > Looking at the Unicode character data file, I see that Unicode is
> > inconsistent.
> Obviously this needs to go in the FAQ.
> > If you look at letter 0xD8, it cannot be decomposed,
> > but letter 0xD6 can be decomposed.
> > This is inconsistent, because the glyph 0xD8 can be decomposed
> > into letter o with a combining slash.
> Combining-slash decompositions are considered to be over the top:
> they're impossible to recombine at the glyph level accurately,
> because the position of the slash varies randomly depending on the
> base letter.
> "The line must be drawn here!" -- J.-L. Picard
> > The same inconsistency exists for 0xC6 and 0xC4.
> > The glyph of letter 0xC4 can be decomposed into letter a with a
> > combining e.
> U+00C6 is another boundary case: letter or ligature? But it is certainly
> not equivalent to "ae" except in Latin (the language, not the script).
> > It gets more inconsistent when you consider that the letters 0xC6 and
> > 0xC4 are the same letter, but one is the Norwegian/Danish version and the
> > other the Swedish one.
> In that context, yes. But they are not really equivalent in German, and
> less so in Finnish.
> > Why does Unicode favor one language and not another?
> It does not.
> > It can get worse when a font is created: a letter a with a diaeresis
> > may be a different glyph from the letter 0xC4 (which has no English
> High-quality fonts are always language-specific: we have already learned
> that proper Polish fonts use differently placed accents from their
> Western European analogues. Unicode is concerned with *plain* text,
> in other words, whatever cannot be abandoned without abandoning
> > I have seen several bad fonts where somebody assumed that the letter
> > 0xC4 is a letter a with a diaeresis and simply combined the two instead
> > of designing a true letter 0xC4.
> Inevitably so.
> > Unicode needs to understand the difference between precomposed characters
> > and those that are not (0xC4 is not a precomposed character; it is
> > a single letter just like 0xC6).
> No, it is font designers who need to know when precomposed glyphs
> work and when they don't.
> John Cowan email@example.com
> I am a member of a civilization. --David Brin
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT