A few questions about decomposition, equivalence and rendering

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Tue Feb 05 2002 - 07:17:34 EST


Dear all,

Sorry if these questions have been answered before.

Spacing diacritical marks (e.g. U+00A8) have compatibility
decompositions of the form 0020 xxxx. Why are these not canonical
decompositions? Under what circumstances would you expect the spacing
marks to behave differently from their decompositions?

The two that are in ASCII don't decompose. Is that because they're
overloaded?
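
A minimal sketch of how the decomposition mappings in question can be inspected, assuming Python's standard unicodedata module (not something referred to above) and assuming that the two ASCII characters meant are U+005E and U+0060. The helper name decomposition_info is hypothetical.

```python
import unicodedata

def decomposition_info(ch):
    """Return (kind, code points) for ch's decomposition mapping,
    or None if the character has no decomposition at all."""
    raw = unicodedata.decomposition(ch)
    if not raw:
        return None
    parts = raw.split()
    if parts[0].startswith("<"):
        # A leading tag such as <compat> marks a compatibility mapping.
        kind = parts[0].strip("<>")
        parts = parts[1:]
    else:
        # No tag means the mapping is canonical.
        kind = "canonical"
    return kind, [int(p, 16) for p in parts]

# Latin-1 spacing diacritics, then the two ASCII candidates (an assumption).
for ch in "\u00a8\u00af\u00b4\u00b8^`":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), decomposition_info(ch))
```

Because the mapping for U+00A8 is tagged as a compatibility one, only the NFKD/NFKC forms rewrite it; NFD and NFC leave the spacing mark alone.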

A number of combining characters (e.g. U+0340, U+0341, U+0343) have
canonical equivalents, i.e. canonical decompositions that are a single
character. In other words, we have pairs of codepoints that are bound
to behave in exactly the same manner under all circumstances. What's
the deal?
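
As an illustration of these single-character canonical decompositions, here is a small sketch, again assuming Python's standard unicodedata module. The function name canonical_singleton is hypothetical.

```python
import unicodedata

def canonical_singleton(ch):
    """Return the single character that ch canonically decomposes to,
    or None if ch has no such one-character canonical mapping."""
    raw = unicodedata.decomposition(ch)
    # Canonical mappings carry no <tag>; a singleton has exactly one code point.
    if raw and not raw.startswith("<") and len(raw.split()) == 1:
        return chr(int(raw, 16))
    return None

for ch in "\u0340\u0341\u0343":
    target = canonical_singleton(ch)
    print(f"U+{ord(ch):04X} -> U+{ord(target):04X}", unicodedata.name(target))
```

Every normalization form, NFC included, replaces such a character with its target, so the original code point does not survive normalization.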

Unicode contains a number of precomposed spacing diacritical marks for
Greek (e.g. U+1FC1). However, and unless I've missed something, with
the exception of U+0385, they do not have combining (non-spacing)
versions. What's the rationale here?

(Similar precomposed diacritical marks do not seem to exist for
Vietnamese, which makes me think they've been included for
compatibility with legacy encodings rather than for a good reason.
Still, because their decompositions are not canonical, they need to be
taken into account, which in my case complicates what would otherwise
be somewhat cleaner code.)

When rendering stacked combining characters (i.e. sequences of
combining characters with the same non-zero combining class), which
sequences need to be treated specially (as opposed to being stacked on
top of each other)? I already know about the pairs needed for Greek
(both Mono- and Polytonic) and Vietnamese.
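
To make the "same non-zero combining class" condition concrete, a short sketch assuming Python's standard unicodedata module. It only shows which marks interact; it does not say how any particular pair should be drawn. The names combining_classes, same_class and mixed_class are hypothetical.

```python
import unicodedata

def combining_classes(s):
    """Return the canonical combining class of each character in s."""
    return [unicodedata.combining(c) for c in s]

# Diaeresis then acute: both marks are class 230, so they interact
# and canonical reordering must keep them in the order given.
same_class = "a\u0308\u0301"

# Acute (230) then dot below (220): different classes, so canonical
# reordering sorts the lower class first.
mixed_class = "a\u0301\u0323"

for s in (same_class, mixed_class):
    nfd = unicodedata.normalize("NFD", s)
    print(combining_classes(s), "->", combining_classes(nfd))
```

Only sequences of the first kind are relevant to the stacking question, since their relative order is significant and is preserved by normalization.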

As far as I can tell, there is nothing in the Unicode database that
relates a ``modifier letter'' to the associated punctuation mark. Is
that right? Does anyone have such data that I could steal?
(Hopefully with no legal strings attached.)

(Aside: I would expect a search function in a text editor or a search
engine to identify modifier letters with punctuation marks -- I expect
the two to be confused in practice. But I couldn't find anything to
this effect in the Book.)
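
A sketch of what such search folding might look like, assuming Python's standard unicodedata module. The table MODIFIER_TO_PUNCT is a hypothetical, hand-written illustration with two entries chosen by their character names; it is not data taken from the Unicode database.

```python
import unicodedata

# Hypothetical hand-made pairs, NOT derived from the Unicode database.
MODIFIER_TO_PUNCT = {
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE -> APOSTROPHE
    "\u02d0": ":",  # MODIFIER LETTER TRIANGULAR COLON -> COLON
}

def fold_for_search(text):
    """Replace the listed modifier letters with their punctuation look-alikes."""
    return "".join(MODIFIER_TO_PUNCT.get(c, c) for c in text)

# The general categories differ (letter vs. punctuation), and
# normalization does not bridge the two characters.
print(unicodedata.category("\u02bc"), unicodedata.category("'"))
print(unicodedata.normalize("NFKC", "\u02bc") == "'")
print(fold_for_search("Hawai\u02bbi ma\u02bcam") == "Hawai\u02bbi ma'am")
```

Characters absent from the table pass through unchanged, so the folding is only as complete as the table someone supplies.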

On a related note, does anyone have a map from mathematical characters
to the Geometric Shapes, Miscellaneous Symbols and Dingbats that would
be useful for rendering?

Thanks a lot,

                                        Juliusz


