Re: Decomposed vs Composed accented characters

From: Otto Stolz
Date: Fri Apr 07 2006 - 02:07:30 CST

    Hello Tay, William,

    you have asked:
    > Can accented characters be decomposed in other encodings, e.g. ISO
    > 8859-1, as well?

    The title of the ISO 8859 series contains the term "single-byte coded
    graphic character sets". ISO 8859 prohibits the use of control
    functions for the coded representation of composite characters,
    and ISO 8859-1 defines no combining, or non-spacing (cf. infra),
    characters.
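
    As a rough illustration (not part of the standard's text), the
    following Python sketch assumes Python's built-in "latin-1" codec
    as a stand-in for ISO 8859-1. It shows that the precomposed form
    of an accented letter fits into a single byte, while the decomposed
    form cannot be encoded, because the combining accent has no code
    position there.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9).
composed = "\u00e9"

# The same letter decomposed: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# The precomposed form maps to exactly one byte.
latin1_bytes = composed.encode("latin-1")

# The decomposed form fails: U+0301 has no code position in this encoding.
try:
    decomposed.encode("latin-1")
    decomposed_fits = True
except UnicodeEncodeError:
    decomposed_fits = False

print(latin1_bytes, decomposed_fits)
```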

    A probable exception to this rule is ISO 8859-6 "Latin/Arabic
    alphabet". In my copy of 1987 (there may be a newer edition;
    I haven't checked), the clause prohibiting composition of
    characters is missing, and the standard defines 8 Arabic marks
    that are normally combining (such as Fatha, Damma, Kasra). However,
    the 1987 version of that standard is rather vague about the
    composing/rendering issue.
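
    For what it is worth, these marks can be inspected with a small
    Python sketch. It assumes that Python's "iso8859_6" codec places
    the 8 marks at bytes 0xEB through 0xF2; that reflects the codec's
    mapping table, not necessarily the wording of the 1987 edition.
    Unicode classifies all of them as combining marks.

```python
import unicodedata

# Assumed positions of the 8 Arabic marks in Python's iso8859_6 codec.
mark_bytes = bytes(range(0xEB, 0xF3))
marks = mark_bytes.decode("iso8859_6")

# For each mark: byte value, Unicode code point, character name, and
# canonical combining class (non-zero means the character combines).
report = [
    (f"0x{b:02X}", f"U+{ord(ch):04X}", unicodedata.name(ch),
     unicodedata.combining(ch))
    for b, ch in zip(mark_bytes, marks)
]

for row in report:
    print(*row)
```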

    ISO 6937 was an attempt to support a large character repertoire
    through heavy use of composition. Quote from ISO 6937/2-1983:
    > Each accented letter or umlaut is represented by a sequence
    > of bit combinations consisting of the coded representation
    > of the relevant non-spacing diacritical mark [...], followed
    > by the coded representation of the relevant basic Latin letter
    > [...]
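
    Note the order: in ISO 6937 the diacritical mark precedes the base
    letter, whereas a Unicode combining mark follows it. A minimal
    conversion sketch in Python follows. The function name
    iso6937_to_unicode is hypothetical, the table covers only four
    marks, and the byte values are assumptions that should be checked
    against the standard before any real use.

```python
import unicodedata

# Illustrative subset only; byte values are assumed, not quoted from the
# standard. Maps a non-spacing diacritic byte to a Unicode combining mark.
NON_SPACING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC8: "\u0308",  # diaeresis / umlaut
}


def iso6937_to_unicode(data: bytes) -> str:
    """Convert a mark-before-letter byte sequence to NFC Unicode text."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in NON_SPACING:
            # The mark must be followed by a basic (here: ASCII) letter.
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("diacritical mark without a base letter")
            # Unicode order is reversed: base letter first, then the mark.
            out.append(chr(data[i + 1]) + NON_SPACING[b])
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not covered by this sketch")
    # Compose base + mark into precomposed characters where they exist.
    return unicodedata.normalize("NFC", "".join(out))


print(iso6937_to_unicode(b"M\xc8uller"))
```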

    Best wishes,
       Otto Stolz

