Re: character "combinability"

From: Kent Karlsson
Date: Thu Feb 18 2010 - 08:34:24 CST

    On 2010-02-18 15.15, "spir" <> wrote:

    > Hello,
    > Does Unicode specify which characters, especially bases (*), are allowed for
    > combination (into a combining sequence)? For instance, from the ASCII subset,
    > it seems to me only letters can occur in a combination --except for the
    > special case of CR-LF. But I could not find any such restriction list. There
    > may be two cases, imo:

    CR-LF is not a combining sequence.

    But talking about combining sequences:

    > -1- Either Unicode does not impose any restriction on combination. But then we
    > can and are allowed to concretely encode characters (or rather graphemes) that
    > have no attested existence in real use: for instance, (ASTERISK, COMBINING
    > CIRCUMFLEX). This seems to me to contradict Unicode guidelines, I guess.
    > But it opens the door to creative use of Unicode ;-)

    You are free to combine away. Not all will render properly, but that is a
    property of the font+rendering engine.
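To make the point above concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It appends COMBINING CIRCUMFLEX ACCENT (U+0302) to two different bases and reports what normalization does with each sequence. The helper name `describe` is purely illustrative and is not part of any Unicode API.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

def describe(base: str) -> dict:
    """Report basic properties of the sequence <base, U+0302> (illustrative helper)."""
    seq = base + CIRCUMFLEX
    nfc = unicodedata.normalize("NFC", seq)
    return {
        "base_category": unicodedata.category(base),
        "mark_category": unicodedata.category(CIRCUMFLEX),
        "mark_combining_class": unicodedata.combining(CIRCUMFLEX),
        # 1 if a precomposed character exists, 2 if the sequence stays as is
        "nfc_length": len(nfc),
    }

asterisk = describe("*")  # punctuation base, no precomposed form
letter = describe("e")    # letter base, composes to U+00EA under NFC
```

Both sequences are accepted by the normalizer. The only difference is that the letter has a precomposed equivalent and the asterisk does not, so the asterisk sequence is simply left as two code points. Whether it displays well is a separate question for the font and rendering engine.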
    > -2- Or there are such restrictions. These data should not only specify which
    > characters can combine absolutely, but also with which class of combining
    > marks they are allowed to do it. Signs like '*' cannot combine at all,
    > probably. ASCII letters can only combine with a given class of diacritics.

    Not really.

    > Actually, this is the case for Hangul syllables.

    Hangul is handled specially (it did not have to be, but it is).
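One respect in which Hangul gets special treatment, and presumably part of what is meant here, is that conjoining jamo compose into precomposed syllables by arithmetic rather than by table lookup. The sketch below uses the constants from the Unicode Standard's Hangul composition algorithm and checks the result against Python's `unicodedata.normalize`. The function name `compose_hangul` is illustrative, and the function assumes well-formed input (a leading consonant, a vowel, and an optional trailing consonant from the modern jamo ranges).

```python
import unicodedata

# Constants from the Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose modern conjoining jamo into one syllable (no input validation)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1112\u1161\u11ab"        # HIEUH, A, final NIEUN
syllable = compose_hangul(*jamo)   # U+D55C
same_as_nfc = syllable == unicodedata.normalize("NFC", jamo)
```

Unlike the asterisk case, the set of sequences that compose here is closed and defined by the arithmetic itself, which is why Hangul can look like a "restricted" combination system even though general combining sequences are not restricted.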

    > Or is there a kind of implicit gentleman's agreement; meaning combinations
    > should be used in a sensible manner?

    You could say that.

        /kent k

    > (Where can I find accurate information on this topic?)
    > Denis
    > (*) For instance, the algorithm for grouping characters into "grapheme
    > clusters" specifies that "extend codes" always be grouped with the previous
    > code. This seems to allow any "Combining Mark" to be placed arbitrarily on any
    > character (even a non-base one, actually).
    > ________________________________
    > la vita e estrany
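The footnote in the quoted message describes grapheme cluster grouping. The following is a deliberately rough Python sketch of that idea only: it attaches every code point whose general category starts with "M" to the preceding cluster. This is an assumption-laden simplification, not the UAX #29 algorithm (which uses the Grapheme_Cluster_Break property and has further rules, for example for CR-LF and Hangul). The name `rough_clusters` is hypothetical.

```python
import unicodedata

def rough_clusters(text: str) -> list[str]:
    """Group marks with the preceding code point (simplified, not UAX #29)."""
    clusters: list[str] = []
    for ch in text:
        # Any mark (Mn, Mc, Me) extends the previous cluster, whatever its base is.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

with_asterisk = rough_clusters("a*\u0302b")  # the mark attaches to the asterisk
leading_mark = rough_clusters("\u0302x")     # a mark with no base stands alone
```

Nothing in the grouping step inspects what kind of character the base is, which matches the observation that any combining mark may follow any character.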
