From: Kent Karlsson (firstname.lastname@example.org)
Date: Thu Feb 18 2010 - 08:34:24 CST
On 2010-02-18 15.15, "spir" <email@example.com> wrote:
> Does Unicode specify which characters, especially bases (*), are allowed for
> combination (into a combining sequence)? For instance, from the ASCII subset,
> it seems to me only letters can occur in a combination -- except for the
> special case of CR-LF. But I could not find any such restriction list. There
> may be two cases, imo:
CR-LF is not a combining sequence.
But talking about combining sequences:
> -1- Either Unicode does not impose any restriction on combination. But then we
> can, and are allowed to, concretely encode characters (or rather graphemes) that
> have no attested existence in real use: for instance, (ASTERISK, COMBINING
> CIRCUMFLEX). This seems to me to contradict Unicode guidelines, I guess.
> But it opens the door to creative use of Unicode ;-)
You are free to combine away. Not all combinations will render properly, but
that is a property of the font and rendering engine.
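
As a rough illustration of the point above, here is a minimal Python sketch
using the standard unicodedata module. It only inspects character properties
and normalization; it says nothing about how a given font will draw the
result. The variable names are illustrative.

```python
import unicodedata

# ASTERISK followed by COMBINING CIRCUMFLEX ACCENT (U+0302).
seq = "*\u0302"

# Canonical combining classes: 0 for the asterisk, 230 ("above") for the mark.
classes = [unicodedata.combining(ch) for ch in seq]

# General categories: the asterisk is punctuation (Po), the mark is Mn.
categories = [unicodedata.category(ch) for ch in seq]

# There is no precomposed character for this pair, so NFC leaves the
# sequence as two code points.
nfc = unicodedata.normalize("NFC", seq)

# By contrast, "e" + U+0302 does have a precomposed form, U+00EA.
e_nfc = unicodedata.normalize("NFC", "e\u0302")

print(classes, categories, len(nfc), hex(ord(e_nfc)))
```

Nothing in the character data rejects the asterisk sequence; it simply has no
precomposed equivalent.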
> -2- Or there are such restrictions. These data should not only specify which
> characters can combine at all, but also with which class of combining
> marks they are allowed to do so. Signs like '*' cannot combine at all,
> probably. ASCII letters can only combine with a given class of diacritics.
> Actually, this is the case for Hangul syllables.
Hangul is handled specially (it did not have to be, but it is).
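
One place where the special handling of Hangul is visible is normalization:
conjoining jamo compose into precomposed syllables arithmetically rather than
through per-character data. The sketch below assumes that this is the kind of
special treatment in question. The constants follow the Hangul syllable
composition arithmetic in the Unicode Standard; the function name
compose_hangul is illustrative only.

```python
import unicodedata

# Constants for Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_hangul(lead, vowel, trail=None):
    """Compose a leading consonant, a vowel and an optional trailing
    consonant (all conjoining jamo) into one precomposed syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# KIYEOK + A, with and without a trailing KIYEOK.
ga = compose_hangul("\u1100", "\u1161")
gag = compose_hangul("\u1100", "\u1161", "\u11a8")

# The normalization forms in the standard library agree with the arithmetic.
nfc_ga = unicodedata.normalize("NFC", "\u1100\u1161")
nfd_gag = unicodedata.normalize("NFD", gag)

print(hex(ord(ga)), hex(ord(gag)), ga == nfc_ga)
```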
> Or is there a kind of implicit gentleman's agreement; meaning combinations
> should be used in a sensible manner?
You could say that.
> (Where can I find accurate information on this topic?)
> (*) For instance, the algorithm for grouping characters into "grapheme
> clusters" specifies that "extend codes" always be grouped with the previous
> code. This seems to allow any "Combining Mark" to be placed arbitrarily on any
> character (even a non-base one, actually).
> la vita e estrany