Re: character "combinability"

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Thu Feb 18 2010 - 08:34:24 CST

Next message: Kent Karlsson: "Re: Greek chars encoded twice -- why?"

Previous message: Samuel Thibault: "Re: character "combinability""
In reply to: spir: "character "combinability""
Next in thread: spir: "Re: character "combinability""
Reply: spir: "Re: character "combinability""

On 2010-02-18 15.15, "spir" <denis.spir@free.fr> wrote:

> Hello,
>
> Does Unicode specify which characters, especially bases (*), are allowed for
> combination (into a combining sequence)? For instance, from the ASCII subset,
> it seems to me only letters can occur in a combination --except for the
> special case of CR-LF. But I could not find any such restriction list. There
> may be two cases, imo:

CR-LF is not a combining sequence.

But talking about combining sequences:

> -1- Either Unicode does not impose any restriction on combination. But then we
> can and are allowed to concretely encode characters (or rather graphemes) that
> have no attested existence in real use: for instance, (ASTERISK, COMBINING
> CIRCUMFLEX). This seems to me to contradict the Unicode guidelines, I guess.
> But it opens the door to creative use of Unicode ;-)

You are free to combine away. Not all will render properly, but that is a
property of the font+rendering engine.
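
A minimal sketch of what that freedom looks like in practice, assuming
Python's standard unicodedata module (the helper name describe is only
illustrative). It compares a base+mark pair that has a precomposed form with
the (ASTERISK, COMBINING CIRCUMFLEX) pair from the question, which has none
but is still a perfectly valid sequence:

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def describe(seq):
    """Return (NFC form, number of code points after NFC) for a sequence."""
    nfc = unicodedata.normalize("NFC", seq)
    return nfc, len(nfc)


# The mark carries a non-zero canonical combining class (230 = above).
mark_class = unicodedata.combining(CIRCUMFLEX)

# 'e' + circumflex has a precomposed form, so NFC yields one code point.
e_hat = describe("e" + CIRCUMFLEX)

# '*' + circumflex has no precomposed form; normalization leaves the
# two code points alone and does not reject the sequence.
star_hat = describe("*" + CIRCUMFLEX)

print(mark_class, e_hat, star_hat)
```

Whether the second sequence is displayed with the circumflex neatly above the
asterisk is, as said, up to the font and the rendering engine.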

> -2- Or there are such restrictions. These data should not only specify which
> characters can combine absolutely, but also with which class of combining
> marks they are allowed to do it. Signs like '*' cannot combine at all,
> probably. ASCII letters can only combine with a given class of diacritics.

Not really.

> Actually, this is the case for Hangul syllables.

Hangul is handled specially (it did not have to be, but it is).
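
One visible aspect of that special handling (assuming this is the aspect
meant) is that conjoining jamo compose into precomposed syllables under
normalization, and decompose back again. A small sketch, again using Python's
standard unicodedata module; the variable names are only illustrative:

```python
import unicodedata

# Conjoining jamo: choseong KIYEOK, jungseong A, jongseong KIYEOK.
L, V, T = "\u1100", "\u1161", "\u11a8"

# Leading consonant + vowel composes to the syllable U+AC00.
lv = unicodedata.normalize("NFC", L + V)

# Adding a trailing consonant composes to the syllable U+AC01.
lvt = unicodedata.normalize("NFC", L + V + T)

# NFD takes the precomposed syllable back to its three jamo.
back = unicodedata.normalize("NFD", lvt)

print([hex(ord(c)) for c in lv + lvt], [hex(ord(c)) for c in back])
```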

> Or is there a kind of implicit gentleman's agreement; meaning combinations
> should be used in a sensible manner?

You could say that.

/kent k

> (Where can I find accurate information on this topic?)
>
> Denis
>
> (*) For instance, the algorithm for grouping characters into "grapheme
> clusters" specifies that "extend codes" always be grouped with the previous
> code. This seems to allow any "Combining Mark" to be placed arbitrarily on any
> character (even a non-base one, actually).
> ________________________________
>
> la vita e estrany
>
> http://spir.wikidot.com/
>

