character "combinability"

From: spir (denis.spir@free.fr)
Date: Thu Feb 18 2010 - 08:15:07 CST

Next in thread: Samuel Thibault: "Re: character "combinability""
Reply: Samuel Thibault: "Re: character "combinability""
Reply: Kent Karlsson: "Re: character "combinability""

Hello,

Does Unicode specify which characters, especially base characters (*), are allowed to combine (into a combining sequence)? For instance, within the ASCII subset, it seems to me that only letters can occur in a combination, except for the special case of CR-LF. But I could not find any list of such restrictions. In my opinion, there are two possible cases:

-1- Either Unicode does not impose any restriction on combination. But then we are able, and allowed, to concretely encode characters (or rather graphemes) that have no attested existence in real use: for instance, (ASTERISK, COMBINING CIRCUMFLEX). I guess this contradicts the Unicode guidelines. But it opens the door to creative use of Unicode ;-)

-2- Or there are such restrictions. In that case, the data should specify not only which characters can combine at all, but also with which classes of combining marks they are allowed to combine. Signs like '*' probably cannot combine at all. ASCII letters can only combine with a given class of diacritics. Actually, this is the case for Hangul syllables.
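Both cases can be probed with Python's standard `unicodedata` module. The sketch below is only an illustration (property values depend on the Unicode version bundled with the interpreter), and the helper `describe` is hypothetical. It shows that nothing stops a program from building (ASTERISK, COMBINING CIRCUMFLEX ACCENT), that NFC normalization simply leaves it as two code points because no precomposed form exists, and that the canonical combining class, the per-character mark property Python exposes, is 0 for both "a" and "*". That class governs canonical reordering during normalization; it does not say which bases a mark may attach to.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: code point, name, general category, combining class."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} "
            f"gc={unicodedata.category(ch)} ccc={unicodedata.combining(ch)}")

# Case 1: the sequence ASTERISK + COMBINING CIRCUMFLEX ACCENT is a legal string.
star_hat = "*\u0302"
a_hat = "a\u0302"
for seq in (star_hat, a_hat):
    nfc = unicodedata.normalize("NFC", seq)
    # No precomposed "asterisk with circumflex" exists, so NFC keeps two code
    # points; "a" + circumflex composes to U+00E2.
    print(len(seq), "->", len(nfc), [describe(c) for c in nfc])

# Case 2: the combining class is 0 for both "a" and "*", so it cannot tell
# which bases accept which marks.
for ch in ("a", "*", "\u0302", "\u0323"):
    print(describe(ch))

# What the class does control is canonical ordering of marks:
# dot below (220) is sorted before circumflex (230).
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "a\u0302\u0323")])
```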

Or is there a kind of implicit gentleman's agreement, meaning that combinations should be used in a sensible manner?

(Where can I find accurate information on this topic?)

Denis

(*) For instance, the algorithm for grouping characters into "grapheme clusters" specifies that "extend codes" always be grouped with the previous code. This seems to allow any "Combining Mark" to be placed arbitrarily on any character (even on a non-base one, actually).
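The footnote's reading can be checked with a deliberately simplified Python sketch. It implements only a few rules of the default grapheme cluster algorithm in UAX #29 (GB3: no break inside CR LF; GB4/GB5: break around controls; GB9: no break before Extend; GB999: otherwise break), and it approximates the real Grapheme_Cluster_Break property values with general categories. The names `is_extend`, `is_control` and `clusters` are hypothetical helpers, not a library API.

```python
import unicodedata

def is_extend(ch: str) -> bool:
    # Assumption: approximate Grapheme_Cluster_Break=Extend by the general
    # categories Mn (nonspacing mark) and Me (enclosing mark). The real property
    # also covers some spacing marks, ZWNJ, emoji modifiers, etc.
    return unicodedata.category(ch) in ("Mn", "Me")

def is_control(ch: str) -> bool:
    # Assumption: rough stand-in for Grapheme_Cluster_Break=Control/CR/LF.
    return unicodedata.category(ch) in ("Cc", "Zl", "Zp")

def clusters(text: str) -> list[str]:
    """Simplified grapheme segmentation using only GB3, GB4, GB5, GB9, GB999."""
    out: list[str] = []
    for ch in text:
        if not out:
            out.append(ch)
            continue
        prev = out[-1][-1]
        if prev == "\r" and ch == "\n":           # GB3: keep CR LF together
            out[-1] += ch
        elif is_control(prev) or is_control(ch):  # GB4/GB5: break around controls
            out.append(ch)
        elif is_extend(ch):                       # GB9: attach mark to previous
            out[-1] += ch
        else:                                     # GB999: otherwise break
            out.append(ch)
    return out

print(clusters("*\u0302 \u0302x\r\n"))
```

With these rules a circumflex attaches to an asterisk and even to a space, but a mark following a line feed ends up in a cluster of its own, because the control rules take precedence over GB9.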
________________________________

la vita e estrany

http://spir.wikidot.com/


This archive was generated by hypermail 2.1.5 : Thu Feb 18 2010 - 08:18:14 CST