From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Sat Jul 26 2003 - 06:31:59 EDT
On 25/07/2003 17:39, Kenneth Whistler wrote:
>
>
>...In Unicode 4.0, CGJ has been stripped of all interpretation
>except as an invisible mark which can be used to tailor
>collation (and searching), so as to distinguish digraphic units
>from sequences of the same characters.
>
Thank you, Ken, for the long and helpful explanation of which this is an 
extract.
One question arises. If CGJ is used as proposed, giving sequences such
as patah CGJ hiriq and perhaps meteg CGJ vowel, does this imply that
these sequences will necessarily be treated in collation as distinct
from plain patah hiriq and meteg vowel sequences (the latter of which
would, of course, be reversed by normalisation)? This is a simple
question.

I'm not yet sure whether this would be desirable. Well, it would
probably be better for meteg CGJ vowel to be collated the same as vowel
meteg, as the distinction here is graphical rather than semantic. As for
patah CGJ hiriq, one advantage of collating this sequence the same as
hiriq patah would be that existing texts which lack CGJ here would be
collated together with ones which have it, and users doing searches
might not have to type the CGJ. But is this perhaps something that can
be handled by tailoring specific collation rules?
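The normalisation behaviour behind this question can be checked with Python's standard `unicodedata` module. The sketch below relies on the canonical combining classes in the Unicode Character Database (hiriq 14, patah 17, meteg 22, CGJ 0; lamed is used only as an arbitrary base letter). It shows that NFD reverses patah hiriq and meteg patah, and that an intervening CGJ blocks that reordering. `fold_for_search` is a hypothetical helper for illustration only, not part of the UCA or of any library's tailoring syntax: it strips CGJ before normalising so that both spellings match in a search.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (ccc 0), arbitrary base letter
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ (ccc 14)
PATAH = "\u05B7"  # HEBREW POINT PATAH (ccc 17)
METEG = "\u05BD"  # HEBREW POINT METEG (ccc 22)
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER (ccc 0)


def show(s: str) -> str:
    """Spell out a string as space-separated character names."""
    return " ".join(unicodedata.name(c).replace("HEBREW ", "") for c in s)


def fold_for_search(s: str) -> str:
    """Hypothetical search key: drop every CGJ, then apply NFD.

    Removing CGJ first lets canonical reordering act across the place
    where it stood, so 'patah CGJ hiriq' folds to 'hiriq patah'.
    """
    return unicodedata.normalize("NFD", s.replace(CGJ, ""))


plain = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ
meteg_first = LAMED + METEG + PATAH
meteg_cgj = LAMED + METEG + CGJ + PATAH

# Canonical ordering sorts marks by combining class: 17, 14 -> 14, 17.
print(show(unicodedata.normalize("NFD", plain)))
# → LETTER LAMED POINT HIRIQ POINT PATAH

# CGJ has class 0, so it blocks reordering and the order survives.
print(show(unicodedata.normalize("NFD", with_cgj)))
# → LETTER LAMED POINT PATAH COMBINING GRAPHEME JOINER POINT HIRIQ

# Meteg (22) is moved after patah (17).
print(show(unicodedata.normalize("NFD", meteg_first)))
# → LETTER LAMED POINT PATAH POINT METEG

# The hypothetical folding key treats the CGJ and non-CGJ spellings alike.
print(fold_for_search(with_cgj) == fold_for_search(plain))
# → True
print(fold_for_search(meteg_cgj) == fold_for_search(meteg_first))
# → True
```

A key of this kind only approximates the equivalence suggested above for searching; an actual collation tailoring would be written in the rule syntax of whichever collation implementation is in use.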
-- 
Peter Kirk
peter.r.kirk@ntlworld.com
http://web.onetel.net.uk/~peterkirk/
This archive was generated by hypermail 2.1.5 : Sat Jul 26 2003 - 07:11:52 EDT