Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Sat Jul 26 2003 - 06:31:59 EDT

    On 25/07/2003 17:39, Kenneth Whistler wrote:

    >
    >
    >...In Unicode 4.0, CGJ has been stripped of all interpretation
    >except as an invisible mark which can be used to tailor
    >collation (and searching), so as to distinguish digraphic units
    >from sequences of the same characters.
    >
    Thank you, Ken, for the long and helpful explanation of which this is an
    extract.
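    A minimal Python sketch of the normalisation point raised below, using
    only the standard unicodedata module. It relies on one property of the
    characters involved: CGJ (U+034F) has canonical combining class 0, while
    the Hebrew points have nonzero classes (hiriq 14, patah 17, qamats 18,
    meteg 22). Canonical ordering sorts each run of nonzero-class marks by
    class, so plain patah hiriq becomes hiriq patah. A CGJ between the marks
    splits the run and blocks that reordering. The lamed base letter and the
    helper name show are chosen only for illustration.

```python
import unicodedata

LAMED = "\u05DC"   # HEBREW LETTER LAMED (base letter, chosen for illustration)
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17
QAMATS = "\u05B8"  # HEBREW POINT QAMATS, combining class 18
METEG = "\u05BD"   # HEBREW POINT METEG, combining class 22
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0

def show(label: str, text: str) -> None:
    """Print the character names of text after NFC normalisation."""
    names = [unicodedata.name(ch) for ch in unicodedata.normalize("NFC", text)]
    print(f"{label}: {names}")

# Without CGJ, canonical ordering moves the lower-class mark first.
show("patah hiriq", LAMED + PATAH + HIRIQ)     # -> lamed, hiriq, patah
show("meteg qamats", LAMED + METEG + QAMATS)   # -> lamed, qamats, meteg

# With CGJ (class 0) between the marks, the original order is preserved.
show("patah CGJ hiriq", LAMED + PATAH + CGJ + HIRIQ)
show("meteg CGJ qamats", LAMED + METEG + CGJ + QAMATS)
```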

    One question arises. If CGJ is used as proposed, giving sequences
    such as patah CGJ hiriq and perhaps meteg CGJ vowel, does this imply
    that these sequences will necessarily be treated in collation as
    distinct from the plain patah hiriq and meteg vowel sequences (the
    plain sequences, of course, would be reordered by normalisation)?
    The question is simple, but I'm not yet sure whether such a
    distinction would be desirable. For meteg CGJ vowel, it would
    probably be better to collate it the same as vowel meteg, since the
    distinction here is graphical, not semantic. For patah CGJ hiriq,
    collating it the same as hiriq patah would have two advantages:
    existing texts without CGJ here would be collated together with
    those that have it, and users doing searches might not have to type
    the CGJ. Or is this something that can be handled by tailoring the
    collation rules for these specific sequences?
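    One way such a tailoring might behave can be sketched at the string
    level. This is a hypothetical approach, not a UCA or CLDR tailoring and
    not anything proposed in the thread: build a comparison key by deleting
    CGJ and then normalising to NFD. Because CGJ is removed before
    normalisation, the marks it separated get reordered, so patah CGJ hiriq
    receives the same key as plain patah hiriq (both become hiriq patah),
    and meteg CGJ vowel the same key as vowel meteg. The names search_key
    and matches, and the sample strings, are illustrative only.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

def search_key(text: str) -> str:
    """Hypothetical key: ignore CGJ, then apply canonical decomposition and ordering."""
    return unicodedata.normalize("NFD", text.replace(CGJ, ""))

def matches(query: str, text: str) -> bool:
    """Naive substring search on keys, so a user need not type CGJ."""
    return search_key(query) in search_key(text)

# Hypothetical sample: lamed, patah, (CGJ,) hiriq, final mem.
with_cgj = "\u05DC\u05B7\u034F\u05B4\u05DD"
without_cgj = "\u05DC\u05B7\u05B4\u05DD"

print(search_key(with_cgj) == search_key(without_cgj))  # True
print(matches("\u05DC\u05B4\u05B7", with_cgj))          # True: query typed without CGJ
```

    Note the trade-off: this key discards exactly the distinction CGJ was
    inserted to mark, so it suits searching better than strict collation.
    Naive substring matching on NFD strings can also match inside a longer
    run of marks, so a real search would need grapheme-aware matching.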

    -- 
    Peter Kirk
    peter.r.kirk@ntlworld.com
    http://web.onetel.net.uk/~peterkirk/
    


    This archive was generated by hypermail 2.1.5 : Sat Jul 26 2003 - 07:11:52 EDT