Re: Welsh Collation

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Apr 22 2006 - 19:25:23 CST

    Michael Everson wrote on Saturday, April 22, 2006 at 5:55 PM
    Subject: Re: Welsh Collation

    > The Welsh do not use CGJ at all, and they should not. I have been asked by
    > the Welsh for recommendations from time to time, and I have never told
    > them to use CGJ. The collation sequence should be tailored instead.
    > Burdening Welsh text with CGJ would be very bad indeed.

    Are we talking about the same thing? Tailoring will obviously handle the
    cases where 'ng' and 'll' act as single 'letters', between 'g' and 'h' and
    between 'l' and 'm' respectively. I believe CGJ is needed for the awkward
    cases. The usual example is <dangos> 'show'. Going through the Collins
    Spurrell Pocket Welsh Dictionary, I found, in order:

    danfon 'send, convey'
    dangos 'show' (and some of its compounds)
    danheddog 'jagged, serrated, toothed'
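    To make the difference concrete, here is a minimal hand-rolled sketch of
    a primary-level Welsh sort key. It is not ICU or a UCA tailoring; the
    names WELSH_ALPHABET, RANK and welsh_key are hypothetical, accented
    vowels are not handled, and a CGJ (U+034F) is treated as nothing more
    than a marker that stops two letters being read as a digraph.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Welsh alphabet with digraphs as single letters ('ng' sits between 'g' and 'h').
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i", "j",
    "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}


def welsh_key(word):
    """Return a tuple of letter ranks; a CGJ between two letters blocks a digraph."""
    s = word.lower()
    key = []
    i = 0
    while i < len(s):
        if s[i] == CGJ:
            i += 1  # the CGJ itself carries no weight
            continue
        pair = s[i:i + 2]
        if pair in RANK:  # only digraphs are two-character entries
            key.append(RANK[pair])
            i += 2
        else:
            # Letters outside the table sort after all Welsh letters (assumption).
            key.append(RANK.get(s[i], len(RANK) + ord(s[i])))
            i += 1
    return tuple(key)


words = ["danheddog", "dan" + CGJ + "gos", "danfon"]
with_cgj = sorted(words, key=welsh_key)
without_cgj = sorted(["danheddog", "dangos", "danfon"], key=welsh_key)
```

    With the CGJ present, the sort reproduces the dictionary order above.
    Without it, 'dangos' is read as d-a-ng-o-s and lands before 'danfon'.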

    In this ordering, 'dangos' sorts as d-a-n-g-o-s; with the single letter
    'ng' it would come before 'danfon'. Are you suggesting that these awkward
    cases can be handled by making the tailoring a compressed exception
    dictionary? The Welsh Language Board
    (http://www.bwrdd-yr-iaith.org.uk/download.php/pID=66182.1) suggests, in
    Section 4.2, two levels of collation ability:

    1) Use the Welsh rules, ignoring exceptions (minimum requirement)
    2) Use the Welsh rules plus an exception dictionary (best)

    It also says that CGJs in the data should be used where available, giving
    the example of a CGJ in 'Williams', a very common surname in Wales
    (Section 4.2.4). English proper names may be a very fertile source of
    exceptions, and note that many Welsh surnames follow English spelling
    rules.
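    One way to picture the second level is an exception dictionary that
    records where a would-be digraph must be split, and a step that marks
    those positions with a CGJ unless the data already contains one. This is
    only a sketch: EXCEPTIONS and mark_exceptions are hypothetical names, the
    two entries are just the words mentioned in this thread, and placing the
    CGJ between the two l's of 'Williams' is my assumption.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Hypothetical exception dictionary: lower-cased word -> offsets at which
# a CGJ is inserted so the letters on either side are not read as a digraph.
EXCEPTIONS = {
    "dangos": (3,),    # dan|gos
    "williams": (3,),  # Wil|liams (assumed position)
}


def mark_exceptions(word, exceptions=EXCEPTIONS):
    """Return word with CGJs inserted at the listed offsets, preserving case."""
    if CGJ in word:
        return word  # CGJs already in the data take precedence
    offsets = exceptions.get(word.lower(), ())
    parts = []
    prev = 0
    for off in sorted(offsets):
        parts.append(word[prev:off])
        prev = off
    parts.append(word[prev:])
    return CGJ.join(parts)


marked = [mark_exceptions(w) for w in ["dangos", "Williams", "llong"]]
```

    Words not in the dictionary, such as 'llong', pass through unchanged and
    are left to the ordinary Welsh rules.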

    Richard.


