RE: interleaved ordering (was RE: Phoenician)

From: Kenneth Whistler (
Date: Wed May 12 2004 - 15:59:14 CDT

    Mike Ayers asked:

    > I agree with those who think that interleaving Phoenician and Hebrew
    > would not be a good default. I've asked it before and I'll ask it again: is
    > it not correct that language scholars are those most likely to be able to
    > create and use a nondefault sort order?

    They would be the most likely to be able to *specify* what particular
    kind of behavior they were looking for.

    They might, however, as John suggests, be completely unable to figure out
    how to implement their desired specification computationally
    and not have enough money to be able to pay someone competent
    to do so.

    However, I think the middle ground we should aim for here is to
    facilitate the development of applications for sorting based
    on tools like ICU which have *reasonably* easy mechanisms for
    specifying arbitrarily complex custom sorting behavior.
    That would give scholars a chance to make use of such tools without
    having to give up their field for extended periods of time to
    become sufficiently talented programming geeks to implement
    the behavior they are after.

    After all, the implementation of script folding in the ICU
    framework is not exactly rocket science. It is effectively
    the moral equivalent of 22 lines of the following sort:

    05D0=E000 ; Hebrew aleph primary equal to Phoenician alp

    etc., read by the ICU APIs to specify a tailoring.
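    To make the size of that job concrete, here is a minimal sketch in
    plain Python of what such a 22-line folding table amounts to. It is
    not ICU and does not use ICU's rule syntax. It assumes U+05D0 for
    Hebrew aleph and, following the placeholder in the line above, the
    hypothetical private-use range E000..E015 for the 22 Phoenician
    letters. The names build_rules, parse_rules and primary_key are made
    up for illustration, and the key is raw code point order, not real
    collation weights.

```python
# Hebrew letters aleph..tav, minus the five final forms, gives the 22 base letters.
HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
HEBREW_LETTERS = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]

# Assumption: Phoenician letters sit at 22 consecutive private-use code points,
# starting at E000 as in the example line of the message.
PHOENICIAN_BASE = 0xE000


def build_rules():
    """Generate the 22 'Hebrew=Phoenician' lines of the folding table."""
    return [
        f"{heb:04X}={PHOENICIAN_BASE + i:04X}"
        for i, heb in enumerate(HEBREW_LETTERS)
    ]


def parse_rules(lines):
    """Read 'HHHH=PPPP ; comment' lines into a Phoenician -> Hebrew map."""
    fold = {}
    for line in lines:
        rule = line.split(";", 1)[0].strip()  # drop the trailing comment
        if not rule:
            continue
        hebrew, phoenician = rule.split("=")
        fold[int(phoenician, 16)] = int(hebrew, 16)
    return fold


def primary_key(text, fold):
    """Toy primary key: fold Phoenician onto Hebrew, then use code points."""
    return [fold.get(ord(ch), ord(ch)) for ch in text]


RULES = build_rules()
FOLD = parse_rules(RULES)

# Phoenician alp (placeholder E000) now sorts together with Hebrew aleph.
words = ["\u05D1", "\uE000", "\u05D0\u05D2"]
interleaved = sorted(words, key=lambda w: primary_key(w, FOLD))
```

    A real tailoring would hand the equivalent rules to the collation
    library rather than build keys by hand, but the amount of data the
    scholar has to supply is about the same.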

    People agitating to have Hebrew and Phoenician conflated together
    in the *default* Unicode collation element table may be
    overlooking the fact that it is very unlikely that the default
    table is going to produce optimal sorting, as required for
    Semiticists, of Hebrew data, anyway -- it is just a default,
    "reasonable" ordering for Hebrew. Scholars will
    probably need to tweak it, particularly for points, accents,
    punctuation-like marks, maybe some symbols, and so on. In
    that context, deciding whether or not to conflate Hebrew square
    script and Phoenician (~ Old Canaanite, or whatever we end up
    calling it) in primary order just becomes part of the overall
    task of coming up with *optimal* collations for particular
    scholastic requirements.

    Also, keep the following in mind when considering exceptions to
    the default conventions for the default primary ordering of
    scripts in the DUCET:

    It is rather easier to tailor a *conflation* of two scripts
    that are separated in the default table, than it is to tailor
    a *separation* of two scripts which are conflated in the
    default table. Either is doable, of course, but the latter
    requires *more* knowledge of how to do tailoring and how to
    avoid pitfalls in tailoring than the former does.
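
    A toy illustration of that asymmetry, in Python. The weights are
    made-up numbers, not DUCET values, and the function names conflate
    and separate are hypothetical. Conflation only copies weights that
    already exist; separation has to pick a fresh range of primaries
    and make sure it does not collide with anything else in the table.

```python
# Toy primary-weight table with the two scripts kept apart (invented weights).
separated = {"heb_aleph": 100, "heb_bet": 101, "phn_alp": 200, "phn_bet": 201}

# Which Phoenician letter should fold onto which Hebrew letter.
pairs = {"phn_alp": "heb_aleph", "phn_bet": "heb_bet"}


def conflate(table, pairs):
    """Give each source letter the primary weight of its target letter."""
    out = dict(table)
    for src, dst in pairs.items():
        out[src] = table[dst]
    return out


def separate(table, members, new_base):
    """Move `members` to fresh primaries starting at `new_base`.

    The caller must know a free range; a collision with any weight
    used outside `members` is reported as an error.
    """
    out = dict(table)
    used = {w for name, w in table.items() if name not in members}
    ordered = sorted(members, key=lambda name: table[name])  # keep letter order
    for offset, name in enumerate(ordered):
        weight = new_base + offset
        if weight in used:
            raise ValueError(f"primary weight {weight} is already in use")
        out[name] = weight
    return out


conflated = conflate(separated, pairs)
restored = separate(conflated, ["phn_alp", "phn_bet"], new_base=200)
```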

