From: Kenneth Whistler (email@example.com)
Date: Wed May 12 2004 - 15:59:14 CDT
Mike Ayers asked:
> I agree with those who think that interleaving Phoenician and Hebrew
> would not be a good default. I've asked it before and I'll ask it again: is
> it not correct that language scholars are those most likely to be able to
> create and use a nondefault sort order?
They would be the most likely to be able to *specify* what particular
kind of behavior they were looking for.
They might, however, as John suggests, be completely unable to figure out
how to implement their desired specification computationally
and not have enough money to be able to pay someone competent
to do so.
However, I think the middle ground we should aim for here is to
facilitate the development of applications for sorting based
on tools like ICU which have *reasonably* easy mechanisms for
specifying arbitrarily complex custom sorting behavior.
That would give scholars a chance to make use of such tools without
having to give up their field for extended periods of time to
become sufficiently talented programming geeks to implement
the behavior they are after.
After all, the implementation of script folding in the ICU
framework is not exactly rocket science. It is effectively
the moral equivalent of 22 lines of the following sort:
05D0=E000 ; Hebrew aleph primary equal to Phoenician alp
etc., read by the ICU APIs to specify a tailoring.
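
To make the idea of a primary-level fold concrete, here is a minimal sketch
in plain Python. It does not use ICU or the ICU rule syntax; it only emulates
the effect with a two-level sort key. Following the example line above, the
private-use code points E000..E015 stand in for the 22 Phoenician letters;
that range, and the names HEBREW_LETTERS, PLACEHOLDER_BASE, WEIGHTS and
fold_key, are assumptions made purely for illustration.

```python
# Sketch only: emulates a primary-level script fold without ICU.
# The 22 non-final Hebrew letters, aleph through tav, in alphabet order.
HEBREW_LETTERS = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
]

# Assumption: private-use placeholders E000..E015 for the Phoenician letters.
PLACEHOLDER_BASE = 0xE000

# Each letter gets (primary, secondary). Corresponding letters share a
# primary weight; the secondary weight records which script it came from.
WEIGHTS = {}
for index, hebrew_cp in enumerate(HEBREW_LETTERS):
    WEIGHTS[hebrew_cp] = (index, 0)
    WEIGHTS[PLACEHOLDER_BASE + index] = (index, 1)


def fold_key(text):
    """Return a sort key that compares all primaries before any secondary."""
    primaries = []
    secondaries = []
    for ch in text:
        # Unmapped characters sort after the folded letters, by code point.
        primary, secondary = WEIGHTS.get(
            ord(ch), (len(HEBREW_LETTERS) + ord(ch), 0)
        )
        primaries.append(primary)
        secondaries.append(secondary)
    return (tuple(primaries), tuple(secondaries))


# Hebrew aleph-gimel, placeholder alp-bet, Hebrew aleph-bet.
words = ["\u05D0\u05D2", "\uE000\uE001", "\u05D0\u05D1"]
folded = sorted(words, key=fold_key)
```

With this key the placeholder word sorts between the two Hebrew words,
directly after its Hebrew counterpart, whereas a plain code point sort puts
it after both. The table that does the work is 22 entries long, which is the
point being made above.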
People agitating to have Hebrew and Phoenician conflated together
in the *default* Unicode collation element table may be
overlooking the fact that it is very unlikely that the default
table is going to produce optimal sorting, as required for
Semiticists, of Hebrew data, anyway -- it is just a default,
"reasonable" ordering for Hebrew. Scholars will
probably need to tweak it, particularly for points, accents,
punctuation-like marks, maybe some symbols, and so on. In
that context, deciding whether or not to conflate Hebrew square
script and Phoenician (~ Old Canaanite, or whatever we end up
calling it) in primary order just becomes part of the overall
task of coming up with *optimal* collations for particular
Also, keep the following in mind when considering exceptions to
the default conventions for the default primary ordering of
scripts in the DUCET:
It is rather easier to tailor a *conflation* of two scripts
that are separated in the default table, than it is to tailor
a *separation* of two scripts which are conflated in the
default table. Either is doable, of course, but the latter
requires *more* knowledge of how to do tailoring and how to
avoid pitfalls in tailoring than the former does.