> One thing that I think would be a nice addition to the work on sorting
> Unicode is to specify somewhere in the Unicode code space a number of
> sorting control characters, that direct some preprocessing on strings
> before they actually enter the sorting algorithm.
Argh, is this supposed to be in a "character set encoding"?
> Simple example, in a list of names, the most significant letter is the
> first letter of the surname and not always the first letter. It is in
> many libraries common practice for the librarian who processes a newly
> purchased book to mark with a pencil the first letter of the surname
> such that this book is later always sorted in the same way.
> Let (1) be this marker, then we could store a list of names in a database:
> John (1)Smith
> Mr. Joseph E. (1)Miller-Rubin, M.D.
As Alain said once, different parts of names should be stored in different
_fields_, instead of putting these burdens onto the character set
encoding. They should be stored as something like
and the database program can then sort, display and manipulate the data
appropriately as the user requests.
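
A minimal sketch of the field-based approach in Python. The field names (given, surname, title, suffix) and the class itself are assumptions made for illustration, not the record layout proposed in this thread. The point is only that the sort key comes from the surname field, so no marker character is needed in the text.

```python
from dataclasses import dataclass


@dataclass
class Name:
    # Hypothetical field names, chosen for illustration only.
    given: str
    surname: str
    title: str = ""
    suffix: str = ""

    def display(self) -> str:
        # Join the non-empty parts in conventional display order.
        parts = [self.title, self.given, self.surname]
        text = " ".join(p for p in parts if p)
        return f"{text}, {self.suffix}" if self.suffix else text

    def sort_key(self) -> tuple:
        # Surname first, then given name. casefold() gives a simple
        # case-insensitive ordering; real collation is locale-dependent.
        return (self.surname.casefold(), self.given.casefold())


names = [
    Name(given="John", surname="Smith"),
    Name(given="Joseph E.", surname="Miller-Rubin", title="Mr.", suffix="M.D."),
]

# Sorting uses the surname field, whatever the display form looks like.
ordered = sorted(names, key=Name.sort_key)
for n in ordered:
    print(n.display())
```

The stored strings stay free of control characters, and the same records can be re-sorted by any other field without touching the data.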
> - suppression of substrings for sorting
> - replacing substrings for sorting (like in "Markus <G.|Guenther> Kuhn")
This should be done by the database. An entry like
will then be displayed on request as "Markus G Kuhn" (en-GB),
"Markus G. Kuhn" (en-US), "MGK", "Kuhn, M.G.", "Kuhn, Markus Guenther",
"Markus Guenther Kuhn", "Markus Kuhn", "Kuhn, M.", etc.
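
As a rough sketch of how one stored entry could yield all of these display forms, here is a small Python example. The Person type, the render function, and the style names are hypothetical, and the code assumes every field is non-empty.

```python
from typing import NamedTuple


class Person(NamedTuple):
    # Hypothetical fields; assumes each one is non-empty.
    given: str
    middle: str
    surname: str


def render(p: Person, style: str) -> str:
    """Format one stored entry in the requested display style."""
    g, m, s = p.given, p.middle, p.surname
    # Style names are invented for this sketch.
    styles = {
        "initial_en_GB": f"{g} {m[0]} {s}",
        "initial_en_US": f"{g} {m[0]}. {s}",
        "monogram": f"{g[0]}{m[0]}{s[0]}",
        "surname_initials": f"{s}, {g[0]}.{m[0]}.",
        "surname_full": f"{s}, {g} {m}",
        "full": f"{g} {m} {s}",
        "short": f"{g} {s}",
        "surname_initial": f"{s}, {g[0]}.",
    }
    return styles[style]


entry = Person(given="Markus", middle="Guenther", surname="Kuhn")
print(render(entry, "initial_en_US"))
print(render(entry, "surname_initials"))
```

Each display form is derived from the same three fields, so nothing about abbreviation or ordering has to be encoded in the character data itself.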
-- Khaisu Te (Kaihsu Tai) http://www.ugcs.caltech.edu/~kaihsu/