> One thing that I think would be a nice addition to the work on sorting
> Unicode is to specify somewhere in the Unicode code space a number of
> sorting control characters, that direct some preprocessing on strings
> before they actually enter the sorting algorithm.
Argh, is this supposed to be in a "character set encoding"?
> Simple example, in a list of names, the most significant letter is the
> first letter of the surname and not always the first letter. It is in
> many libraries common practice for the librarian who processes a newly
> purchased book to mark with a pencil the first letter of the surname
> such that this book is later always sorted in the same way.
>
> Let (1) be this marker, then we could store a list of names in a database
> like
>
> John (1)Smith
> Mr. Joseph E. (1)Miller-Rubin, M.D.
> etc.
As Alain said once, different parts of names should be stored in different
_fields_, instead of putting these burdens onto the character set
encoding. They should be stored as something like
{
surname=Smith
givenname=John
}
{
surname=Miller-Rubin
middleinitial=E.
givenname=Joseph
academicdegree=M.D.
}
and the database program can then sort, display and manipulate the data
appropriately as the user requests.
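
To make the point concrete, here is a minimal Python sketch of sorting such records by surname. The field names follow the examples above; the `records` list and the `sort_key` helper are hypothetical, and the comparison is a plain case-folded code-point comparison, not a proper locale-aware collation.

```python
# Hypothetical records; field names follow the examples in this message.
records = [
    {"givenname": "John", "surname": "Smith"},
    {
        "givenname": "Joseph",
        "middleinitial": "E.",
        "surname": "Miller-Rubin",
        "academicdegree": "M.D.",
    },
]


def sort_key(record):
    # Surname is the most significant field, given name breaks ties.
    # casefold() only gives case-insensitive code-point order; a real
    # system would plug in a language-specific collation here.
    return (
        record["surname"].casefold(),
        record.get("givenname", "").casefold(),
    )


sorted_records = sorted(records, key=sort_key)
surnames = [r["surname"] for r in sorted_records]
```

No marker character is needed in the stored text: the sort key is chosen by the program, and the titles and degrees never get in the way.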
> - suppression of substrings for sorting
> - replacing substrings for sorting (like in "Markus <G.|Guenther> Kuhn"
This should be done by the database. An entry like
{
givenname=Markus
middlename=Guenther
surname=Kuhn
}
will then be displayed on request as "Markus G Kuhn" (en-GB),
"Markus G. Kuhn" (en-US), "MGK", "Kuhn, M.G.", "Kuhn, Markus Guenther",
"Markus Guenther Kuhn", "Markus Kuhn", "Kuhn, M.", etc.
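
A rough Python sketch of how a program might derive those display forms from one stored entry follows. The style names ("initials", "surname-full", and so on) and the `format_name` function are made up for illustration, and the sketch assumes all three fields are present and non-empty.

```python
# Hypothetical display styles; each one builds a string from the
# given name (g), middle name (m) and surname (s).
FORMATS = {
    "en-GB": lambda g, m, s: f"{g} {m[0]} {s}",
    "en-US": lambda g, m, s: f"{g} {m[0]}. {s}",
    "initials": lambda g, m, s: f"{g[0]}{m[0]}{s[0]}",
    "surname-initials": lambda g, m, s: f"{s}, {g[0]}.{m[0]}.",
    "surname-full": lambda g, m, s: f"{s}, {g} {m}",
    "full": lambda g, m, s: f"{g} {m} {s}",
    "short": lambda g, m, s: f"{g} {s}",
    "surname-initial": lambda g, m, s: f"{s}, {g[0]}.",
}


def format_name(record, style):
    # Assumes givenname, middlename and surname are all present.
    return FORMATS[style](
        record["givenname"], record["middlename"], record["surname"]
    )


entry = {"givenname": "Markus", "middlename": "Guenther", "surname": "Kuhn"}
rendered = {style: format_name(entry, style) for style in FORMATS}
```

The stored data stays the same in every case; only the formatting rule picked at display time changes.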
-- Khaisu Te (Kaihsu Tai) http://www.ugcs.caltech.edu/~kaihsu/