Re: Sorting tags

From: Kaihsu Tai (kaihsu@ugcs.caltech.edu)
Date: Fri Jun 20 1997 - 18:11:08 EDT


> One thing that I think would be a nice addition to the work on sorting
> Unicode is to specify somewhere in the Unicode code space a number of
> sorting control characters, that direct some preprocessing on strings
> before they actually enter the sorting algorithm.

Argh, is this supposed to be in a "character set encoding"?

> Simple example, in a list of names, the most significant letter is the
> first letter of the surname and not always the first letter. It is in
> many libraries common practice for the librarian who processes a newly
> purchased book to mark with a pencil the first letter of the surname
> such that this book is later always sorted in the same way.
>
> Let (1) be this marker, then we could store a list of names in a database
> like
>
> John (1)Smith
> Mr. Joseph E. (1)Miller-Rubin, M.D.
> etc.

As Alain said once, different parts of names should be stored in different
_fields_, instead of putting these burdens onto the character set
encoding. They should be stored as something like

{
surname=Smith
givenname=John
}

{
surname=Miller-Rubin
middleinitial=E.
givenname=Joseph
academicdegree=M.D.
}

and the database program can then sort, display and manipulate the data
appropriately as the user requests.
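
As a rough sketch of what such a database program might do, here is a
small Python example. The field names follow the records above; the
function names sort_key and display are made up for illustration, and
the case-insensitive comparison is only an assumption about what a user
might want, not a full collation scheme.

```python
# Hypothetical records, using the field names from the examples above.
records = [
    {"surname": "Smith", "givenname": "John"},
    {"surname": "Miller-Rubin", "middleinitial": "E.",
     "givenname": "Joseph", "academicdegree": "M.D."},
]


def sort_key(record):
    """Order by surname first, then by given name (case-insensitive)."""
    return (record["surname"].casefold(),
            record.get("givenname", "").casefold())


def display(record):
    """One possible display form: 'Given [Initial] Surname[, Degree]'."""
    parts = [record.get("givenname"), record.get("middleinitial"),
             record["surname"]]
    name = " ".join(p for p in parts if p)
    degree = record.get("academicdegree")
    return f"{name}, {degree}" if degree else name


# Sorting uses the surname field directly; no marker inside the string
# is needed to find the significant letter.
sorted_names = [display(r) for r in sorted(records, key=sort_key)]
```

The stored text never carries any sorting hints; the sort order comes
entirely from which field the program chooses as the key.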

> - suppression of substrings for sorting
> - replacing substrings for sorting (like in "Markus <G.|Guenther> Kuhn")

This should be done by the database. An entry like

{
givenname=Markus
middlename=Guenther
surname=Kuhn
}

can then be displayed on request as "Markus G Kuhn" (en-GB),
"Markus G. Kuhn" (en-US), "MGK", "Kuhn, M.G.", "Kuhn, Markus Guenther",
"Markus Guenther Kuhn", "Markus Kuhn", "Kuhn, M.", etc.

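To make the display forms above concrete, here is a minimal Python
sketch that derives each of them from the same stored fields. The
function name render and the style labels other than en-GB and en-US
are hypothetical, chosen only for this illustration.

```python
# The entry from above, stored once as separate fields.
entry = {"givenname": "Markus", "middlename": "Guenther", "surname": "Kuhn"}


def render(e, style):
    """Build one display form of a name from its fields.

    The style labels are illustrative; a real program would take them
    from the user's request or locale settings.
    """
    g, m, s = e["givenname"], e["middlename"], e["surname"]
    styles = {
        "en-GB": f"{g} {m[0]} {s}",               # Markus G Kuhn
        "en-US": f"{g} {m[0]}. {s}",              # Markus G. Kuhn
        "initials": f"{g[0]}{m[0]}{s[0]}",        # MGK
        "index-initials": f"{s}, {g[0]}.{m[0]}.", # Kuhn, M.G.
        "index-full": f"{s}, {g} {m}",            # Kuhn, Markus Guenther
        "full": f"{g} {m} {s}",                   # Markus Guenther Kuhn
        "no-middle": f"{g} {s}",                  # Markus Kuhn
        "index-short": f"{s}, {g[0]}.",           # Kuhn, M.
    }
    return styles[style]
```

Every variant is computed at display time, so the stored data needs no
"suppress" or "replace" markers embedded in the character stream.
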
-- 
Khaisu Te (Kaihsu Tai)
http://www.ugcs.caltech.edu/~kaihsu/


