Re: Sorting tags

From: Markus G. Kuhn (kuhn@cs.purdue.edu)
Date: Fri Jun 20 1997 - 18:42:29 EDT


Kaihsu Tai wrote on 1997-06-20 22:11 UTC:
> As Alain said once, different parts of names should be stored in different
> _fields_, instead of putting these burdens onto the character set
> encoding. They should be stored as something like
>
> {
> surname=Smith
> givenname=John
> }
>
> {
> surname=Miller-Rubin
> middleinitial=E.
> givenname=Joseph
> academicdegree=M.D.
> }

I thought it was one of the big lessons learned from the X.400 O/R
names, that this does not scale well, as the concept of having
a surname, a middle initial, and a givenname is something that works only
in the U.S. and in around 10 other countries. Only a flat linear
sequence of characters (as it was introduced in the X.500 CN attribute)
is really capable of representing human names adequately for all
cultures. Realizing that, adding some sorting control characters
to this single common flat name string field suddenly makes a lot
of sense to me.
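
A minimal sketch of how such a sorting control character could work on
a flat name string. Everything here is an assumption for illustration:
the marker code point (U+E000 from the private-use area), the helper
names sort_key and display_form, and the rule "sort from the marker
onward, then the part before it". No standard defines any of this.

```python
# Hypothetical "sort starts here" control character (private-use code point).
SORT_START = "\uE000"

def display_form(name):
    """Return the name as stored, with the sorting marker made invisible."""
    return name.replace(SORT_START, "")

def sort_key(name):
    """Build a sort key from a flat name string.

    If the marker is present, sorting starts at the marked position and the
    text before the marker is appended afterwards. Without a marker, the
    string is sorted exactly as written.
    """
    before, marker, after = name.partition(SORT_START)
    if not marker:
        return name.casefold()
    return (after.strip() + ", " + before.strip()).casefold()

names = [
    "John \uE000Smith",
    "Joseph E. \uE000Miller-Rubin",
    "Karthic Nataraj Nadarajapillai Sivathanupillai",  # no marker: left alone
    "Sitting Bull",                                    # no marker: left alone
]

ordered = [display_form(n) for n in sorted(names, key=sort_key)]
```

The stored string is never reordered. Names without a marker sort and
display exactly as their owner wrote them, and only names whose owner
(or a cataloguer) placed a marker get a different sort position.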

I have yet to see a data structure like the one you suggest that really
covers *all* the various human naming schemes that are used on this
planet adequately. A structured person name will probably be something
like a five page long dense SGML DTD which no sane database designer
is going to implement in separate fields.

The Indian guy sitting right here next to me has the name

  Karthic Nataraj Nadarajapillai Sivathanupillai

(well, it is actually written in Tamil characters)

He does not even know which part of his name could be best described
as "surname" or "given name", and he might feel really pissed off if
some database software tries to reorder his name components in some
way, as this might then become his father's or brother's name. Now map
this onto your simple-minded structure above ... ;-) Greetings to
"Bull, S." (Sitting Bull).

Another good example of data so diverse that only an unstructured
single field (according to ISO 11180 preferably 30x6 chars large)
with added markup can handle it is the postal address, as used
all over the world.

> Argh, is this supposed to be in a "character set encoding"?

I am not suggesting that this be made part of the Unicode 3.0 standard itself,
but reserving a few code positions for this purpose and defining some
sorting control characters in another ISO standard won't hurt too
much, right? We could even assign suggested glyphs to these sorting
control characters that will only be displayed when you edit a
name but that are normally made invisible by software that just
displays sorted strings. Glyphs just like this small hook that librarians
make with a pencil in front of the first letter of the first author's
surname on the title page.
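
A rough sketch of the two views, under loud assumptions: the marker
code point U+E000 and the visible stand-in glyph U+2310 are arbitrary
picks for illustration, and render is a hypothetical helper, not part
of any existing software.

```python
SORT_START = "\uE000"   # hypothetical sorting control character
EDIT_GLYPH = "\u2310"   # arbitrary visible stand-in for the librarian's hook

def render(name, editing=False):
    """Show the marker as a glyph while editing, hide it otherwise."""
    return name.replace(SORT_START, EDIT_GLYPH if editing else "")

stored = "John \uE000Smith"
shown_in_list = render(stored)                  # marker invisible
shown_in_editor = render(stored, editing=True)  # marker visible as a glyph
```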

Markus

-- 
Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA -- email: kuhn@cs.purdue.edu


