Encoding of personal names in official databases

From: Trond Trosterud (Trond.Trosterud@hum.uit.no)
Date: Tue Mar 30 1999 - 06:17:10 EST

Within the next month, I am going to write a memo to the Norwegian
Department of Justice commenting on the planned revision of the Norwegian
laws on personal names. The goal of the revision is to allow naming
practices other than the Norwegian one, to reflect a culturally more
heterogeneous population.

My input will deal with the encoding of the names.

Today, the official Norwegian population registry is encoded in ASCII,
enriched with the Norwegian letters Æ Ø Å æ ø å on the ASCII positions
[ \ ] { | } (I guess the same solution is in use in Denmark, Sweden and
Finland as well, but with Ä Ö for Æ Ø).
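To make the 7-bit scheme concrete, here is a minimal sketch of reading such a record, assuming the registry follows the Norwegian national variant of ISO 646, where [ \ ] and { | } stand for Æ Ø Å and æ ø å. The function name `decode_iso646_no` is hypothetical and not part of any registry system.

```python
# Hypothetical sketch: the Norwegian 7-bit variant of ASCII (ISO 646-NO)
# reuses six ASCII punctuation positions for the Norwegian letters.
ISO646_NO = str.maketrans({
    "[": "Æ", "\\": "Ø", "]": "Å",
    "{": "æ", "|": "ø", "}": "å",
})


def decode_iso646_no(raw: bytes) -> str:
    """Interpret 7-bit registry bytes as ISO 646-NO and return Unicode text."""
    text = raw.decode("ascii")  # raises UnicodeDecodeError on any byte >= 0x80
    return text.translate(ISO646_NO)


print(decode_iso646_no(b"Troms|"))  # Tromsø
print(decode_iso646_no(b"Bj|rn"))   # Bjørn
```

The limitation is built into the table itself: once those six positions hold letters, the punctuation characters they replaced can no longer be stored, and there is no room left for any further letters.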

My suggestion will be that they abandon their 7-bit systems and move to...

and here I need your advice.

In Norway, Sámi citizens use Sámi names, but the diacritics (ACUTE
ACCENT, CARON, HOOK, STROKE) are simply stripped off in the registry. We
have large numbers of Finns and Swedes, whose Ä and Ö are replaced with
Æ and Ø. Immigrants from other countries bring their own letters (and
alphabets) with them. A natural answer to this is of course: use the UCS.
But the databases are huge: every single citizen is included.
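A rough sketch of why stripping is lossy. This is an assumption about how such stripping might be done, not the registry's actual procedure: Unicode canonical decomposition (NFD) splits letters such as á, č, š and ž into a base letter plus a combining mark that can be dropped, but the letters with STROKE (đ, ŧ) and the eng (ŋ) have no decomposition, so the sketch needs a hand-written fallback table (hypothetical, as is the function name `strip_to_ascii`).

```python
import unicodedata

# Hypothetical fallback table: these letters have no Unicode decomposition,
# so NFD alone would leave them untouched.
NO_DECOMPOSITION = str.maketrans({
    "đ": "d", "Đ": "D",
    "ŧ": "t", "Ŧ": "T",
    "ŋ": "n", "Ŋ": "N",
})


def strip_to_ascii(name: str) -> str:
    """Approximate a lossy registry: drop every diacritic from a name."""
    # Map the non-decomposable letters first, then split the rest into
    # base letter + combining mark and keep only the base letters.
    decomposed = unicodedata.normalize("NFD", name.translate(NO_DECOMPOSITION))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_to_ascii("áčđŋšŧž"))  # acdnstz
```

Once stripped, distinct spellings collapse: "Máret" and "Maret" are stored identically, and the original form cannot be recovered from the registry.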

Does anyone on this list have experience with similar cases? What is being
done around the world? Do other countries use 7-bit solutions as well? Are
there plans to migrate to 8 bits? 16 bits?

Since we need both the Sámi names and the names of new immigrants, 8 bits
really are not enough. If we then use some UCS format, which one should we
use (16-bit, UTF-8, ...) in order to save space and have databases with
fast retrieval?
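For the space question, a quick way to compare encodings is to measure sample names directly. The names below are illustrative only, not registry data, and the helper `encoded_sizes` is hypothetical. UTF-8 spends 1 byte on ASCII letters, 2 on Latin letters with diacritics and on Cyrillic, and 3 on most CJK characters, while UTF-16 spends 2 bytes on every character in the Basic Multilingual Plane.

```python
# Illustrative names only (not registry data), each from a different script.
NAMES = ["Trosterud", "Tromsø", "Áilu", "Иванов", "王芳"]


def encoded_sizes(name: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for one name; UTF-16 without BOM."""
    return len(name.encode("utf-8")), len(name.encode("utf-16-le"))


for name in NAMES:
    utf8, utf16 = encoded_sizes(name)
    print(f"{name:<10} UTF-8: {utf8:>2}  UTF-16: {utf16:>2}")
```

For these samples UTF-8 is smaller for the Latin-script names (9 versus 18 bytes for "Trosterud"), equal for the Cyrillic one, and larger for the Chinese one (6 versus 4 bytes). So, assuming most names in the registry are Latin-script, UTF-8 would likely take less space; retrieval speed depends on the database and is not measured here.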


Trond Trosterud t +47 7764 4763
Lingvistisk institutt, Det humanistiske fakultet h +47 7767 3639
N-9037 Universitetet i Tromsø, Noreg f +47 7764 4239
Trond.Trosterud@hum.uit.no http://www2.isl.uit.no/trond/index.html

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT