RE: Encoding of personal names in official databases

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Tue Mar 30 1999 - 08:04:32 EST


Don't forget that the names must not only be writable, they must also be
readable by the officials and others who use this data, and printable on the
equipment they have. So I suggest you restrict yourself to the Latin script
as used in Europe.

Jony

> -----Original Message-----
> From: Trond Trosterud [mailto:Trond.Trosterud@hum.uit.no]
> Sent: Tuesday, March 30, 1999 1:16 PM
> To: Unicode List
> Subject: Encoding of personal names in official databases
>
>
> Within the next month, I am going to write a memo to the Norwegian Dept. of
> Justice commenting on the planned revision of the Norwegian laws on
> personal names. The goal of the revision is to allow naming practices other
> than the Norwegian one, because of a culturally more heterogeneous
> population.
>
> My input will deal with the encoding of the names.
>
> Today, the official Norwegian population registry is coded in ASCII,
> enriched with the Norwegian letters ÆØÅæøå on the ASCII positions [\]{|}
> (I guess the same solution is in use in Denmark, Sweden and Finland as
> well, but with äö for æø).
>
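
The 7-bit scheme described above amounts to a character translation on top of ASCII. The following is a minimal sketch, assuming the six letters sit on the bracket positions in the order listed; the helper names decode_registry_name and encode_registry_name are hypothetical and not taken from any actual registry system.

```python
# Assumed mapping, read off the description above: the national letters
# ÆØÅæøå occupy the ASCII positions of [ \ ] { | } in that order.
TO_UNICODE = str.maketrans("[\\]{|}", "ÆØÅæøå")
TO_7BIT = str.maketrans("ÆØÅæøå", "[\\]{|}")


def decode_registry_name(raw: bytes) -> str:
    """Turn a 7-bit registry field (hypothetical) into Unicode text."""
    return raw.decode("ascii").translate(TO_UNICODE)


def encode_registry_name(name: str) -> bytes:
    """Turn Unicode text back into the 7-bit form.

    Raises UnicodeEncodeError for any letter outside the 7-bit repertoire,
    for example the Sámi letters.
    """
    return name.translate(TO_7BIT).encode("ascii")
```

Note that the translation is ambiguous in one direction: a stored `[` can no longer be told apart from a stored `Æ`, so the brackets and braces themselves are lost to the repertoire.
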
> My suggestion will be that they abandon their 7-bit systems and move to...
>
> and here I need your advice.
>
> In Norway, Sámi citizens use Sámi names, but the diacritics (ACUTE ACCENT,
> CARON, HOOK, STROKE) are just stripped off in the registry. We have large
> numbers of Finns and Swedes, and their äö are replaced with æø. Immigrants
> from other countries bring their letters (and alphabets) with them. A
> natural answer to this is of course: use the UCS. But the databases are
> huge: every single citizen is included.
>
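
To illustrate what stripping diacritics costs, here is a small sketch. It is not the registry's actual procedure, which the message does not describe; it assumes stripping is done by decomposing each letter and dropping the combining marks, and the function name strip_combining_marks is made up for the example.

```python
import unicodedata


def strip_combining_marks(name: str) -> str:
    """Drop combining marks (acute, caron, ...) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", name)
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return unicodedata.normalize("NFC", "".join(kept))
```

Under this assumption, distinct names collapse into one spelling ("Sámi" and "Sami" become identical), while letters such as đ (d with stroke) and ŋ (eng) have no canonical decomposition and pass through unchanged, so a registry limited to 7 bits would need a separate substitution rule for them.
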
> Does anyone on this list have experience with similar cases? What is being
> done around the world? Do other countries use 7-bit solutions as well? Are
> there plans to migrate to 8 bits? 16 bits?
>
> Since we need both the Sámi names and the names of new immigrants, 8 bits
> really are not enough. If we then use some UCS format, which one shall we
> use (16-bit, UTF-8, ...) in order to save space and have databases with
> fast retrieval?
>
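
The space question can be checked directly by encoding sample names both ways. This is a rough sketch only: the function name encoded_sizes is hypothetical, the sample names are taken from this thread, and UTF-16 is measured without a byte order mark.

```python
def encoded_sizes(name: str) -> dict[str, int]:
    """Return the number of bytes the name occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(name.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" prepends.
        "utf-16": len(name.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for sample in ("Trosterud", "Tromsø", "Sámi"):
        print(sample, encoded_sizes(sample))
```

For names written mostly in the Latin script, UTF-8 takes one byte per unaccented letter and two per accented letter, against two bytes per letter in UTF-16. The balance shifts for characters from U+0800 upward, which take three bytes in UTF-8 and two in UTF-16.
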
> Greetings,
>
> -------------------------------------------------------------------
> Trond Trosterud t +47 7764 4763
> Lingvistisk institutt, Det humanistiske fakultet h +47 7767 3639
> N-9037 Universitetet i Tromsø, Noreg f +47 7764 4239
> Trond.Trosterud@hum.uit.no http://www2.isl.uit.no/trond/index.html
> Test string-please ignore:ᄘ¹š¼¿-Á‚‰¸Šº¾-â¡¥³†ˆ-™¢²…‡-æøåäö-ÆØÅÄÖ
> -------------------------------------------------------------------
>
>


