Within the next month, I am going to write a memo to the Norwegian Department of
Justice commenting on the planned revision of the Norwegian laws on
personal names. The goal of the revision is to allow naming practices other
than the Norwegian one, since the population has become culturally more heterogeneous.
My input will deal with the encoding of the names.
Today, the official Norwegian population registry is encoded in ASCII,
enriched with the Norwegian letters ÆØÅæøå on the ASCII positions [\]{|}
(I guess the same solution is in use in Denmark, Sweden and Finland as
well, but with ÄÖ for ÆØ in the latter two).
My suggestion will be that they abandon their 7-bit systems and move to...
and here I need your advice.
In Norway, Sámi citizens use Sámi names, but the diacritics (ACUTE ACCENT,
CARON, HOOK, STROKE) are simply stripped off in the registry. We have large
numbers of Finns and Swedes, whose ä and ö are replaced with æ and ø. Immigrants from
other countries bring their letters (and alphabets) with them. The natural
answer to this is, of course: use the UCS. But the databases are huge: every
single citizen is included.
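The stripping of Sámi diacritics mentioned above can be sketched like this, assuming it is done by canonical decomposition (NFD) followed by dropping combining marks. We do not know which procedure the registry actually uses, and `strip_diacritics` is a hypothetical helper built on Python's standard `unicodedata` module. The sketch also shows that only some of the marks come off this way.

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# ACUTE ACCENT and CARON decompose into base letter + combining mark, so they vanish:
print(strip_diacritics("Máret"))  # Maret
print(strip_diacritics("čšž"))    # csz
# đ, ŧ (STROKE) and ŋ (ENG) have no canonical decomposition, so they are left as-is:
print(strip_diacritics("đŧŋ"))    # đŧŋ
```

Letters of the last kind would need a hand-made replacement table before they fit into 7 bits, and either way the original spelling is lost.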
Does anyone on this list have experience with similar cases? What is being
done around the world? Do other countries use 7-bit solutions as well? Are
there plans to migrate to 8 bits? 16 bits?
Since we need both the Sámi names and the names of new immigrants, 8 bits
are really not enough. If we then use some UCS format, which one should we
use (16-bit, UTF-8, ...) in order to save space and have databases with
fast retrieval?
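To get a rough feel for the space question, here is a sketch comparing how many bytes a few sample names take in UTF-8, UTF-16 and UTF-32 (the "-le" variants, without byte-order mark). The sample names are illustrative, not registry data, and `stored_sizes` is a hypothetical helper. Names are normalized to NFC first, on the assumption that the registry wants a precomposed á and an a followed by a combining acute accent to be stored, and found, as the same string.

```python
import unicodedata

# Encodings compared; the "-le" variants omit the byte-order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def stored_sizes(name: str) -> dict[str, int]:
    """Bytes one name would occupy in each encoding, after NFC normalization."""
    # NFC turns "a" + COMBINING ACUTE ACCENT into the single code point "á",
    # so both spellings of a name end up byte-identical.
    canonical = unicodedata.normalize("NFC", name)
    return {enc: len(canonical.encode(enc)) for enc in ENCODINGS}


# Illustrative samples only; precomposed and decomposed Máret give the same result.
for sample in ("Tromsø", "M\u00e1ret", "Ma\u0301ret", "王小明"):
    print(sample, stored_sizes(sample))
```

Here Tromsø takes 7 bytes in UTF-8 against 12 in UTF-16, while 王小明 takes 9 against 6. For Latin-script names UTF-8 is never larger than UTF-16, so the real trade-off depends on the actual mix of names. UTF-8 also leaves plain ASCII names byte-for-byte unchanged, and sorting UTF-8 bytes gives the same order as sorting code points.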
Greetings,
-------------------------------------------------------------------
Trond Trosterud t +47 7764 4763
Lingvistisk institutt, Det humanistiske fakultet h +47 7767 3639
N-9037 Universitetet i Tromsø, Noreg f +47 7764 4239
[email protected] http://www2.isl.uit.no/trond/index.html
-------------------------------------------------------------------
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT