personal names

From: Trond Trosterud
Date: Wed Mar 31 1999 - 03:01:07 EST

Thank you to everyone who has commented on my question about
personal names in databases.

The consensus I see is: UCS encoded as UTF-8, a 7-bit fallback with double
representation of the names (space not being a problem), possibly an 8-bit
fallback as well, and the possibility of double forms for other alphabets
and writing systems.
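
To make the "double representation" idea concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the class
PersonalName, the functions ascii_fallback and make_name, the sample name,
and the transliteration table are all made up for this example and are not
taken from any national registry or official standard. Real transliteration
rules would have to come from the authorities concerned.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical transliteration table for letters that have no canonical
# decomposition. The mappings are illustrative, not an official standard.
FALLBACK_TABLE = {
    "ø": "o", "Ø": "O",
    "æ": "ae", "Æ": "AE",
    "đ": "d", "Đ": "D",
    "ŧ": "t", "Ŧ": "T",
    "ŋ": "n", "Ŋ": "N",
}


def ascii_fallback(name: str) -> str:
    """Derive a 7-bit form: map special letters, then drop combining marks."""
    mapped = "".join(FALLBACK_TABLE.get(ch, ch) for ch in name)
    decomposed = unicodedata.normalize("NFD", mapped)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes '?' instead of vanishing silently.
    return stripped.encode("ascii", errors="replace").decode("ascii")


@dataclass(frozen=True)
class PersonalName:
    full: str      # authoritative form, to be stored as UTF-8
    fallback: str  # derived 7-bit form for legacy systems


def make_name(full: str) -> PersonalName:
    """Build a record that carries both representations of the same name."""
    return PersonalName(full=full, fallback=ascii_fallback(full))


record = make_name("Áilu Gaup Sørensen")   # made-up sample name
utf8_bytes = record.full.encode("utf-8")   # what a UTF-8 database would store
print(record.fallback)
```

The point of keeping both fields is that the 7-bit form is derived from the
full form and never the other way round, so legacy systems can keep working
while the correct spelling is preserved.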

This is all very good, and please go on if there is more to be said.

I must still confess that I am not at all optimistic about whether I can
convince the authorities that anything concerning our 7-bit encoding should
be changed at all (the "never change a winning team" philosophy). The
"ignore the small additional marks" thinking is deeply rooted; I also have
trouble getting them to demand the Norwegian letters in passports (the
current OCR format allows ASCII only, and OCR-B does not get the support it
deserves).

Thus, any report on how other countries do this would be valuable, as would
any report showing UTF-8 (or even 8-bit) encodings in use as working systems
for census information.

Is #any# country coding these databases with more than 128 characters
(national ASCII dialects), say with 8-bit encodings like Latin 1, or Latin

Is #any# country coding these databases with more than 256 characters, that
is, with some procedure like shifting between multiple 8859-x sets in the
right half of the code table (did we call it ISO 2022?)?

Is #any# country using UTF-8 or other UCS technologies?

On this list we can consider UTF-8 and the like, but when I turn to the
department they will be #very# reluctant to adopt a system (UCS encoding,
UTF-8, etc.) that they have never heard of.

Trond Trosterud t +47 7764 4763
Lingvistisk institutt, Det humanistiske fakultet h +47 7767 3639
N-9037 Universitetet i Tromsø, Noreg f +47 7764 4239
