Re: Cost of transition to UTF-8 for central census authorities

From: philip chastney (philip_chastney@yahoo.com)
Date: Wed Jan 14 2009 - 14:51:03 CST

Maybe in reply to: Trond Trosterud: "Cost of transition to UTF-8 for central census authorities"

--- On Sun, 11/1/09, Trond Trosterud <trond.trosterud@hum.uit.no> wrote:
From: Trond Trosterud <trond.trosterud@hum.uit.no>
Subject: Cost of transition to UTF-8 for central census authorities
To: "Unicode List" <unicode@unicode.org>
Date: Sunday, 11 January, 2009, 3:02 PM

I have the following question for the list:

In Norway, we have large census databases (https://infobank.edb.com), which contain the
names, social security numbers, addresses, cars, companies, boats, etc., of all
Norwegian citizens. Today, they are most likely encoded in ISO 8859-1 (some old
registries may be EBCDIC, but with the same character repertoire or a subset).

Now, Norway wants to be able to use Sámi in that register, i.e., 6x2 letters
(six upper/lowercase pairs) from the Latin Extended-A block in Unicode. ISO/IEC
8859-4 and -10 are possible, but a natural solution is UTF-8.

icebergs spring to mind here

Sámi may be the trigger, but it is part of a bigger issue

how are names of East European immigrants handled, for instance?
surely they are not all unregistered?

and what happens to East Europeans who want to adopt Norwegian citizenship?
are they required to renounce their diacritical markings?

issues like these are going to have to be faced -- wouldn't it be more cost-effective to adopt a solution to the stated problem (Sámi) which also solves the problem with other languages?

... and other scripts (let us not forget that Norway shares a border with Russia)
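
To make the point concrete, here is a small sketch that checks which legacy 8-bit charsets can hold the Sámi letters and a few other names. It assumes the six Sámi case pairs meant are the North Sámi letters Č, Đ, Ŋ, Š, Ŧ, Ž, and the sample values are made up for illustration. ISO 8859-4 or -10 would cover Sámi, but neither covers the Polish Ł or Cyrillic, whereas UTF-8 covers all of them.

```python
# Assumption: the "6x2 letters" are the North Sámi letters below, all in
# Unicode's Latin Extended-A block. (Á/á, also used in Sámi, is already in Latin-1.)
SAMI_EXTRA = "ČčĐđŊŋŠšŦŧŽž"

# Hypothetical sample values, for illustration only.
SAMPLES = {
    "Sámi": SAMI_EXTRA,
    "Norwegian": "Åse Bjørnstad",
    "Polish": "Łukasz Wąs",
    "Russian": "Иванов",
}

CHARSETS = ["iso-8859-1", "iso-8859-4", "iso-8859-10", "utf-8"]


def encodable(text: str, charset: str) -> bool:
    """Return True if every character of `text` can be encoded in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Print one row per sample: which charsets can represent it.
for label, text in SAMPLES.items():
    cells = ", ".join(
        f"{cs}: {'yes' if encodable(text, cs) else 'no'}" for cs in CHARSETS
    )
    print(f"{label:10} {cells}")
```

Only the utf-8 column comes out "yes" for every row.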

to put it another way: a move to Unicode will have to be made sometime, and the longer it takes to commit to that move, the more it will cost to convert
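
On the conversion itself: re-encoding data that really is ISO 8859-1 into UTF-8 is mechanical, because every Latin-1 byte maps to the code point of the same value (U+0000 to U+00FF). The sketch below assumes the source bytes are genuine ISO 8859-1 (EBCDIC registries would first need decoding with their own code page), and the helper name and sample value are hypothetical. Characters such as Å and ø take two bytes in UTF-8, so fields sized in bytes rather than characters may need to grow.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8 (hypothetical helper).

    Each byte 0x00-0xFF decodes to the code point of the same value
    (U+0000-U+00FF), so decoding never fails and nothing is lost.
    """
    return raw.decode("latin-1").encode("utf-8")


# Hypothetical field value as stored today, in ISO 8859-1.
legacy = "Åse Bjørnstad".encode("latin-1")
converted = latin1_to_utf8(legacy)
print(len(legacy), len(converted))  # 13 15: Å and ø become two bytes each

# Lossless over the whole 8-bit range: convert all 256 byte values and back.
all_bytes = bytes(range(256))
assert latin1_to_utf8(all_bytes).decode("utf-8").encode("latin-1") == all_bytes
```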

once that point is generally accepted as policy, where and when these Unicode strings are stored/transmitted/processed as UTF-8 or UTF-32 is a completely separate technical issue
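
As an illustration of why the encoding form is a separate question: the same string has identical code points whether it is serialised as UTF-8 or UTF-32; only the byte layout and size differ. The name below is a hypothetical example, and UTF-32 is shown big-endian without a byte-order mark.

```python
name = "Ánde Čuđe"  # hypothetical name, for illustration only

utf8 = name.encode("utf-8")
utf32 = name.encode("utf-32-be")  # big-endian, no byte-order mark

# Same sequence of code points, two different byte-level representations.
assert utf8.decode("utf-8") == utf32.decode("utf-32-be") == name
print(len(name), len(utf8), len(utf32))  # 9 12 36
print([f"U+{ord(c):04X}" for c in name])
```

UTF-8 is more compact for mostly-Latin text, while UTF-32 gives a fixed four bytes per code point; either can be chosen per storage or processing layer without changing the data.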

it is not common for IT projects to under-run their estimated costs, but if the costs and time-scales for conversion seem inflated, the estimators may be ill-informed, cautious, or hoping for the contract ... or just plain wrong

/phil
