Re: Cost of transition to UTF-8 for central census authorities

From: Stephane Bortzmeyer (bortzmeyer@nic.fr)
Date: Mon Jan 12 2009 - 03:13:46 CST

Next message: Asmus Freytag: "Re: Emoji: emoticons vs. literacy"

Previous message: Michael Everson: "Re: Emoji: emoticons vs. literacy"
In reply to: Doug Ewell: "Re: Cost of transition to UTF-8 for central census authorities"
Next in thread: Tim Greenwood: "Re: Cost of transition to UTF-8 for central census authorities"

On Sun, Jan 11, 2009 at 09:37:10AM -0700,
Doug Ewell <doug@ewellic.org> wrote
a message of 27 lines which said:

>> 1. The field length in the database will be longer than the display
>> field. So, given a surname "Årø", we will have a display length of 3
>> (letters), as compared to the database length of 5 bytes.
...
> I would definitely want someone to explain to me why (1) matters.

Maybe because there is a rendering program somewhere which computes
the width of the name from the size of the field in the database? Hard
to believe, because for rendering, variable-width fonts are much more
complicated than UTF-8's variable-size issues, but you never know.

For my part, I believe the original report: so many programmers still
think that one character == one byte that this assumption has shaped a
lot of code in the application. Even if they do not have to *rewrite*
"millions of lines of code", they will certainly have to *review* these
millions of lines.
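
To make the mismatch concrete, here is a minimal sketch in Python. It is
not taken from the census application; the helper name `lengths` is
hypothetical, and ISO 8859-1 is only assumed here as an example of a
single-byte legacy encoding for comparison.

```python
import unicodedata


def lengths(name):
    """Return (code point count, UTF-8 byte count) for a name.

    The name is normalized to NFC first, so that "Å" written as
    "A" + combining ring counts as one code point, like the
    precomposed form.
    """
    composed = unicodedata.normalize("NFC", name)
    return len(composed), len(composed.encode("utf-8"))


chars, utf8_bytes = lengths("Årø")
print(chars, utf8_bytes)  # 3 5

# Assumption: a single-byte legacy encoding such as ISO 8859-1,
# where the byte count happens to equal the character count.
print(len("Årø".encode("iso-8859-1")))  # 3
```

Any code that sizes a buffer, truncates a string, or pads a column using
the first number where it needs the second (or the reverse) is the kind
of code that would have to be found in such a review.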

Of course, that is no reason to give up. But you cannot pretend to
the Norwegian authorities that the switch to Unicode will be free
and painless. You can only say that it is worth it.


This archive was generated by hypermail 2.1.5 : Mon Jan 12 2009 - 03:16:22 CST