Re: Cost of transition to UTF-8 for central census authorities

From: Stephane Bortzmeyer
Date: Mon Jan 12 2009 - 03:13:46 CST

    On Sun, Jan 11, 2009 at 09:37:10AM -0700,
     Doug Ewell wrote
     a message of 27 lines which said:

    >> 1. The field length in the database will be longer than the display
    >> field. So, given a surname "Årø", we will have a display length of 3
    >> (letters), as compared to the database length of 5 bytes.
    > I would definitely want someone to explain to me why (1) matters.

    Maybe because there is a rendering program somewhere that computes
    the width of the name from the size of the field in the database? Hard
    to believe, because for rendering, variable-width fonts are much more
    complicated than UTF-8's variable-length encoding, but you never know.

    Personally, I believe the original report: so many programmers still
    think that one character == one byte that this assumption has shaped a
    lot of code in the application. Even if they do not have to *rewrite*
    "millions of lines of code", they will certainly have to *review* those
    millions of lines.
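
    To make the kind of code at risk concrete, here is a minimal Python
    sketch (my own illustration, not code from the census application).
    It shows the 3 letters versus 5 bytes mismatch for "Årø", and what
    happens when a string is cut to fit a byte-sized field. The helper
    name truncate_utf8 and the 4-byte field width are assumptions chosen
    for the example.

```python
import unicodedata

# Normalize to NFC so that "Å" is one code point, not "A" + combining ring.
surname = unicodedata.normalize("NFC", "Årø")

char_count = len(surname)                  # number of code points
byte_count = len(surname.encode("utf-8"))  # number of bytes in UTF-8


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character.

    Hypothetical helper: slices the encoded bytes, then drops any
    incomplete trailing sequence when decoding.
    """
    raw = text.encode("utf-8")[:max_bytes]
    return raw.decode("utf-8", errors="ignore")


# A naive byte slice leaves half of "ø" at the end of the field.
naive = surname.encode("utf-8")[:4]
safe = truncate_utf8(surname, 4)

print(char_count, byte_count)  # 3 5
print(naive)                   # b'\xc3\x85r\xc3'
print(safe)                    # År
```

    Any place in the application that slices, pads or sizes a field by
    byte count is a candidate for this sort of review.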

    Of course, that is no reason to give in. But you cannot pretend to
    the Norwegian authorities that the switch to Unicode will be costless
    and painless. You can only say that it is worth it.
