Re: Cost of transition to UTF-8 for central census authorities

From: Doug Ewell
Date: Sun Jan 11 2009 - 10:37:10 CST

    Trond Trosterud <trond dot trosterud at hum dot uit dot no> wrote:

    > Governmental experts see 6 drawbacks with UTF-8:
    > 1. The field length in the database will be longer than the display
    > field. So, given a surname "Årø", we will have a display length of 3
    > (letters), as compared to the database length of 5 bytes.
    > 2. There will have to be a new sorting routine, and a new search
    > routine
    > 3. Programs may no longer search for characters as single bytes, but
    > must in some cases allow searching for sequences of bytes.
    > 4. Many common programs only support 8-bit character sets
    > 5. Data must be removed from registries, converted and replaced
    > 6. Millions of lines of code must be changed and tested

    Switching from an SBCS architecture to UTF-8 does involve some work, but
    these statements contain so much FUD that it's difficult to know where
    to begin.
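
    As a rough illustration of points (2) and (3), here is a minimal Python
    sketch. It assumes the existing routines already work on raw bytes; the
    sample surnames and the helper name find_bytes are made up for the
    example. Because UTF-8 lead bytes and continuation bytes never overlap,
    an ordinary byte-string search can only match on a character boundary,
    and plain byte-order sorting yields code point order, which is the order
    a binary sort of Latin-1 data already produces.

```python
# Hypothetical sample surnames, written with escapes so the code points are
# unambiguous: U+00C5 (A with ring), U+00F8 and U+00D8 (o/O with stroke).
names = ["\u00C5r\u00F8", "Berg", "\u00D8stby", "Aas"]


def find_bytes(haystack: str, needle: str) -> int:
    """Return the byte offset of needle in the UTF-8 form of haystack, or -1.

    This is an ordinary byte-string search; nothing in it is UTF-8 aware.
    """
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


# Searching for a non-ASCII letter means searching for a two-byte sequence,
# but the search routine itself does not change.
offset_of_o_stroke = find_bytes("\u00C5r\u00F8", "\u00F8")

# Binary (byte-order) sorting of the UTF-8 form and of the Latin-1 form.
utf8_sorted = sorted(names, key=lambda s: s.encode("utf-8"))
latin1_sorted = sorted(names, key=lambda s: s.encode("latin-1"))
same_order = utf8_sorted == latin1_sorted
```

    This says nothing about language-aware collation (Norwegian alphabetical
    order, for instance); if the current system does that, it needs tailored
    rules in either encoding.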

    I would definitely want someone to explain to me why (1) matters.
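
    For reference, the arithmetic behind (1) can be checked with a short
    Python sketch. The helper max_utf8_bytes and the 30-character field
    width are assumptions made for the example, not anything taken from the
    census systems. Every Latin-1 character occupies at most two bytes in
    UTF-8, so doubling the byte width of a field is a safe upper bound for
    data that previously fit in Latin-1.

```python
# The surname from point (1), written with escapes: U+00C5, "r", U+00F8.
surname = "\u00C5r\u00F8"

char_count = len(surname)                      # letters shown on screen
utf8_bytes = len(surname.encode("utf-8"))      # storage needed as UTF-8
latin1_bytes = len(surname.encode("latin-1"))  # storage needed as Latin-1


def max_utf8_bytes(max_chars: int, bytes_per_char: int = 2) -> int:
    """Worst-case UTF-8 size of a field limited to max_chars characters.

    bytes_per_char=2 covers U+0000..U+07FF, which includes all of Latin-1.
    A field that may hold arbitrary Unicode would need 4 here instead.
    """
    return max_chars * bytes_per_char


# Hypothetical 30-character surname field.
field_bytes = max_utf8_bytes(30)
```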

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14