Unicode support in open source DBMS' (was Re: Unicode in Quark 6)

From: Jungshik Shin (jshin@mailaps.org)
Date: Sun Jun 22 2003 - 22:33:48 EDT


    On Sun, 22 Jun 2003, John H. Jenkins wrote:
    > On Saturday, June 21, 2003, at 10:06 PM, Jungshik Shin wrote:
    > > PostgreSQL seems to be available for Mac OS X. See
    > > http://www.postgresql.org/ and
    > > http://developer.apple.com/internet/macosx/postgres.html

    > MySQL is also available for Mac OS X
    > (<http://developer.apple.com/internet/macosx/osdb.html>). I'm not sure
    > of the status of Unicode support, but it seems to be fine if you're not

      I guessed it was, but didn't mention it because, the last time I checked,
    MySQL's multibyte 'charset' support (including UTF-8) was clearly inferior
    to that of PostgreSQL
    (http://www.postgresql.org/docs/view.php?version=7.3&file=multibyte.html),
    if not downright missing.

    > worrying about collating or similar services. It's what's used at the

      This was the area where MySQL (as it was the last time I checked)
    showed its lack of support for multibyte encodings. There are other
    problems for non-UTF-8 multibyte encodings, though. For EUC-xx (EUC-JP,
    EUC-KR, EUC-TW, and EUC-CN, the last commonly referred to as GB2312), it
    worked more or less, although it had no notion of multibyte characters,
    provided that sorting by code point and having to use multiple '.'s to
    match a single multibyte character are acceptable. As is well known,
    however, this doesn't work well even for a simple search with
    metacharacters (say '%<mbc>%'), because the last (usually the 2nd) byte
    of one character and the first byte of the following character can
    together be mistaken for a multibyte character. It broke down in a more
    conspicuous way with non-ISO-2022-compliant encodings like Shift_JIS,
    Big5*, and UHC (Windows-949).
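
    To make the false-match problem concrete, here is a rough sketch in
    Python 3 using its bundled codecs. The helper names are made up for the
    illustration, and the byte-wise search only stands in for an engine with
    no notion of multibyte characters; it is not how any particular DBMS is
    implemented.

```python
# Illustration only: a byte-oriented matcher with no notion of multibyte
# characters, compared against a character-aware match.
# bytewise_contains / charwise_contains are hypothetical helper names.

def bytewise_contains(haystack: str, needle: str, encoding: str) -> bool:
    """Search the way an encoding-unaware engine would: on raw bytes."""
    return needle.encode(encoding) in haystack.encode(encoding)


def charwise_contains(haystack: str, needle: str) -> bool:
    """Search on decoded characters, so character boundaries are respected."""
    return needle in haystack


# EUC-JP: both bytes of a two-byte character lie in 0xA1-0xFE, so the trail
# byte of one character plus the lead byte of the next can look like a
# third, unrelated character.
text = "あい"                              # bytes A4 A2 A4 A4 in EUC-JP
straddle = text.encode("euc_jp")[1:3]      # A2 A4, crossing the boundary
phantom = straddle.decode("euc_jp")        # a valid character that is not in text

euc_false_match = (
    bytewise_contains(text, phantom, "euc_jp")
    and not charwise_contains(text, phantom)
)

# Shift_JIS: trail bytes may fall in the ASCII range. The katakana "ソ" is
# encoded as 83 5C, and 5C is also the backslash.
sjis_false_match = (
    bytewise_contains("ソ", "\\", "shift_jis")
    and not charwise_contains("ソ", "\\")
)

print(euc_false_match, sjis_false_match)
```

    The Shift_JIS case is the more conspicuous one because the phantom match
    is an ASCII character, so even a plain ASCII search pattern can hit the
    middle of a two-byte character.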

      UTF-8 is an interesting case in that you wouldn't get the
    false matches described above.
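
    The reason is that UTF-8 lead bytes and continuation bytes occupy
    disjoint ranges, so an encoded character can only match at a character
    boundary. A small Python 3 sketch of that property follows; the helper
    names and the sample string are chosen just for this example.

```python
# Illustration only: in UTF-8, a byte-level match of a whole encoded
# character always starts on a character boundary.
# is_continuation / match_offsets are hypothetical helper names.

def is_continuation(b: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return 0x80 <= b <= 0xBF


def match_offsets(haystack: bytes, needle: bytes) -> list:
    """Return every byte offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


text = "あい△ソ表あ"
data = text.encode("utf-8")

# Offsets where a character starts: every byte that is not a continuation byte.
boundaries = {i for i, b in enumerate(data) if not is_continuation(b)}

# Search for each distinct character's encoding at the byte level and check
# that every hit lands on a boundary and that the hit count equals the
# number of real occurrences.
all_aligned = all(
    off in boundaries
    for ch in set(text)
    for off in match_offsets(data, ch.encode("utf-8"))
)
counts_agree = all(
    len(match_offsets(data, ch.encode("utf-8"))) == text.count(ch)
    for ch in set(text)
)

print(all_aligned, counts_agree)
```

    Sorting and single-character wildcards still need character awareness;
    only the straddling false match goes away.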

    Anyway, I've just checked the MySQL web site and found that it now supports
    Unicode. However, its support is restricted to the BMP (UCS-2, and UTF-8
    with up to 3 bytes only). See <http://www.mysql.com/doc/en/Charset-Unicode.html>.
    They clearly need to update their knowledge about Unicode...
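
    What the BMP restriction excludes can be sketched in Python 3 as
    follows. The function names are hypothetical, and fits_bmp_only is just
    a stand-in check for "storable under a 3-byte UTF-8 / UCS-2 limit", not
    anything MySQL itself provides.

```python
# Illustration only: characters beyond the BMP need 4 bytes in UTF-8 and a
# surrogate pair (two 16-bit code units) in UTF-16, so they cannot be
# represented under a "UCS-2, or UTF-8 up to 3 bytes" restriction.
# All function names here are hypothetical.

def utf8_len(ch: str) -> int:
    """Number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding of ch."""
    return len(ch.encode("utf-16-be")) // 2


def fits_bmp_only(s: str) -> bool:
    """True if every character of s lies in the BMP (U+0000..U+FFFF)."""
    return all(ord(c) <= 0xFFFF for c in s)


bmp_char = "\uD55C"        # U+D55C, a Hangul syllable inside the BMP
supp_char = "\U0001D11E"   # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

print(utf8_len(bmp_char), utf16_units(bmp_char), fits_bmp_only(bmp_char))
print(utf8_len(supp_char), utf16_units(supp_char), fits_bmp_only(supp_char))
```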

