Re: Primary Key validation error on Unicode strings

From: Jon Hanna (jon@hackcraft.net)
Date: Thu Nov 12 2009 - 04:13:32 CST

  • Next message: karl williamson: "Seeking advice on casing greek final sigma"

    Wunna Ko wrote:
    > While I am working on Unicode strings with MS Database 2005, I got
    > primary key validation error when I input Burmese characters ကု "Ku"
    > while ကို "Ko" is in the database.
    >
    > ကု "Ku" is U1000 + U102F and ကို "Ko" is U1000 + U102D + U102F
    >
    > It is really a separate string but MS Database treat as a same string. :(

    It's more accurate to say that the database is treating it as an
    equivalent string. Which are considered equivalent depends on the
    collation. The code:

    SELECT
    CASE WHEN
    NCHAR(0x1000) + NCHAR(0x102F) COLLATE Latin1_General_BIN =
            NCHAR(0x1000) + NCHAR(0x102D) + NCHAR(0x102F)
    THEN 'Equivalent' ELSE 'Distinct'
    END,
    CASE WHEN
    NCHAR(0x1000) + NCHAR(0x102F) COLLATE Latin1_General_CI_AS =
    NCHAR(0x1000) + NCHAR(0x102D) + NCHAR(0x102F)
    THEN 'Equivalent' ELSE 'Distinct'
    END

    gives results of "Distinct, Equivalent" because the first collation
    considers these strings distinct while the second doesn't.

    Without a COLLATE clause the default collation for your database will be
    used.

    Similarly, if a column is a primary key, then the collation used for
    that column will affect which strings are considered equivalent for the
    purpose of that key.

    SELECT * FROM fn_helpcollations() will list collations available on your
    system.
    Those found with SELECT * FROM fn_helpcollations() WHERE description
    LIKE '%bin%' are most likely to find these two strings to be distinct,
    but may go too far in this regard (being sensitive to all concerns of
    case, accent, etc).



    This archive was generated by hypermail 2.1.5 : Thu Nov 12 2009 - 04:19:00 CST