Re: Unicode Normalization on MS-Windows

From: David Starner (dvdeug@ispwest.com)
Date: Mon Apr 28 2003 - 12:20:35 EDT


    On Mon, Apr 28, 2003 at 08:27:36AM -0700, Jane Liu wrote:
    > neither Microsoft Windows nor those popular UNIX
    > systems (AIX, Solaris, HP-UX) currently supply the explicit support
    > of Unicode normalization at the encoding/conversion level
    [...]
    > If that's true, can we conclude that in order to maintain the
    > transparency and round-trip safety between application and OS, the
    > application should not use normalization?

    On POSIX systems, it's wrong even to treat filenames as Unicode strings;
    according to the standard, they are null-terminated byte strings that
    can't include 0x2F (the '/' separator). From that perspective, filenames
    are a unique type that mustn't be mangled by any Unicode operation,
    including conversion between UTF-8 and UTF-16. Of course, users expect
    to see them as strings; one solution that also works for the
    normalization case is to keep a table mapping each filename to a
    Unicode version thereof, which may be normalized.
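
    A minimal sketch of such a table, in Python. It assumes the bytes are
    probably UTF-8 and that NFC is the wanted form; neither is given by the
    system, and the names display_name and build_table are hypothetical.
    The raw bytes are never altered, and only they should be passed back to
    the OS.

```python
import unicodedata


def display_name(raw: bytes) -> str:
    """Return a normalized string for showing a raw filename to the user."""
    # Assumption: the bytes are UTF-8. Undecodable bytes become U+FFFD,
    # so the result is for display only and cannot be converted back.
    text = raw.decode("utf-8", errors="replace")
    return unicodedata.normalize("NFC", text)


def build_table(raw_names):
    """Map each display string to the raw byte filenames it stands for."""
    table = {}
    for raw in raw_names:
        # Several distinct byte strings can share one normalized form,
        # so each entry holds a list rather than a single filename.
        table.setdefault(display_name(raw), []).append(raw)
    return table


# Two spellings of the same visible name (decomposed and precomposed),
# plus one filename that is not valid UTF-8 at all.
names = [b"cafe\xcc\x81.txt", b"caf\xc3\xa9.txt", b"bad\xff.txt"]
table = build_table(names)
```

    Both spellings of the accented name land under one normalized key while
    staying distinct on disk, which is why the lookup goes from the string
    to the untouched bytes and not the other way round.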

    -- 
    David Starner - dvdeug@email.ro
    Ic sæt me on anum leahtrice, ða com heo and bát me!
    

