RE: MS Windows and Unicode 4.0 ?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 01 2003 - 13:39:56 EST

    Michael (michka) Kaplan writes:
    > I would not expect Windows (whose most recent shipping version shipped
    > before Unicode 4.0 was released) to support 4.0 properties and
    > such. But at
    > the same time, if you have fonts and build a keyboard you can support any
    > number of 4.0-only scripts.

    Wasn't case folding standardized long before Unicode 4.0?
    Well, the Windows case mappings for its NTFS filesystem predate Unicode,
    and I think that Microsoft wants to avoid the nightmare of filesystem
    migration. But I think that an NTFS filesystem should track the Unicode
    version it was created with, so that the filesystem driver can adapt to the
    set of folding rules that the volume was created with.

    The other option would be to add an option to CHKDSK to find files in
    the same directory whose names would collide if new case folding rules were
    applied. CHKDSK could list them and let the user choose which name to keep
    and which file must be renamed. If there's no conflict in a given
    directory, it could be marked as supporting the newer Unicode rules.
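
A minimal sketch of such a collision check, in Python. It assumes that
`str.casefold()` (full Unicode case folding, using whatever Unicode version
ships with the Python interpreter) stands in for the "new" folding rules. The
function name `find_fold_collisions` and the sample file names are
hypothetical, and nothing here reflects how CHKDSK actually works.

```python
from collections import defaultdict


def find_fold_collisions(names):
    """Group the names of one directory that become identical once folded.

    Assumption: str.casefold() models the newer folding rules.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the folded keys that more than one name maps to.
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical directory listing.
directory = ["Straße.txt", "STRASSE.TXT", "readme.md", "Notes.txt"]
for key, found in find_fold_collisions(directory).items():
    print(f"{key!r} is shared by {found}")
```

A directory for which the function returns an empty dict is one that could be
marked as safe for the newer rules. Normalization (NFC versus NFD) is ignored
here and would need separate handling.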

    There's an interesting question with FAT32: it was designed after NTFS to
    add Unicode and LFN support on top of FAT16, at a time when Unicode was
    already publishing standard case folding rules. I can't believe that
    Microsoft chose for its LFN directory extensions to use the same folding
    rules as those used in NTFS. Maybe what was wanted here was to maximize the
    compatibility of FAT32 with NTFS, even if NTFS has some defects.

    For now we have to live with the past! I'm quite sure that lowercase sharp s
    (eszett) and double lowercase s are both used on German filesystems.
    This is even the case on FAT filesystems, with which both FAT32 and NTFS must
    keep some compatibility (for short file names), as they use the OEM codepage
    (CP437 or CP850 in Germany), where sharp s has long been allowed and
    kept distinct from double s.

    If Windows were changed to fold sharp s to double s, then it
    would have problems reading filesystems (including floppies, which use FAT12
    with the same naming constraints as FAT16) containing short filenames.
    However, this is mitigated by the fact that FAT12 and FAT16 have always been
    ambiguous about the OEM charset their names were encoded with.
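
To make the sharp s problem concrete, here is a small Python sketch. The
helper `simple_upper` is a hypothetical stand-in for a one-to-one uppercase
table (it is not the actual Windows mapping), while `str.casefold()` applies
full Unicode case folding, which maps sharp s to double s.

```python
def simple_upper(name):
    """Uppercase character by character, keeping only one-to-one mappings.

    Assumed model of a legacy per-character table: a character whose
    uppercase form is longer than one character (such as sharp s) is left
    unchanged.
    """
    out = []
    for ch in name:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


a, b = "straße.txt", "strasse.txt"

# With a one-to-one table the two names stay distinct ...
print(simple_upper(a) == simple_upper(b))  # False
# ... but under full case folding they become the same name.
print(a.casefold() == b.casefold())        # True

# Sharp s has its own code point in both OEM codepages mentioned above.
print("ß".encode("cp437"), "ß".encode("cp850"))  # b'\xe1' b'\xe1'
```

Two files that coexist today under the first comparison would be
indistinguishable under the second, which is exactly the migration hazard.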

    Remember the issues when migrating from Windows 3.x to Windows 95, caused
    by legacy filesystems created with ambiguous OEMCP-only short names.
    SCANDISK also had to be used for some time because there were applications
    expecting OEMCP-encoded names, which sometimes conflicted between CP437
    and CP850. Even after the upgrade, the current codepage of the running app
    still creates encoding conflicts, detected later by CHKDSK or SCANDISK
    when OEMCP-encoded short names do not match their Unicode-encoded LFN names.
    SCANDISK proposes to trust the Unicode LFN name and to alter the short name
    so that it reflects the Unicode name in the current OEM codepage.
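
The CP437/CP850 ambiguity is easy to reproduce in Python, whose standard
codecs include both codepages. The sketch below decodes the same raw short
name under each codepage and then checks which reading agrees with a Unicode
long name. The function names and the sample file name are hypothetical, and
the comparison is deliberately simplified: real short names are also
truncated to 8.3 and may carry numeric tails, which is ignored here.

```python
def decode_short_name(raw, codepages=("cp437", "cp850")):
    """Decode the raw bytes of a short name under each candidate OEM codepage."""
    return {cp: raw.decode(cp) for cp in codepages}


def codepages_matching_lfn(raw, lfn, codepages=("cp437", "cp850")):
    """Return the codepages under which the short name equals the uppercased LFN.

    Simplification: assumes the long name already fits the 8.3 pattern.
    """
    decoded = decode_short_name(raw, codepages)
    return [cp for cp, text in decoded.items() if text == lfn.upper()]


raw = b"SM\x9dR.TXT"  # byte 0x9D is the yen sign in CP437 but O-slash in CP850

print(decode_short_name(raw))
print(codepages_matching_lfn(raw, "smør.txt"))
```

When no candidate codepage matches, the only safe repair is the one described
above: trust the Unicode long name and regenerate the short name from it.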

    Even today there are such errors when, for some reason such as a virus
    infection, AUTOEXEC.BAT is not run at startup to set the codepage, so that
    Windows starts using short names in FAT filesystems with an OEM codepage
    different from the one with which the filesystem was previously used.

    Thankfully, going to Unicode has fixed all this: short names are retained
    for compatibility. However, FAT32 filesystems still try first to open
    the file by its short name, converted to the current OEMCP, before trying
    the LFN name in Unicode. As FAT32 is definitely not dead or deprecated in
    favor of NTFS (for some performance reasons, forgetting the stronger
    security and stability of NTFS in the face of system crashes), we still
    have an issue in Windows 2000/XP/2003...





    This archive was generated by hypermail 2.1.5 : Mon Dec 01 2003 - 14:41:18 EST