Re: Medievalist ligature character in the PUA

From: verdy_p (verdy_p@wanadoo.fr)
Date: Mon Dec 14 2009 - 19:10:46 CST

  • Next message: Asmus Freytag: "Re: Medievalist ligature character in the PUA"

    De : "Leo Broukhis"
    > On Mon, Dec 14, 2009 at 10:46 AM, John (Eljay) Love-Jensen
    > wrote:
    >
    > > OS X requires filenames to be normalized, in a form that is close to but slightly
    > > different from NFD. And for backwards compatibility, it is rather stuck doing
    > > what it does. (Apple's "HFS+NFD" variant predates NFD.)
    >
    > What would be an example of the difference?
    >
    > > This OS X requirement is enforced by the OS.
    > >
    > > Windows requires that filenames be normalized as NFC. This can cause all
    > > sorts of havoc.
    >
    > What kind of havoc? If the OS is to normalize file names, I consider
    > NFC preferable, because it guarantees that a user-requested file name
    > of an allowed length will still be of an allowed length after
    > normalization.
    >
    > > If Linux has embraced Unicode, I would be surprised that they have not also
    > > established NFD or NFC as the required normalization.
    >
    > The Unix way would be to have the normalization mode (NFC, NFD,
    > HFS+NFD, none) a file system attribute specified during file system
    > creation (or mounting, if the FS format does not support that flag).
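
    [The length argument above is easy to demonstrate. A minimal sketch using Python's
    standard unicodedata module; the sample word "résumé" is just an illustration:]

    ```python
    import unicodedata

    # "résumé" typed with combining acute accents, i.e. already in NFD form
    name = "re\u0301sume\u0301"

    nfc = unicodedata.normalize("NFC", name)
    nfd = unicodedata.normalize("NFD", name)

    # NFC composes base letter + combining mark into one code point,
    # so it is never longer than the NFD equivalent.
    print(len(nfc), len(nfd))                                   # 6 8
    print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))   # 8 10
    ```

    [A name that fits within a filesystem's length limit in NFC therefore still fits
    after re-normalization to NFC, which is not true the other way around.]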

    Actually, these are not normalizations required by the OS, but required for conformance with a filesystem. Even OS X
    or Linux will apply NFC normalization when writing files to a FAT32 filesystem (in addition, they will also map
    filenames to short (8.3) names using a legacy 8-bit OEM codepage, if they create such short names alongside the
    LFNs).

    Other filesystems also have their own requirements (such as ISO 9660 on CD/DVD, which is more restrictive but still
    allows long-filename extensions such as Joliet, initially used on Windows, or other LFN conventions initially used
    on Linux or Mac OS X). As soon as these filesystems start being interchanged across OSes, all these OSes adopt
    the associated conventions.

    Unix and Linux actually do not enforce any normalization on UFS and NFS filesystems, so they treat filenames with
    different normalizations as distinct. With the exception of the NUL byte, the slash, and the reserved special names
    "." and "..", all filenames are just treated as opaque octet streams that can use any kind of encoding and are not
    even forced to be UTF-8. It is only assumed that the encoding is at least compatible with 7-bit US-ASCII (to
    support the conventional filenames and directories like "/bin", "/etc", "/usr"... that are used in Unix shells and
    tools), so it can be any ISO 8859 encoding, one of the many PC or MacOS codepages, or UTF-8: the actual encoding
    is determined by each host and by the applications running on it with their own locale settings.
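
    [The consequence of this "opaque octet stream" behaviour can be sketched in a few lines of Python
    with the standard unicodedata module; the word "café" is just an illustrative example:]

    ```python
    import unicodedata

    # The same visible name in two different normalization forms
    nfc = unicodedata.normalize("NFC", "caf\u00e9")   # "café", precomposed
    nfd = unicodedata.normalize("NFD", "caf\u00e9")   # "cafe" + combining acute

    # Identical to a human reader, but different octet sequences, so a
    # normalization-blind filesystem stores them as two distinct files.
    assert nfc != nfd
    assert nfc.encode("utf-8") != nfd.encode("utf-8")

    # A normalization-insensitive match must be done by the application itself:
    assert unicodedata.normalize("NFC", nfd) == nfc
    ```

    [On such a filesystem an application that wants "café" to name one file, regardless of how the
    user typed it, has to normalize on its own before comparing or creating names.]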

    The lack of encoding specification and enforcement on legacy Unix filesystems has been a source of
    interoperability problems (including in FTP, and later in HTTP when filenames started being used in URLs on the
    web, where no normalization is possible either, because the actual encoding is not specified).
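
    [The URL problem is visible directly in percent-encoding. A small sketch with Python's standard
    urllib.parse and unicodedata modules, again using "café" as an illustrative name:]

    ```python
    import unicodedata
    import urllib.parse

    nfc = unicodedata.normalize("NFC", "caf\u00e9")
    nfd = unicodedata.normalize("NFD", "caf\u00e9")

    # The same visible name yields two different, non-interchangeable URLs,
    # because percent-encoding operates on the raw UTF-8 octets.
    print(urllib.parse.quote(nfc))   # caf%C3%A9
    print(urllib.parse.quote(nfd))   # cafe%CC%81
    ```

    [A server that stores the file under one form will return 404 for a request using the other,
    unless some layer normalizes, and nothing in the URL syntax requires one.]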



    This archive was generated by hypermail 2.1.5 : Mon Dec 14 2009 - 19:15:20 CST