From: verdy_p (verdy_p@wanadoo.fr)
Date: Mon Dec 14 2009 - 19:10:46 CST
From: "Leo Broukhis"
> On Mon, Dec 14, 2009 at 10:46 AM, John (Eljay) Love-Jensen
>  wrote:
> 
> > OS X requires filenames be normalized, which is close to be slightly
> > different from NFD. And for backwards compatibility, is rather stuck doing
> > what it does. (Apple's "HFS+NFD" variant predates NFD.)
> 
> What would be an example of the difference?
> 
> > This OS X requirement is enforced by the OS.
> >
> > Windows requires that filenames be normalized as NFC. This can cause all
> > sorts of havoc.
> 
> What kind of havoc? If the OS is to normalize file names, I consider
> NFC preferable, because it guarantees that a user-requested file name
> of an allowed length will still be of an allowed length after
> normalization.
> 
> > If Linux has embraced Unicode, I would be surprised that they have not also
> > established NFD or NFC as the required normalization.
> 
> The Unix way would be to have the normalization mode (NFC, NFD,
> HFS+NFD, none) a file system attribute specified during file system
> creation (or mounting, if the FS format does not support that flag).
Actually, these normalizations are not required by the OS itself; they are required for conformance with a particular
filesystem. Even OS X or Linux will apply NFC normalization when writing files to a FAT32 filesystem (in addition, they
will map filenames to short filenames using a legacy 8-bit OEM codepage, if they create short (8.3) names alongside the
LFNs).
Other filesystems also have their own requirements (such as ISO 9660 on CD/DVD, which is more restrictive but still
allows long-filename extensions such as Joliet, initially used on Windows, or other LFN conventions initially used
on Linux or on Mac OS X). As soon as these filesystems are interchanged across OSes, each OS has to adopt the
associated conventions.
Unix and Linux do not actually enforce any normalization on UFS and NFS filesystems, so they treat filenames that
differ only in normalization as distinct. Apart from the NUL byte and the slash, and the reserved names "." and "..",
filenames are treated as plain octet strings that can use any encoding and are not even required to be UTF-8. It is
only assumed that the encoding is compatible with 7-bit US-ASCII (to support conventional file and directory names
such as "/bin", "/etc", "/usr"... used by Unix shells and tools), so it can be any ISO 8859 encoding, one of the many
PC or Mac OS codepages, or UTF-8. The actual encoding is determined by each host and by the applications running on
it, according to their own locale settings.
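To make the point concrete, here is a minimal Python sketch (the filename is a hypothetical example of mine, not one
taken from this thread). It shows that the same visible name yields different octet strings depending on normalization
form and encoding, which a filesystem that compares raw bytes will see as different filenames.

```python
import unicodedata

# Hypothetical filename; the escape keeps the source file itself unambiguous.
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # precomposed e-acute (U+00E9)
nfd = unicodedata.normalize("NFD", name)  # "e" followed by U+0301 combining acute

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")
latin1_bytes = nfc.encode("iso-8859-1")

# Three different octet strings for one user-visible name.
for label, data in (("UTF-8 NFC", nfc_bytes),
                    ("UTF-8 NFD", nfd_bytes),
                    ("ISO 8859-1", latin1_bytes)):
    print(label, len(data), data)
```

The lengths also differ (9, 10 and 8 bytes here), which is relevant to the length argument quoted above: a name that
fits a byte limit in NFC may no longer fit once decomposed.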
The lack of encoding specification and enforcement on legacy Unix filesystems has been a source of interoperability
problems (including in FTP, and later in HTTP, once such filenames started being used in URLs on the web, where no
normalization is possible either, because the actual encoding is not specified).
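The URL case can be sketched the same way. This is only an illustration under my own assumptions (a hypothetical name,
percent-encoded with Python's urllib.parse.quote): without a declared encoding, a server or client cannot tell which of
these forms refers to "the same" file.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical name used only for illustration.
name = "caf\u00e9"

utf8_nfc = quote(unicodedata.normalize("NFC", name), encoding="utf-8")
utf8_nfd = quote(unicodedata.normalize("NFD", name), encoding="utf-8")
latin1 = quote(unicodedata.normalize("NFC", name), encoding="iso-8859-1")

print(utf8_nfc)  # caf%C3%A9
print(utf8_nfd)  # cafe%CC%81
print(latin1)    # caf%E9
```

All three are valid percent-encoded paths, and nothing in the URL itself says which encoding or normalization form
produced it.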