Re: Unicode in VFAT file system

From: addison@inter-locale.com
Date: Fri Jul 21 2000 - 10:11:46 EDT


Well...

There has always been a BOM in Unicode and it's there for a reason: to
indicate the byte order on different processors. There is an inherent BE
bias in Unicode. But this doesn't invalidate an LE view of the Universe.
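To make the byte-order point concrete, here is a minimal Python sketch. The helper name decode_with_bom is mine, invented for illustration, and the fall-back to big-endian when no BOM is present is an assumption that mirrors the BE bias mentioned above:

```python
import codecs

text = "A"  # U+0041

big_endian = text.encode("utf-16-be")     # bytes 00 41
little_endian = text.encode("utf-16-le")  # bytes 41 00

# The BOM is U+FEFF written in the byte order of the stream:
# codecs.BOM_UTF16_BE is FE FF, codecs.BOM_UTF16_LE is FF FE.


def decode_with_bom(data: bytes) -> str:
    """Hypothetical helper: pick the byte order from a leading BOM.

    Assumption: with no BOM, treat the data as big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    return data.decode("utf-16-be")
```

The same two bytes read in the wrong order give a different character, which is the whole reason the BOM exists.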

Avoiding for the moment the word-parsing that Markus suggests, Unicode on
Microsoft platforms has always been LE (at least on Intel), and they have
called the encoding they use "UCS-2" (when they bothered with such
things; in the past they always called it "Unicode", as if it were the
*only* encoding). As Unicode has evolved, Microsoft products have become
more exact in this regard.

You'll never *hear* about "UCS-2LE"... since I just invented it to
describe what's going on. In non-standard usage, UCS-2 means "doesn't
support surrogates" and LE happens to be what PCs are using. Calling
something UTF-16 that doesn't support surrogates is bad, as far as I'm
concerned.
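A small Python sketch of the distinction, using the informal sense of "UCS-2" above (16-bit units, no surrogates). The function names are hypothetical and this says nothing about what any particular Microsoft API does:

```python
def utf16le_code_units(text: str) -> list:
    """Return the 16-bit code units of text as encoded in UTF-16LE."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def needs_surrogates(text: str) -> bool:
    """True if any code unit falls in the surrogate range D800..DFFF."""
    return any(0xD800 <= unit <= 0xDFFF for unit in utf16le_code_units(text))


bmp_only = "A"               # U+0041, one code unit
beyond_bmp = "\U00010400"    # U+10400, needs a surrogate pair in UTF-16
```

A consumer that treats each 16-bit unit as a complete character handles the first string correctly and sees two unpaired "characters" for the second. That is the gap between "UCS-2LE" in my sense and real UTF-16.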

VFAT may support surrogates (AFAIK it does). Microsoft is usually quite
good at indicating UTF-16 support in their documentation.

>
> >3. Filenames are, by definition in Windows-land, UPPERCASE in Western
> >European systems.
>
> My understanding is that with DOS they were always upper-cased, but
> probably only for the Western European code pages. With VFAT, the
> file names are stored as-is, but checked for uniqueness using
> case-insensitivity (but only in the basic Latin and Latin-1
> supplement range).

Sure, but they haven't abandoned this behavior in more modern operating
systems. The upper-casing is done to support DOS compatibility, which is
important in a Microsoft networking environment.
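As a rough model of the uniqueness check described in the quoted text (names stored as-is, compared case-insensitively only within Basic Latin and Latin-1 Supplement), something like the following Python sketch would do. It illustrates that description only, not the actual VFAT driver logic; fold_latin1 and collides are made-up names:

```python
def fold_latin1(name: str) -> str:
    """Upper-case only code points up to U+00FF; leave the rest as stored."""
    out = []
    for ch in name:
        up = ch.upper()
        # Keep the mapping only if it is a single code point that stays
        # inside Latin-1 (this skips U+00DF -> "SS" and U+00FF -> U+0178).
        if ord(ch) <= 0xFF and len(up) == 1 and ord(up) <= 0xFF:
            out.append(up)
        else:
            out.append(ch)
    return "".join(out)


def collides(new_name: str, existing: list) -> bool:
    """True if new_name matches an existing name under the limited fold."""
    key = fold_latin1(new_name)
    return any(fold_latin1(name) == key for name in existing)
```

Under this model "Résumé.txt" collides with "RÉSUMÉ.TXT", while two Greek names that differ only in case do not collide, because nothing outside the Latin-1 range is folded.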

thanks,

Addison


