Re: Unicode in VFAT file system

Date: Fri Jul 21 2000 - 07:37:17 EDT

Hi Ken,

UCS-2 is nearly the same thing as UTF-16. The one difference is surrogate
pairs (how UTF-16 reaches code points above U+FFFF), and that does not
matter here.

UCS-2 can be big-endian or little-endian. The rule is that BE is the
default when nothing says otherwise. However, on Intel platforms you
shouldn't be surprised to see LE everywhere: that's the native byte order.
Microsoft saves two bytes on every filename by not storing a BOM.
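
A small Python sketch of what that means in practice. The byte string is
made up for illustration (it is not a real directory entry), and the
variable names are mine:

```python
import codecs

# "Ken" as 16-bit little-endian code units with no BOM (illustrative bytes,
# not copied from an actual VFAT long-name entry).
raw = b"K\x00e\x00n\x00"

# With no BOM, the reader has to know the byte order from context.
name_le = raw.decode("utf-16-le")  # 'Ken'
name_be = raw.decode("utf-16-be")  # U+4B00 U+6500 U+6E00: wrong byte order

# Storing a BOM (U+FEFF) would add two bytes to every name.
with_bom = codecs.BOM_UTF16_LE + raw
print(name_le, len(with_bom) - len(raw))
```

Decoding the same bytes as big-endian gives three unrelated characters,
which is the "byte-swapped" look you get in a disk editor.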

You should note that Microsoft *means* UCS-2LE (and UTF-16LE in more
modern systems) when they say "Unicode" (at least on Intel platforms).


1. Yes, little-endian UCS-2 is perfectly valid.
2. There are no characters in the surrogate space just yet, so a black
square should be no surprise. Two black squares mean that each half of
the pair is being treated as a separate UCS-2 character.
3. Filenames are, by definition in Windows-land, UPPERCASE in Western
European systems. Other scripts either don't have the concept of case or
weren't mucked with. This includes compatibility characters stored outside
the U+0000 to U+00FF range.
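
To illustrate point 2, here is a rough Python sketch of how one code point
above U+FFFF turns into two 16-bit units. The code point U+10400 is picked
only as an example value, and the variable names are mine:

```python
import struct

# Any code point above U+FFFF will do; U+10400 is just an example value.
ch = "\U00010400"
raw = ch.encode("utf-16-le")       # 4 bytes: 01 D8 00 DC

# Read the bytes back as two little-endian 16-bit units.
units = struct.unpack("<2H", raw)  # (0xD801, 0xDC00)

# A UTF-16 reader recombines the high and low surrogate into one code point.
high, low = units
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# A UCS-2 reader sees two separate 16-bit "characters" instead.
ucs2_count = len(units)
print(hex(code_point), ucs2_count)
```

A display that shows one box per 16-bit unit is behaving like the UCS-2
reader here.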



Addison P. Phillips Principal Consultant
Inter-Locale LLC
Los Gatos, CA, USA

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
Globalization Engineering & Consulting Services

On Thu, 20 Jul 2000, Ken Krugler wrote:

> Hi Unicoders,
> Recently I've had the dubious pleasure of delving into the details of
> the VFAT file system. For long file names, I thought it used UCS-2,
> but in looking at the data with a disk editor, it appears to be
> byte-swapping (little endian). I thought that UCS-2 was by definition
> big endian, thus I've got the following questions:
> 1. Could it be using UTF-16LE? I tried creating an entry with a
> surrogate pair, but the name was displayed with two black boxes on a
> Windows 2000-based computer, so I assumed that surrogates were not
> supported.
> 2. Is little-endian UCS-2 a valid encoding that I just don't know about?
> 3. And finally, why are file names case-insensitive for characters in
> the U+0000 to U+00FF range, but not for any other characters? OK,
> maybe I can guess at the answer to that one...
> Thanks,
> -- Ken
> Ken Krugler
> TransPac Software, Inc.
> +1 530-470-9200
