Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 from Hans Åberg via Unicode on 2017-05-16 (Unicode Mail List Archive)

From: Hans Åberg via Unicode <unicode_at_unicode.org>
Date: Tue, 16 May 2017 18:23:51 +0200

> On 16 May 2017, at 18:13, Alastair Houghton <alastair_at_alastairs-place.net> wrote:
>
> On 16 May 2017, at 17:07, Hans Åberg <haberg-1_at_telia.com> wrote:
>>
>>>>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ...
>>>>
>>>> The filesystem directory is using octet sequences and does not bother passing over an encoding, I am told. Someone could remember one that used UTF-16 directly, but I think it may not be current.
>>>
>>> No, that’s not true. All three of those systems store UTF-16 on the disk (give or take).
>>
>> I am not speaking about what they store, but how the filesystem identifies files.
>
> Well, quite clearly none of those systems treat the UTF-16 strings as binary either - they’re case insensitive, so how could they? HFS+ even normalises strings using a variant of a frozen version of the normalisation spec.

HFS implements case insensitivity in a layer above the raw filesystem functions, so it is perfectly possible to have files that differ only by case in the same directory by using low-level function calls. Tenon's MachTen already did that on Mac OS 9.
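A toy Python model of the layering described above, not actual HFS code: a hypothetical `RawDirectory` keys entries on their exact UTF-16 code units, and a hypothetical `CaseInsensitiveLayer` above it rejects names that collide under case folding. Both class names are invented for illustration, and `str.casefold()` is an assumed stand-in for the filesystem's own case-folding tables, which differ in detail.

```python
class RawDirectory:
    """Hypothetical low-level directory: names are keyed by their exact
    UTF-16 code units, so no case folding happens at this level."""

    def __init__(self):
        self._entries = {}

    def raw_create(self, name: str) -> None:
        key = name.encode("utf-16-le")  # exact code units, compared as binary
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name

    def raw_names(self):
        return sorted(self._entries.values())


class CaseInsensitiveLayer:
    """Hypothetical higher layer that refuses a name equal to an existing
    one under case folding (casefold() is an assumed stand-in)."""

    def __init__(self, raw: RawDirectory):
        self._raw = raw

    def create(self, name: str) -> None:
        folded = name.casefold()
        for existing in self._raw.raw_names():
            if existing.casefold() == folded:
                raise FileExistsError(f"{name!r} clashes with {existing!r}")
        self._raw.raw_create(name)


raw = RawDirectory()
fs = CaseInsensitiveLayer(raw)
fs.create("Readme.txt")
try:
    fs.create("README.TXT")  # rejected by the case-insensitive layer
except FileExistsError:
    pass
raw.raw_create("README.TXT")  # the low-level call bypasses the layer
print(raw.raw_names())  # ['README.TXT', 'Readme.txt']
```

The point of the split is that only the upper layer knows about case; anything that calls the raw level directly sees two distinct binary keys.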
Received on Tue May 16 2017 - 11:24:10 CDT

This archive was generated by hypermail 2.2.0 : Tue May 16 2017 - 11:24:10 CDT