Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Alastair Houghton via Unicode <unicode_at_unicode.org>
Date: Tue, 16 May 2017 17:38:36 +0100

On 16 May 2017, at 17:23, Hans Åberg <haberg-1_at_telia.com> wrote:
>
> HFS implements case insensitivity in a layer above the filesystem raw functions. So it is perfectly possible to have files that differ by case only in the same directory by using low level function calls. The Tenon MachTen did that on Mac OS 9 already.

You keep insisting on this, but it’s not true. I’m a disk utility developer, and I can tell you for a fact that HFS+ uses a B+-Tree to hold its directory data (a single tree for the entire disk, not one per directory), and that the tree is sorted by (CNID, filename) pairs. And since HFS+ is case-preserving *and* case-insensitive, the comparisons it uses to order its B+-Tree nodes *cannot* be raw. I should know: I’ve actually written the code for it!
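
To illustrate why a case-preserving, case-insensitive index cannot use raw comparisons, here is a toy model in Python. It is only a sketch under stated assumptions: the names `Catalog`, `fold`, `insert` and `lookup` are hypothetical, Python's `str.casefold` stands in for HFS+'s own case-folding tables, and a sorted list stands in for the real B+-Tree node layout (Unicode decomposition is ignored entirely). The point it shows is that the sort key uses the *folded* name while the stored name keeps its original case, so "README" and "ReadMe" land on the same key and collide within one parent folder.

```python
import bisect
from dataclasses import dataclass, field


def fold(name: str) -> str:
    # Assumption: str.casefold() is a stand-in for HFS+'s case-folding tables.
    return name.casefold()


@dataclass
class Catalog:
    """Toy single index for a whole volume, ordered by (parent CNID, folded name).
    Hypothetical illustration only, not the on-disk B+-Tree format."""
    keys: list = field(default_factory=list)   # sorted (parent_cnid, folded_name)
    names: list = field(default_factory=list)  # original-case names, parallel to keys

    def insert(self, parent_cnid: int, name: str) -> None:
        key = (parent_cnid, fold(name))
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            # Same folded key under the same parent: names differing only by case clash.
            raise FileExistsError(f"{name!r} clashes with {self.names[i]!r}")
        self.keys.insert(i, key)
        self.names.insert(i, name)  # case is preserved in the stored name

    def lookup(self, parent_cnid: int, name: str):
        key = (parent_cnid, fold(name))
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.names[i]
        return None


cat = Catalog()
cat.insert(2, "README")
print(cat.lookup(2, "readme"))  # README
```

If the index were ordered by raw comparison instead, "README" and "readme" would sort into different slots, and a case-insensitive lookup could no longer find an entry by binary search.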

Even for legacy HFS, which didn’t store UTF-16 but stored names in a specified Mac legacy encoding (the encoding used is recorded in the volume header), the filesystem is case-insensitive, so the encoding matters.
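
A small Python sketch shows why a case-insensitive comparison of stored bytes depends on the encoding. It assumes Python's `mac_roman` and `mac_cyrillic` codecs plus `str.casefold` as stand-ins for the Mac OS comparison tables; `same_name_ci` is a hypothetical helper, not how legacy HFS actually implements its comparison. The same two bytes are an upper/lowercase pair in one encoding and two unrelated letters in another.

```python
def same_name_ci(a: bytes, b: bytes, encoding: str) -> bool:
    """Case-insensitively compare two raw names stored in a legacy encoding.
    Hypothetical helper: Python codecs and str.casefold stand in for the
    Mac OS comparison tables."""
    return a.decode(encoding).casefold() == b.decode(encoding).casefold()


# Bytes 0x80 and 0x8A are 'Ä' and 'ä' in Mac OS Roman,
# but 'А' and 'К' (two different letters) in Mac OS Cyrillic.
print(same_name_ci(b"\x80", b"\x8a", "mac_roman"))     # True
print(same_name_ci(b"\x80", b"\x8a", "mac_cyrillic"))  # False
```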

I don’t know what tricks Tenon MachTen pulled on Mac OS 9, but I *do* know how the filesystem works.

Kind regards,

Alastair.

--
http://alastairs-place.net
Received on Tue May 16 2017 - 11:38:52 CDT
