Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Hans Åberg via Unicode <>
Date: Tue, 16 May 2017 18:52:03 +0200

> On 16 May 2017, at 18:38, Alastair Houghton <> wrote:
> On 16 May 2017, at 17:23, Hans Åberg <> wrote:
>> HFS implements case insensitivity in a layer above the filesystem raw functions. So it is perfectly possible to have files that differ by case only in the same directory by using low level function calls. The Tenon MachTen did that on Mac OS 9 already.
> You keep insisting on this, but it’s not true; I’m a disk utility developer, and I can tell you for a fact that HFS+ uses a B+-Tree to hold its directory data (a single one for the entire disk, not one per directory either), and that that tree is sorted by (CNID, filename) pairs. And since it’s case-preserving *and* case-insensitive, the comparisons it does to order its B+-Tree nodes *cannot* be raw. I should know - I’ve actually written the code for it!
> Even for legacy HFS, which didn’t store UTF-16, but stored a specified Mac legacy encoding (the encoding used is in the volume header), it’s case insensitive, so the encoding matters.
> I don’t know what tricks Tenon MachTen pulled on Mac OS 9, but I *do* know how the filesystem works.

One could create files in the same directory whose names differed only by case, and Mac OS 9 did not care. Legacy HFS tended to slow down with many files in the same directory, so that gave an impression of a tree structure. The BSD filesystem at the time, perhaps the one that Mac OS X once supported, did not store files in a tree, but flat, with redundancy. The other information I got on the Austin Group list a decade ago.
Received on Tue May 16 2017 - 11:52:27 CDT
