Re: Representing Unix filenames in Unicode

From: Hans Aberg (
Date: Mon Nov 28 2005 - 14:38:00 CST

    On 28 Nov 2005, at 20:49, Neil Harris wrote:

    > The set of ASCII strings is a proper subset of the set of UTF-8
    > strings, so no information would need to be stored about which of
    > those codings was being used.

    So it would seem, but I think that UNIX under some circumstances,
    though I do not remember which, needs to know that it is ASCII and
    not anything else. But I guess one will see what works best when
    making a UTF-8 enabled UNIX.
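
    As a small illustration of the subset property quoted above, here is
    a minimal Python sketch. The helper names are hypothetical and not
    part of any UNIX API; the sketch only shows that a pure-ASCII byte
    string decodes to the same text under both encodings, and how a
    system that does need to know "is this ASCII?" could check the bytes.

```python
def is_ascii(name: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (hypothetical helper)."""
    return all(b < 0x80 for b in name)


def same_in_ascii_and_utf8(name: bytes) -> bool:
    """Return True if the bytes decode to the same text as ASCII and as UTF-8.

    Only meaningful for pure-ASCII input; non-ASCII bytes are reported as False
    because they cannot be decoded as ASCII at all.
    """
    if not is_ascii(name):
        return False
    return name.decode("ascii") == name.decode("utf-8")


if __name__ == "__main__":
    print(is_ascii(b"README.txt"))                # pure ASCII file name
    print(same_in_ascii_and_utf8(b"README.txt"))  # identical under both decoders
    print(is_ascii("caf\u00e9".encode("utf-8")))  # contains bytes >= 0x80
```

    The check is a plain byte scan, so no extra per-file metadata is
    needed to tell pure ASCII apart from other UTF-8 names.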

    > Now, ISO 8859-1, that's a different matter -- I suppose you could
    > still use the property that _almost all_ non-pure-ASCII ISO 8859-1
    > natural language strings are not also valid UTF-8 strings for
    > backwards compatibility, and ditto for most other fixed 8-bit
    > encodings, but I certainly wouldn't be willing to trust my
    > filesystem to this sort of hack.

    I'll pass on this one. There are different approaches, though:
    mixed encodings or a single UTF-8 encoding.
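
    The backwards-compatibility heuristic Neil describes (try UTF-8
    first, fall back to ISO 8859-1) can be sketched roughly as follows.
    This is only an illustration under stated assumptions: guess_decode
    is a hypothetical name, and the fallback to ISO 8859-1 is an
    assumption about what the legacy encoding would be. The last example
    shows the kind of ambiguity that makes the approach a hack.

```python
from typing import Tuple


def guess_decode(name: bytes) -> Tuple[str, str]:
    """Guess the encoding of a file name (hypothetical helper).

    Try strict UTF-8 first; if the bytes are not valid UTF-8, assume
    ISO 8859-1, which accepts every possible byte value.
    """
    try:
        return name.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return name.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    # "cafe" with e-acute in ISO 8859-1: the lone 0xE9 byte is not valid UTF-8,
    # so the fallback is taken.
    print(guess_decode(b"caf\xe9"))

    # The same word encoded as UTF-8 is accepted by the first branch.
    print(guess_decode("caf\u00e9".encode("utf-8")))

    # Ambiguous case: the ISO 8859-1 text "A-tilde, copyright sign" is the
    # byte pair C3 A9, which is also valid UTF-8 for e-acute. The guess
    # silently picks UTF-8.
    print(guess_decode(b"\xc3\xa9"))
```

    Most natural-language ISO 8859-1 names fail the UTF-8 check and so
    take the fallback, but the last case shows that the guess can be
    wrong without any error being raised.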

       Hans Aberg

