Re: Medievalist ligature character in the PUA

From: Peter Edberg
Date: Mon Dec 14 2009 - 14:39:49 CST

    On Dec 14, 2009, at 12:26 PM, Julian Bradfield wrote:

    > On 2009-12-14, Peter Edberg wrote:
    >> On Dec 14, 2009, at 10:30 AM, Leo Broukhis wrote:
    >>> This problem is with us already (on Apple systems, of all things).
    >>> MacOS X decomposes Cyrillic Й and Ё in file names and treats файл and
    >>> файл as the same file name
    >> Which seems appropriate, since they are canonically equivalent.
    >>> Windows and Linux don't.
    >> So the question is, why not?
    > For the very obvious reason that the system locale may not be utf-8.
    > I'm sure someone can come up with an example of two utf-8 canonically
    > equivalent strings that both make (different) sense in some other
    > encoding.

    On Dec 14, 2009, at 11:11 AM, Leo Broukhis wrote:
    > A file system is a map of tuples of "short" strings of non-zero,
    > non-solidus bytes to potentially long strings of arbitrary bytes. Why
    > should there be any storage-level assumption about the text property
    > of any of these strings?

    The desirable behavior I am describing refers, of course, to behavior at a higher level - a level at which the file "name" is already explicitly specified to be text in Unicode using a specified encoding scheme (e.g. UTF-16), as is true in Apple's HFS Plus (Mac OS Extended) volume format, NTFS, and other volume formats. At that level I think it is reasonable to enforce canonical equivalence.
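
    To make the equivalence concrete, here is a minimal Python sketch of the comparison being discussed, using the файл example from earlier in the thread. It relies only on the standard `unicodedata` module. The helper name `canonically_equivalent` is made up for illustration, and this is not a description of how any particular file system implements name matching.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# "файл" spelled two ways:
# precomposed й (U+0439) versus и (U+0438) followed by combining breve (U+0306).
composed = "\u0444\u0430\u0439\u043b"
decomposed = "\u0444\u0430\u0438\u0306\u043b"

# A code-point (or byte) comparison sees two different names.
print(composed == decomposed)                        # False

# A comparison under canonical equivalence sees one name.
print(canonically_equivalent(composed, decomposed))  # True
```

    A byte-oriented name comparison corresponds to the first test; a volume format that defines names as Unicode text can apply the second.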

    -Peter E
