Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 09 2003 - 14:13:33 EST

    On 09/12/2003 10:22, Marco Cimarosti wrote:

    >Peter Kirk wrote:
    >
    >
    >>>So, should n equal four or five? The answer would appear to
    >>>depend on whether or not the source file was saved in NFC
    >>>or NFD format.
    >>>
    >>>
    >>>
    >>No, surely not. If the wcslen() function is fully Unicode
    >>conformant, it should give the same output whatever the
    >>canonically equivalent form of its input.
    >>That more or less implies that it should normalise
    >>its input.
    >>
    >>
    >
    >Standards and fantasy are both good things, provided you don't mix them up.
    >
    >The "wcslen" function has nothing whatsoever to do with the Unicode standard,
    >but it has everything to do with the *C* standard. And, according to the C
    >standard, "wcslen" must simply count the number of "wchar_t" array elements
    >from the location pointed to by its argument up to the first "wchar_t"
    >element whose value is L'\0'. Full stop.
    >
    >
    >
    OK, as a C function handling wchar_t arrays it is not expected to
    conform to Unicode. But if it is presented as a function available to
    users for handling Unicode text, for determining how many characters (as
    defined by Unicode) are in a string, it should conform to Unicode,
    including C9.
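A minimal sketch of the two positions, written in Python rather than C because its standard library ships a normalizer. It assumes that `len()` on a Python `str`, which counts code points, is a fair stand-in for what `wcslen()` reports on a platform with a 32-bit `wchar_t`. The sample word and the helper names `raw_length` and `normalized_length` are illustrative only and do not come from the thread.

```python
import unicodedata

# Hypothetical sample text: "café" in two canonically equivalent spellings.
PRECOMPOSED = "caf\u00e9"   # ends in U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "cafe\u0301"   # ends in 'e' + U+0301 COMBINING ACUTE ACCENT


def raw_length(text: str) -> int:
    """Count code points exactly as stored, ignoring canonical equivalence."""
    return len(text)


def normalized_length(text: str, form: str = "NFC") -> int:
    """Count code points after normalizing, so equivalent inputs agree."""
    return len(unicodedata.normalize(form, text))


if __name__ == "__main__":
    for label, s in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(label, raw_length(s), normalized_length(s))
```

The raw count differs between the two spellings (four versus five), while the normalized count is the same for both. Which number it is still depends on the form chosen: NFC gives four and NFD gives five for this sample.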

    > ...
    >
    >>The Unicode standard does allow for special display modes in
    >>which the exact underlying string, including control
    >>characters, is made visible.
    >>
    >>
    >
    >Can you please cite the passage where the Unicode standard would not allow
    >this?
    >
    >
    >
    TUS 4.0 p.60 (part of C9):

    > Even processes that normally do not distinguish between
    > canonical-equivalent character sequences can have reasonable exception
    > behavior. Some examples of this behavior include ... “Show Hidden
    > Text” modes that reveal memory representation structure; ...

    I think there is more detail on this elsewhere in the standard.
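A "Show Hidden Text" style view of the kind the quoted passage permits could look roughly like the sketch below. It is only an assumption about what such a mode might display: the function name `reveal` and the `U+XXXX NAME` output format are hypothetical, and the code deliberately does not normalize, so the stored sequence stays visible.

```python
import unicodedata


def reveal(text: str) -> list[str]:
    """List each stored code point as 'U+XXXX NAME', without normalizing."""
    return [
        # Control characters have no Unicode name, so fall back to a marker.
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


if __name__ == "__main__":
    for line in reveal("e\u0301"):
        print(line)
```

Applied to the precomposed and decomposed spellings of the same letter, this view shows one entry in the first case and two in the second, which is the distinction an ordinary display would hide.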

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

