Re: Overload (was Re: Text Editors and Canonical Equivalence (was Coloured diacritics))

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 09 2003 - 13:23:08 EST


    On 09/12/2003 10:01, Mark Davis wrote:

    >>No, surely not. If the wcslen() function is fully Unicode conformant, it
    >>should give the same output whatever the canonically equivalent form of
    >>its input. That more or less implies that it should normalise its input.
    >>
    >>
    >
    >No, that is not a requirement of Unicode conformance.
    >
    >BTW, I must confess to an inability to keep up with the level of mail on this
    >list. There are so many things in these mails that are simply wrong, and
    >insufficient time for knowledgeable people to correct them. I would just caution
    >people to first consult the materials on the Unicode site (Standard, TRs, FAQs,
    >etc.), and take much of what is on this list with a quite sizable grain of salt.
    >
    >
    >
    Mark, I understand your problem with the level of mail. But in this
    case I have read the relevant section of TUS 4.0, and I quote it here
    from p.59 to show that:

    > C9 A process shall not assume that the interpretations of two
    > canonical-equivalent character sequences are distinct.
    > ...
    > • Ideally, an implementation would always interpret two
    > canonical-equivalent character sequences identically. ...

    Perhaps my error is that I have raised (or is it lowered?) "ideally
    would" to "should". So let me rephrase what I said before:

    If the wcslen() function is fully Unicode conformant, ideally it would
    give the same output whatever the canonically equivalent form of its input.

    Surely that is what C9 is saying. Or is the issue whether such a
    function counts as "a process"? I didn't say that conformance implies
    that a process should normalise its input (I accept that that is not
    true), only that this particular function, which counts the length of a
    string, can give sensible results only if the string is normalised, or
    at least transformed in some other way that removes the distinctions
    between canonically equivalent forms (e.g. normalisation with some kinds
    of modified data).
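
To make the length argument concrete, here is a minimal sketch in Python rather than C. It is an illustration only: Python's built-in len() counts code points and stands in for wcslen() on a platform with a 32-bit wchar_t, and the helper names raw_length and normalised_length are hypothetical, not taken from any library discussed in this thread. The example strings are the precomposed and decomposed spellings of "é", which are canonically equivalent.

```python
import unicodedata


def raw_length(s: str) -> int:
    """Count code points exactly as stored (roughly what wcslen() reports
    when wchar_t holds one code point per element; an assumption here)."""
    return len(s)


def normalised_length(s: str, form: str = "NFC") -> int:
    """Count code points after normalising, so that canonically
    equivalent inputs always give the same answer."""
    return len(unicodedata.normalize(form, s))


precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# The raw count distinguishes the two canonically equivalent spellings.
print(raw_length(precomposed), raw_length(decomposed))                # 1 2

# After normalisation the counts agree, whichever form is chosen.
print(normalised_length(precomposed), normalised_length(decomposed))  # 1 1
print(normalised_length(precomposed, "NFD"),
      normalised_length(decomposed, "NFD"))                           # 2 2
```

Note that the agreed count still depends on the normalisation form chosen (1 under NFC, 2 under NFD), so a library following this approach would have to document which form it counts in.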

    I am tacitly assuming at this point that the function is part of a
    general-purpose library for use by users who are not interested in the
    details of character coding etc. I can see that different considerations
    may apply for an internal function within a Unicode processing and
    rendering implementation.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

