Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: jcowan@reutershealth.com
Date: Tue Dec 09 2003 - 13:16:05 EST

    Peter Kirk scripsit:

    > No, surely not. If the wcslen() function is fully Unicode conformant, it
    > should give the same output whatever the canonically equivalent form of
    > its input.

    Not so. Remember, the conformance requirement is not that a process can't
    distinguish between canonically equivalent strings (otherwise a normalizer
    would be impossible; it wouldn't know whether to normalize or not!) but that
    a process can't assume that *other* processes will distinguish between
    canonically equivalent strings. Equally, it can't assume that the other
    process will fail to distinguish them, either.

    In an environment in which C wide characters are Unicode characters,
    wcslen returns the number of individual characters in the literal string,
    combining marks included. How many characters that is depends on how many
    were placed in the source file by the author and what, if anything, has
    happened to the source file since.
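
    A minimal sketch of the point, assuming wchar_t holds one Unicode
    character per element. The array names and helper functions are made up
    for illustration. The strings are spelled out as numeric code points so
    that nothing can renormalize them on the way to the compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE. */
static const wchar_t precomposed[] = { 0x00E9, 0 };

/* Canonically equivalent decomposed form:
 * U+0065 LATIN SMALL LETTER E, then U+0301 COMBINING ACUTE ACCENT. */
static const wchar_t decomposed[] = { 0x0065, 0x0301, 0 };

/* wcslen counts wide characters up to the terminating null;
 * it performs no normalization. */
size_t precomposed_length(void) { return wcslen(precomposed); }
size_t decomposed_length(void)  { return wcslen(decomposed); }

/* Aborts if the two counts are not 1 and 2 respectively. */
void check_lengths(void)
{
    assert(precomposed_length() == 1);
    assert(decomposed_length() == 2);
}
```

    The two arrays are canonically equivalent, yet wcslen reports 1 for one
    and 2 for the other. A caller that wants equal counts would have to
    normalize both strings first.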

    -- 
    As you read this, I don't want you to feel      John Cowan 
    sorry for me, because, I believe everyone       jcowan@reutershealth.com
    will die someday.    -- From a Nigerian-type    http://www.reutershealth.com
                            scam spam I got         http://www.ccil.org/~cowan
    

