Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 09 2003 - 14:13:33 EST

Next message: Philippe Verdy: "RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)"

Previous message: Michael Everson: "Re: [OT]"
In reply to: Marco Cimarosti: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: Doug Ewell: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Doug Ewell: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: jon@hackcraft.net: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

On 09/12/2003 10:22, Marco Cimarosti wrote:

>Peter Kirk wrote:
>
>>>So, should n equal four or five? The answer would appear to
>>>depend on whether or not the source file was saved in NFC
>>>or NFD format.
>>
>>No, surely not. If the wcslen() function is fully Unicode
>>conformant, it should give the same output whatever the
>>canonically equivalent form of its input.
>>That more or less implies that it should normalise
>>its input.
>
>Standards and fantasy are both good things, provided you don't mix them up.
>
>The "wcslen" function has nothing whatsoever to do with the Unicode standard,
>but it has everything to do with the *C* standard. And, according to the C
>standard, "wcslen" must simply count the number of "wchar_t" array elements
>from the location pointed to by its argument up to the first "wchar_t"
>element whose value is L'\0'. Full stop.
>
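To make the four-or-five question concrete, here is a minimal sketch of the behaviour Marco describes. The sample word "café" and the names cafe_nfc, cafe_nfd and check_wcslen_counts are assumptions made for illustration; they do not come from the thread. The arrays are spelled out element by element so that the result does not depend on how the source file was saved.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical sample text "café" in two canonically equivalent spellings. */
const wchar_t cafe_nfc[] = { L'c', L'a', L'f', 0x00E9, L'\0' };       /* precomposed U+00E9 */
const wchar_t cafe_nfd[] = { L'c', L'a', L'f', L'e', 0x0301, L'\0' }; /* e + U+0301 combining acute */

/* wcslen() counts wchar_t elements before the terminating L'\0',
   so the two equivalent spellings give different answers. */
void check_wcslen_counts(void)
{
    assert(wcslen(cafe_nfc) == 4);
    assert(wcslen(cafe_nfd) == 5);
}
```
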
OK, as a C function handling wchar_t arrays it is not expected to
conform to Unicode. But if it is presented as a function available to
users for handling Unicode text, for determining how many characters (as
defined by Unicode) are in a string, it should conform to Unicode,
including conformance clause C9 on canonical equivalence.
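As a rough sketch of a count that does not depend on which canonically equivalent form it is given, one could skip combining marks while counting. Everything here is an assumption for illustration: the sample word "café", the helper names, and above all the shortcut of treating only U+0300..U+036F as combining. That shortcut is enough for this one sample but is not general; a real implementation would normalise its input, or use proper character properties, through a Unicode library such as ICU.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical sample: "café", precomposed and decomposed. */
const wchar_t sample_nfc[] = { L'c', L'a', L'f', 0x00E9, L'\0' };
const wchar_t sample_nfd[] = { L'c', L'a', L'f', L'e', 0x0301, L'\0' };

/* Simplifying assumption: only the Combining Diacritical Marks block
   (U+0300..U+036F) is treated as combining. */
static int is_combining_mark(wchar_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

/* Hypothetical helper, not a standard C function: counts the elements
   that are not combining marks under the assumption above. */
size_t count_base_chars(const wchar_t *s)
{
    size_t n = 0;
    for (; *s != L'\0'; ++s) {
        if (!is_combining_mark(*s))
            ++n;
    }
    return n;
}
```

With these definitions, count_base_chars() returns 4 for both sample_nfc and sample_nfd, whereas wcslen() returns 4 and 5 respectively.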

> ...
>
>>The Unicode standard does allow for special display modes in
>>which the exact underlying string, including control
>>characters, is made visible.
>
>Can you please cite the passage where the Unicode standard would not allow
>this?
>
TUS 4.0 p.60 (part of C9):

> Even processes that normally do not distinguish between
> canonical-equivalent character sequences can have reasonable exception
> behavior. Some examples of this behavior include ... “Show Hidden
> Text” modes that reveal memory representation structure; ...

I think there is more detail on this elsewhere in the standard.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Tue Dec 09 2003 - 15:13:28 EST