From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 09 2003 - 14:13:33 EST
On 09/12/2003 10:22, Marco Cimarosti wrote:
>Peter Kirk wrote:
>
>
>>>So, should n equal four or five? The answer would appear to
>>>depend on whether or not the source file was saved in NFC
>>>or NFD format.
>>>
>>>
>>>
>>No, surely not. If the wcslen() function is fully Unicode
>>conformant, it should give the same output whatever the
>>canonically equivalent form of its input.
>>That more or less implies that it should normalise
>>its input.
>>
>>
>
>Standards and fantasy are both good things, provided you don't mix them up.
>
>"wcslen" has nothing whatsoever to do with the Unicode standard, but it
>has everything to do with the *C* standard. And, according to the C standard,
>"wcslen" must simply count the number of "wchar_t" array elements from the
>location pointed to by its argument up to the first "wchar_t" element whose
>value is L'\0'. Full stop.
>
>
>
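To make the counting rule described above concrete, here is a minimal sketch. The example word and the names NFC_FORM, NFD_FORM, nfc_length and nfd_length are my own assumptions for illustration; the string from the original source file is not reproduced here. It relies only on the standard wcslen() from <wchar.h> and on C99 universal character names in wide string literals.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical example: one word in two canonically equivalent spellings. */
static const wchar_t NFC_FORM[] = L"caf\u00E9";  /* precomposed e-acute, U+00E9 */
static const wchar_t NFD_FORM[] = L"cafe\u0301"; /* e + combining acute, U+0301 */

/* wcslen() counts wchar_t elements up to the first L'\0'.
   It performs no normalisation of any kind. */
size_t nfc_length(void) { return wcslen(NFC_FORM); } /* 4 elements */
size_t nfd_length(void) { return wcslen(NFD_FORM); } /* 5 elements */
```

So the two spellings of the same text give four and five respectively, which is exactly the discrepancy under discussion.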
OK, as a C function handling wchar_t arrays it is not expected to
conform to Unicode. But if it is presented as a function available to
users for handling Unicode text, for determining how many characters (as
defined by Unicode) are in a string, it should conform to Unicode,
including conformance requirement C9 on canonical equivalence.
> ...
>
>>The Unicode standard does allow for special display modes in
>>which the exact underlying string, including control
>>characters, is made visible.
>>
>>
>
>Can you please cite the passage where the Unicode standard would not allow
>this?
>
>
>
TUS 4.0 p.60 (part of C9):
> Even processes that normally do not distinguish between
> canonical-equivalent character sequences can have reasonable exception
> behavior. Some examples of this behavior include ... “Show Hidden
> Text” modes that reveal memory representation structure; ...
I think there is more detail elsewhere in the standard.
-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/