Re: UTF-8, U+0000 and Software Development (was: Re: New UTF-8 decoder stress test file)

From: Martin J. Duerst (duerst@w3.org)
Date: Mon Sep 27 1999 - 01:58:58 EDT


At 14:13 99/09/26 -0700, Karl Pentzlin wrote:
> -----Original Message-----
> From: Paul Dempsey (Exchange) <paulde@Exchange.Microsoft.com>
> To: 'Karl Pentzlin' <karl-pentzlin@acssoft.de>; Unicode List
> <unicode@unicode.org>
> Sent: Sunday, 26 September 1999 21:11
> Subject: RE: UTF-8, U+0000 and Software Development (was: Re: New UTF-8
> decoder stress test file)
>
>
> > Using UTF-8 to represent a 0 byte without 0-valued bytes is misusing UTF-8
> > (at least for text interchange).
> >
> > ...
> >
> > I've written quite a lot of text-processing code in C/C++ that handles
> > embedded NUL characters. There's nothing intrinsic to the language that
> > makes it especially difficult. I just don't use much of the standard ISO C
> > library.
> In other words (to exaggerate somewhat), to conform to one standard (UTF-8
> encoding U+0000 strictly as a 0 byte), you decide against another standard
> (the ISO standard C library), a standard that was also made for interchange,
> namely for program source interchange between different operating systems.

No. UTF-8 requires a 0 byte for U+0000 precisely so that it can work together
with C libraries. Nobody ever asked for a way to encode the NUL character
in ASCII as anything other than a 0 byte, so why should this suddenly
be necessary for UTF-8?

Regards, Martin.

#-#-# Martin J. Dürst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org


