Re: UTF-8N?

From: Robert A. Rosenberg (bob.rosenberg@digitscorp.com)
Date: Fri Jun 23 2000 - 13:50:21 EDT


At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote:
>Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP,
>programs that *expect* UTF-8 instead of SBCS will be able to throw away
>an initial U+FEFF with even greater confidence. It may even be possible
>for operating system developers to build this in at the OS level: open
>a UTF-8 text file; read characters; if the very first character in the
>file was U+FEFF then eat it. Applications would never even see it.
>How cool would that be?

It would be very UNCool unless the application can tell the operating
system that it wants this done. Otherwise the application has no way of
KNOWING that the edited stream the operating system hands it IS UTF-8
(and was identified as such by the deleted BOM) rather than some other
character set that will break the program if it tries to parse the
stream as UTF-8. Letting the application SEE the BOM acts as a sanity check.
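
As a rough sketch of the opt-in behaviour argued for above (written in
Python purely for illustration): the function name read_text and the
strip_bom flag are hypothetical, not any real operating system
interface. The point is only that the caller asks for the BOM to be
removed and is still told whether one was there.

```python
import codecs


def read_text(data: bytes, strip_bom: bool = False):
    """Decode bytes as UTF-8 and report whether a leading BOM was present.

    read_text and strip_bom are made-up names for illustration only.
    Returns a (text, saw_bom) tuple.
    """
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    saw_bom = data.startswith(codecs.BOM_UTF8)

    # Only eat the BOM when the caller explicitly asked for it.
    if saw_bom and strip_bom:
        data = data[len(codecs.BOM_UTF8):]

    # Plain "utf-8" decoding leaves a remaining BOM in place as U+FEFF,
    # and raises UnicodeDecodeError on input that is not valid UTF-8.
    return data.decode("utf-8"), saw_bom


# Default: the application still SEES the BOM as the first character.
text, saw_bom = read_text(b"\xef\xbb\xbfhello")

# Opt-in: the BOM is removed, but saw_bom still records that it was there.
text, saw_bom = read_text(b"\xef\xbb\xbfhello", strip_bom=True)
```

Either way the application keeps its sanity check: it can inspect
saw_bom (or the leading U+FEFF) instead of trusting a silently edited
stream.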


