RE: UTF-8N?

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Sat Jun 24 2000 - 12:30:13 EDT


The term 'application' is used for so many different things that it is
practically useless. Software consists of many levels, and the line between
system and application depends on one's point of view and personal
inclinations.

We have here two meanings, two levels of software. One level gets the
encoded data stream from a lower level, and may or may not normalize it to
some uniform, encoding-independent format, such as UTF-16 or UTF-8, before
passing it on to the next level.

The other level receives the uniform format and processes it. HTML, for
example, actually specifies this conceptual model, but not all browsers
implement it. Applications at this level never see the BOM, although they
may be able to see the declared source encoding.
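
To make the two levels concrete, here is a rough sketch in Python. It is
only an illustration of the model, not how any particular browser or
operating system does it. The names decode_layer and upper_layer are
made up for this example, and the fallback to a declared encoding is an
assumption. UTF-32 is left out because its little-endian BOM begins with
the same bytes as the UTF-16 one and would need to be tested first.

```python
import codecs

# Byte order marks the (hypothetical) lower level knows about.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_layer(raw, declared="utf-8"):
    """Lower level: turn encoded bytes into the uniform format.

    Returns the decoded text with any leading BOM removed, plus the
    name of the source encoding so the next level can still see it.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            # The BOM identified the encoding; strip it and decode the rest.
            return raw[len(bom):].decode(name), name
    # No BOM found: fall back to whatever encoding the caller declared.
    return raw.decode(declared), declared


def upper_layer(text, source_encoding):
    """Upper level: works on the uniform format and never sees a BOM."""
    return "%d characters (source: %s)" % (len(text), source_encoding)


raw = codecs.BOM_UTF8 + "hello".encode("utf-8")
text, encoding = decode_layer(raw)
print(upper_layer(text, encoding))
```

Because the lower level passes the encoding name along with the text, the
upper level still knows the stream was identified as UTF-8 even though
the BOM itself has been removed.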

Jony

> -----Original Message-----
> From: Robert A. Rosenberg [mailto:bob.rosenberg@digitscorp.com]
> Sent: Friday, June 23, 2000 10:36 PM
> To: Unicode List
> Cc: Unicode List
> Subject: Re: UTF-8N?
>
>
> At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote:
> >Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP,
> >programs that *expect* UTF-8 instead of SBCS will be able to throw away
> >an initial U+FEFF with even greater confidence. It may even be possible
> >for operating system developers to build this in at the OS level: open
> >a UTF-8 text file; read characters; if the very first character in the
> >file was U+FEFF then eat it. Applications would never even see it.
> >How cool would that be?
>
> It would be very UNCool unless the application can tell the operating
> system that it wants this done for it. Otherwise it will have no way of
> KNOWING that the edited stream that the operating system is passing it IS
> UTF-8 (and was so identified by the deleted BOM) and not some other
> character-set that the program will fail on if it tries to parse it as
> UTF-8. Letting the application SEE the BOM acts as a sanity check.
>


