Re: Unicode on a website

From: Doug Ewell (dewell@compuserve.com)
Date: Sat Sep 23 2000 - 18:12:23 EDT


David Starner <dvdeug@x8b4e516e.dhcp.okstate.edu> wrote:

>> First is there a standard for implementing SCSU in browsers? If not
>> then we need to do that first.
>
> Huh? It's a text encoding. You should keep everything through the
> Content-Type ASCII (that only uses LF, CR and HT and graphic
> characters, so it's also legal SCSU), and properly label the
> Content-Type, just like you'd do with any non-ASCII encoding.

I wonder if there is some confusion here between SCSU and another
encoding form. SCSU uses all 256 bytes, not just printable ASCII and
CR/LF/HT. ASCII text is encoded as ASCII, and Latin-1 text is encoded
as Latin-1, but beyond that you will definitely have bytes in the 0x00
through 0x1F range, which SCSU uses as tag bytes. Check Unicode
Technical Report #6 for more information.
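
To illustrate the ASCII and Latin-1 case, here is a minimal sketch in
Python. It assumes the initial SCSU state described in the Technical
Report (single-byte mode, with the default dynamic window at U+0080).
The function name is hypothetical and this is not a full SCSU encoder:
it only handles text that needs no tag bytes, and gives up otherwise.

```python
# Control characters that SCSU's single-byte mode passes through unchanged.
# Every other byte below 0x20 is a tag, not a literal character.
PASSTHROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}  # NUL, HT, LF, CR


def scsu_without_tags(text):
    """Return the SCSU bytes for `text` if no tag bytes are needed, else None.

    Hypothetical helper for illustration only. Assumes the encoder is in
    its initial state, where bytes 0x80..0xFF map to U+0080..U+00FF.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in PASSTHROUGH_CONTROLS or 0x20 <= cp <= 0xFF:
            out.append(cp)  # encoded as itself, same as Latin-1
        else:
            return None     # would need a tag byte (quote, window switch, ...)
    return bytes(out)
```

For text that qualifies, the result is byte-for-byte what a Latin-1
encoder would produce. Anything else (Cyrillic, CJK, or even most C0
control characters) returns None here, because a real encoder would
have to emit tag bytes in the 0x00 through 0x1F range.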

> The only question I have is whether you put a BOM/signature on it.
> UTF-16 HTML usually goes out with a BOM. Does everything support
> UTF-8 HTML with a BOM?

Regarding SCSU, the Technical Report does describe a possible signature
sequence (0E FE FF). Since SCSU can be difficult to auto-detect, if I
were developing an SCSU-aware browser I would make it recognize that
particular sequence as an SCSU signature. But at present, you cannot
depend on it being there. SCSU is still not widely used, so there is
no such word as "usually" when talking about SCSU implementations.
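
As a rough sketch of that check in Python (the names below are
hypothetical, not taken from any existing browser or library), the
signature is the SQU tag 0x0E followed by U+FEFF as FE FF:

```python
# Signature described in the Technical Report: SQU (0x0E), then U+FEFF.
SCSU_SIGNATURE = b"\x0e\xfe\xff"


def has_scsu_signature(data):
    """Return True if `data` begins with the SCSU signature sequence."""
    return data.startswith(SCSU_SIGNATURE)


def strip_scsu_signature(data):
    """Return `data` without a leading SCSU signature, if one is present."""
    if has_scsu_signature(data):
        return data[len(SCSU_SIGNATURE):]
    return data
```

A False result says nothing either way, since the signature is
optional. The Content-Type label would still have to be the primary
way of identifying the encoding.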

-Doug Ewell
 Fullerton, California


