RE: Designing a multilingual web site

Date: Thu Jul 20 2000 - 06:41:40 EDT

These bytes are the FIRST bytes in a (Unicode-encoded) text file. They are
not guaranteed to be there, but if you are writing a program that reads
and writes text files, you should look for and be prepared to handle them.
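
As a rough illustration of "looking for" these bytes, here is a minimal
Python sketch. It assumes the signature list quoted further down in this
message; the names BOM_SIGNATURES and detect_bom are made up for this
example and do not come from any library.

```python
# Signatures copied from the list quoted later in this message.
# Longer signatures are listed first so that UTF-32LE (ff fe 00 00)
# is tested before UTF-16LE (ff fe), as the quoted list advises.
BOM_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32be"),
    (b"\xff\xfe\x00\x00", "utf-32le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\x0e\xfe\xff", "scsu"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, signature_length), or (None, 0) if absent."""
    for signature, name in BOM_SIGNATURES:
        if data.startswith(signature):
            return name, len(signature)
    # No signature found: the file may still be Unicode, just unmarked.
    return None, 0
```

A reader would typically skip signature_length bytes before decoding the
rest, and fall back to some default encoding when nothing is detected.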

Note that Byte Order Marks (BOM) are associated with files. Database
fields, for example, will not start with a BOM (although it is not illegal
to put one there).

Most programs that handle Unicode text files will not display these
characters to you: they hide the fact that they are there. To actually
view the bytes, you will need either a program that doesn't understand
Unicode or a binary file editor (on Windows, try using "debug"; it will
show you the byte values).
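
If no binary editor is handy, a few lines of script will also do. The
following is a minimal Python sketch; the function names are invented
for this example, and the file path is whatever file you want to inspect.

```python
def format_hex(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


def first_bytes_hex(path: str, count: int = 4) -> str:
    """Read the first `count` bytes of a file in binary mode, as hex."""
    # Binary mode ("rb") matters: text mode may decode or hide the bytes.
    with open(path, "rb") as f:
        return format_hex(f.read(count))
```

Four bytes are enough to cover the longest signature in the list quoted
below; a file with fewer bytes simply yields a shorter string.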

Hope this helps,


Addison P. Phillips Principal Consultant
Inter-Locale LLC
Globalization Engineering & Consulting Services

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)

On Wed, 19 Jul 2000, Munzir Taha wrote:

> >notepad always saves Unicode-encoded files with the appropriate signature
> >byte sequence, like most other Microsoft-apps and many other well-behaved
> >applications. They are the first 2 to 4 bytes in the text file, encode
> >U+feff in the particular encoding scheme, and are as follows:
> >utf-8: ef bb bf
> >utf-16be: fe ff
> >utf-16le: ff fe
> >utf-32be: 00 00 fe ff
> >utf-32le: ff fe 00 00 (check before utf-16le!)
> >scsu: 0e fe ff (unfortunately rather rarely used)
> Sorry for being a dummy about this. But I can't understand where these bytes
> lie. How can I see them or check them? If it's a long subject to be
> explained, please refer me to where I can get more info about this subject,
> will you?
