Re: UTF-8 text samples

From: Mark Davis (marked@best.com)
Date: Fri Oct 16 1998 - 22:34:13 EDT


While it may be recommended, the BOM is not required.

In this case it would be better to put the file in HTML format and declare
the appropriate charset.

Mark

Murray Sargent wrote:

> Donald's UTF-8 file should begin with a UTF-8 BOM in order to identify it as
> a UTF-8 encoded file. The starting bytes should be 0xEF 0xBB 0xBF. These
> bytes are discarded when reading the file in and added when writing the file
> out.
>
> Thanks
> Murray
>
> > -----Original Message-----
> > From: Donald Page [SMTP:donaldp@sco.com]
> > Sent: Thursday, October 15, 1998 10:25 AM
> > To: Unicode List
> > Subject: Re: UTF-8 text samples
> >
> > The above attachment should contain all of the Minimum European Subset
> > encoded as UTF-8. I created it for my own testing, but feel free to use
> > it.
> >
> > Donald
> >
> > On Thu, 15 Oct 1998, Frank da Cruz wrote:
> >
> > > Can anybody tell me where to find some UTF-8 text samples? Preferably
> > > containing mainly characters from the U+0000 through U+27FF range.
> > >
> > > Thanks!
> > >
> > > - Frank
> > > << File: >>
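
To make the BOM handling discussed above concrete, here is a minimal Python
sketch: a reader that discards the leading 0xEF 0xBB 0xBF when it is present
and accepts files without it, and a writer that can add it. The function names
and the with_bom parameter are illustrative assumptions, not anything taken
from the messages in this thread.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the bytes 0xEF 0xBB 0xBF.
UTF8_BOM = codecs.BOM_UTF8


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading BOM if one is present.

    The BOM is optional, so input without it is decoded unchanged.
    (Hypothetical helper name, for illustration only.)
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.decode("utf-8")


def encode_utf8(text: str, with_bom: bool = True) -> bytes:
    """Encode text as UTF-8, optionally prefixing the BOM.

    (Hypothetical helper name and parameter, for illustration only.)
    """
    body = text.encode("utf-8")
    return UTF8_BOM + body if with_bom else body
```

Python's built-in "utf-8-sig" codec behaves much the same way: it skips an
optional BOM when decoding and writes one when encoding.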

--
business: medavis2@us.ibm.com, mark@unicode.org
personal: mark@macchiato.com, http://www.macchiato.com
--