Re: Unicode in VFAT file system

From: Peter_Constable@sil.org
Date: Fri Jul 21 2000 - 11:26:09 EDT


>As a serialization, UTF-16 has three forms: UTF-16, UTF-16BE, and
>UTF-16LE. The first is with (optionally) a BOM, and the others without.

I know this is what the Standard dictates, and I think I understand why,
but it doesn't make complete sense to the novice trying to find his/her
way:

<novice attitude=pondering&confused&frustrated>
Why does it say there are three varieties when a 16-bit datum can only be
serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it
just one of the other two? When it does have a BOM, it can still be
serialised in two ways, so aren't there four schemes - 2 serialisations x
±BOM? I barely manage to make sense of forms and schemes and then they
confuse me with this stuff!
</novice>

Don't we really mean that there are three approved ways in which the
encoding scheme of a stream can be labelled? Wouldn't it be clearer to say
that UTF-16 has two serialisations (not forms! since we're talking about
schemes), and that the encoding scheme of a stream can be labelled in one
of three ways: UTF-16, UTF-16BE and UTF-16LE?
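
To make the counting concrete, here is a rough sketch in Python (my choice of language, nothing the Standard prescribes) of the four byte sequences the novice is thinking of. The function name serialise and the dictionary labels are made up for illustration; only the codec names and BOM constants come from Python's standard library.

```python
import codecs

def serialise(text):
    """Return the four candidate byte sequences for `text` (illustrative only)."""
    be = text.encode("utf-16-be")   # big-endian code units, no BOM written
    le = text.encode("utf-16-le")   # little-endian code units, no BOM written
    return {
        "UTF-16BE": be,
        "UTF-16LE": le,
        # A stream labelled plain "UTF-16" may start with a BOM in either order.
        "UTF-16 (BOM, big-endian)": codecs.BOM_UTF16_BE + be,
        "UTF-16 (BOM, little-endian)": codecs.BOM_UTF16_LE + le,
    }

if __name__ == "__main__":
    for label, data in serialise("A").items():
        print(f"{label:30} {data.hex(' ')}")
    # Either BOM-prefixed stream decodes to the same text under the label "utf-16".
    for key in ("UTF-16 (BOM, big-endian)", "UTF-16 (BOM, little-endian)"):
        assert serialise("A")[key].decode("utf-16") == "A"
```

There are only two orderings of the code units themselves; the BOM-prefixed streams differ from the bare ones by the two-byte mark at the front, which is what lets a reader of something labelled just "UTF-16" work out which of the two orderings follows.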

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


