Re: Unicode in VFAT file system

From: Peter_Constable@sil.org
Date: Fri Jul 21 2000 - 15:16:04 EDT


On 07/21/2000 01:24:02 PM <jcowan@reutershealth.com> wrote:

>> Why does it say there are three varieties when a 16-bit datum can only be
>> serialised in two orders?
>
>The simplest way to think about it is to remember that a MIME charset is meant
>to provide *minimal* information for the receiver to convert bytes into
>characters. If the receiver gets FF FE 01 02, then it *must* be interpreted as
>follows depending on the charset...

I understand that these labels determine different interpretations of a
stream in those circumstances. But the explanation "the encoding form UTF-16
has three encoding schemes..." doesn't appear in the context of a discussion
of MIME charsets or anything like that; it's just a context-free statement.
If you read D33 and D34, and then read D35

<quote>
UTF-16 is the Unicode Transformation Format that serializes a Unicode value
as a sequence of two bytes, in either big-endian or little-endian format...
</quote>

someone can easily be left thinking, "so it's one of the previous two, but
they seem to be saying it's a third, which doesn't make sense". That's
because the labels UTF-16, UTF-16BE and UTF-16LE aren't about what's
actually in the text stream but rather are about what is explicitly *said*
about what's in the text stream. Yet the definitions never make that clear.
That's my point.
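
To make the distinction concrete, here is a minimal sketch in Python of how
the same four bytes from the quoted example, FF FE 01 02, come out under each
of the three labels. It assumes Python's codec names "utf-16", "utf-16-be"
and "utf-16-le" as stand-ins for the charset labels; the helper name
code_points is made up for illustration. It shows what Python's decoders do,
not the list of interpretations that was elided from the quoted message.

```python
# The byte stream from the quoted example.
DATA = bytes([0xFF, 0xFE, 0x01, 0x02])


def code_points(data, label):
    """Decode data under the given label and list the resulting code points."""
    return ["U+%04X" % ord(ch) for ch in data.decode(label)]


# Unmarked "utf-16": the leading FF FE is read as a byte order mark that
# signals little-endian order and is then dropped from the text.
print("utf-16   ", code_points(DATA, "utf-16"))     # ['U+0201']

# "utf-16-le": the byte order is already declared by the label, so FF FE
# is ordinary content, namely U+FEFF.
print("utf-16-le", code_points(DATA, "utf-16-le"))  # ['U+FEFF', 'U+0201']

# "utf-16-be": the order is again declared, so FF FE is read big-endian
# as the noncharacter U+FFFE, and 01 02 becomes U+0102.
print("utf-16-be", code_points(DATA, "utf-16-be"))  # ['U+FFFE', 'U+0102']
```

The bytes are identical in all three calls; only the label changes, which is
why the labels describe what is *said* about the stream rather than what is
in it.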

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


