Re: Unicode in VFAT file system

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jul 21 2000 - 15:55:59 EDT


At 07:14 AM 7/21/00 -0800, Peter_Constable@sil.org wrote:
>Why does it say there are three varieties when a 16-bit datum can only be
>serialised in two orders? If the scheme UTF-16 doesn't have a BOM, isn't it
>just one of the other two? When it does have a BOM, it can still be
>serialised in two ways, so aren't there four schemes - 2 serialisations x
>±BOM? I barely manage to make sense of forms and schemes and then they
>confuse me with this stuff!

The problem is that the labels were invented to tag data streams, not to
'label' the result of autodetection. As you point out there are 4 results
of auto-detection:

UTF-16, no BOM
UTF-16, no BOM, but arriving in reverse byte order (for my processor)
UTF-16 with BOM
UTF-16 with BOM, arriving in reverse byte order (for my processor)
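
As a rough sketch of the receiving side, the four outcomes above could be
told apart along these lines. The function name classify_utf16 and its
return shape are made up for illustration. Note that a BOM check alone
cannot tell the two "no BOM" cases apart; that needs an external label or
a heuristic, which this sketch does not attempt.

```python
import codecs
import sys


def classify_utf16(data: bytes, native: str = sys.byteorder):
    """Return (has_bom, reversed_order) for a UTF-16 byte stream.

    `native` is "little" or "big" (the receiving processor's byte order).
    reversed_order is None when there is no BOM, because the BOM is the
    only evidence this sketch looks at.
    """
    if native == "little":
        native_bom, swapped_bom = codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE
    else:
        native_bom, swapped_bom = codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE

    if data.startswith(native_bom):
        return (True, False)   # UTF-16 with BOM, in my byte order
    if data.startswith(swapped_bom):
        return (True, True)    # UTF-16 with BOM, reverse byte order
    return (False, None)       # no BOM: order unknown from the data alone
```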

When I send a data stream, I have these conditions:

1) don't know byte order
a) send it out bare
b) send it out with BOM

2) do know byte order
a) send it out with BOM, but don't tell recipient the byte order
b) don't use BOM, and tell recipient the byte order in an external label

The labels UTF-16BE and UTF-16LE are to be used for case 2b *only*.
The label UTF-16 is required for 1a, 1b and 2a.

The hypothetical case of telling the recipient the byte order *and* using
the BOM at the same time is not supported.
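
The labelling rule above can be written down as a small sketch from the
sender's point of view. The function label_for and its parameters are
hypothetical names chosen for this example, not part of any standard API;
declared_order stands for what the sender tells the recipient in the
external label, not for what the sender happens to know.

```python
from typing import Optional


def label_for(with_bom: bool, declared_order: Optional[str] = None) -> str:
    """Pick the external label for an outgoing UTF-16 stream.

    declared_order is None when the recipient is not told the byte order
    (cases 1a, 1b and 2a), or "BE"/"LE" when it is told (case 2b).
    """
    if declared_order is None:
        # Bare or with BOM, the order is not declared externally.
        return "UTF-16"
    if declared_order not in ("BE", "LE"):
        raise ValueError("declared_order must be 'BE', 'LE' or None")
    if with_bom:
        # Declaring the byte order *and* sending a BOM is not supported.
        raise ValueError("UTF-16BE/UTF-16LE streams must not carry a BOM")
    return "UTF-16" + declared_order
```

For example, label_for(with_bom=True) and label_for(with_bom=False) both
give "UTF-16", while label_for(with_bom=False, declared_order="LE") gives
"UTF-16LE".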

A./


