Fwd: Re: Byte Order Marks

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Apr 19 2001 - 17:00:33 EDT


>Date: Thu, 19 Apr 2001 12:59:43 -0700
>To: Tomas McGuinness <tomas.mcguinness@cmg.nl>
>From: Asmus Freytag <asmusf@ix.netcom.com>
>Subject: Re: Byte Order Marks
>
>At 02:58 PM 4/19/01 +0200, you wrote:
>>If it's absent, is it safe to assume any particular order (i.e. big- or
>>little-endian)?

The default order is big-endian, but I wouldn't call that a 'safe'
assumption. In the most general case I would attempt to autorecognize the
byte order when the text is unlabelled. Where a particular protocol's
specification states that the default order SHALL apply in the unlabelled
case, the assumption becomes that much stronger, of course.
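
As a rough illustration of that fallback order (BOM first, then
autorecognition, then the big-endian default), here is a minimal Python
sketch. The function name guess_utf16_byte_order and the zero-byte counting
heuristic are assumptions made for this example only. The heuristic works
only for text dominated by characters below U+0100 and is not the
SCSU-based method mentioned in the PS.

```python
import codecs
from typing import Tuple


def guess_utf16_byte_order(data: bytes) -> Tuple[str, str]:
    """Return (byte_order, reason) for UTF-16 data.

    byte_order is "big" or "little"; reason is "bom", "heuristic" or
    "default". Hypothetical helper, for illustration only.
    """
    # 1. A byte order mark, when present, settles the question.
    #    (Caveat: a UTF-32LE BOM also begins with FF FE; not handled here.)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big", "bom"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little", "bom"

    # 2. Crude autorecognition (an assumption of this sketch): for text
    #    that is mostly ASCII or Latin-1, the high byte of each code unit
    #    is 0x00. Big-endian puts those zeros at even offsets,
    #    little-endian at odd offsets.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "big", "heuristic"
    if odd_zeros > even_zeros:
        return "little", "heuristic"

    # 3. No evidence either way: fall back to the default order.
    return "big", "default"


if __name__ == "__main__":
    sample = "Byte Order Marks".encode("utf-16-le")  # no BOM written
    order, reason = guess_utf16_byte_order(sample)
    print(order, reason)
```

The reason is returned alongside the order so that a caller can treat a
"default" result with more suspicion than a "bom" result. Text with no zero
bytes at all, such as pure Han ideographs, ends up at the default.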

A./

PS: As an aside, the SCSU encoder can be used to do this form of
autorecognition. If the text compresses much better in one byte order
than in the other, that byte order is overwhelmingly likely to be the true
one. The exception would be strings of pure Han ideographs. For these it's
necessary to



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT