Re: Byte-order markers question

From: Rick McGowan (Rick_McGowan@next.com)
Date: Thu Jun 05 1997 - 20:13:27 EDT


clarkcb@corp.sykes.com asked:

> suppose I have a file composed not of one large stream of text, but multiple
> separate strings, i.e., strings contained within separate text boxes, such as
> a desktop publishing file, would then the byte-order indicator, which is
> also an indication that the text is Unicode as I understand it, be necessary
> at the front of each string? I would guess one of two options:
> 1. yes, it is required in front of each *separate* string
> 2. yes or no, depending on the application

I think the answer here is entirely up to your application. I would suppose,
however, that a big document would be encoded as a unit in one byte order or
the other. On the other hand, having every string-field within the document
also preceded by a byte order mark won't hurt, as long as the reader programs
are prepared to deal with it; for example, by removing it when concatenating
strings.
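
As a rough illustration of the "removing it when concatenating" case, here is
a minimal Python sketch. It assumes the fields have already been decoded into
strings, so a per-field byte order mark shows up as a leading U+FEFF
character. The function name join_fields is made up for this example and is
not taken from any particular application.

```python
BOM = "\ufeff"  # the byte order mark as a decoded character (U+FEFF)


def join_fields(fields):
    """Concatenate decoded string fields, dropping one leading U+FEFF from each.

    Hypothetical helper: only a mark at the very start of a field is removed;
    any U+FEFF that appears later in a field is left untouched.
    """
    return "".join(f[1:] if f.startswith(BOM) else f for f in fields)
```

Without this step, each field's mark would end up embedded in the middle of
the concatenated text.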

The byte order mark is most useful in plain text files. The standard text
editor of the system I use here, for instance, can auto-detect plain text
Unicode files in either little- or big-endian format based on the presence of
the BOM or its inverse. It does this once on a per-file basis by checking for
the byte order mark when it reads the file. But it's a plain-text consumer, so
it doesn't presume that the file contains any "block" structure.
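
A once-per-file check of this kind might look roughly like the Python sketch
below. It is not the editor's actual code. It assumes 16-bit Unicode text,
where the mark U+FEFF is stored as the bytes FE FF in big-endian order and as
FF FE (its "inverse") in little-endian order. The function names and the
big-endian default for unmarked data are assumptions made for the example.

```python
import codecs


def detect_byte_order(data):
    """Return "big", "little", or None based on the first two bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE, the swapped form
        return "little"
    return None


def decode_plain_text(data, default="big"):
    """Decode 16-bit Unicode text, checking for the mark once at the start.

    Assumption: data without a mark is decoded using `default`.
    """
    order = detect_byte_order(data)
    if order is None:
        order = default
    else:
        data = data[2:]  # drop the two-byte mark so it is not part of the text
    return data.decode("utf-16-be" if order == "big" else "utf-16-le")
```

The check looks only at the start of the data, which matches a plain-text
reader that does not expect any further marks inside the file.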

So... the structure of your structured word processing file, or whatever, is
best determined by the type of reader program you expect. But, in my opinion,
you'd probably be best served in the long run, all around, by having the
entire file saved in one or the other byte order and marked only once.
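
To make the "one byte order, marked only once" layout concrete, here is a
small Python sketch of a writer. The container format is purely hypothetical:
a single byte order mark at the front, followed by each string as a 4-byte
length (always big-endian here, an arbitrary choice) and then its 16-bit text
in the file's byte order. A real document format would define its own
structure; pack_fields is an invented name.

```python
import codecs
import struct


def pack_fields(fields, big_endian=True):
    """Serialize strings into one byte string with a single leading mark."""
    bom = codecs.BOM_UTF16_BE if big_endian else codecs.BOM_UTF16_LE
    codec = "utf-16-be" if big_endian else "utf-16-le"  # these codecs add no mark
    parts = [bom]  # the only byte order mark in the whole file
    for text in fields:
        payload = text.encode(codec)
        # Hypothetical framing: 4-byte big-endian byte count before each string.
        parts.append(struct.pack(">I", len(payload)))
        parts.append(payload)
    return b"".join(parts)
```

A reader for such a file would check the first two bytes once and then decode
every field in the same byte order.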

        Rick

 


