Re: Unicode and Kermit

From: Mark Davis (mark@macchiato.com)
Date: Wed Aug 11 1999 - 11:51:41 EDT


While the differences may not be significant for a given user or application, they
are there. They show up where some processes expect and produce a BOM at the front
of files, and others don't.

1. Files (or fields) will not compare as identical. Such precise identity
comparisons are important for cases such as digital signatures.
2. There can be cases where it makes a difference in rendering (since that is the
purpose of a ZWNBSP!). For example, a word like re-iterate will generally break
after the hyphen; with a ZWNBSP after it, it will not. This is more important in
languages that don't use spaces, where ZWNBSP and ZWSP can be used to override the
result of dictionary word breaks.
3. If the file is converted into and out of a legacy encoding (without paying
attention to the ZWNBSP), a very common treatment is to substitute a special
character for any character that can't be represented. This ends up putting a SUB
or '?' character at the front of the file.
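
Points 1 and 3 can be sketched in a few lines of Python. This is only an
illustration: the sample string and the choice of Latin-1 as the "legacy
encoding" are assumptions made for the example, not taken from any particular
system.

```python
BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark

text = "re-iterate"        # hypothetical sample content
with_bom = BOM + text      # the same content as a BOM-producing process would emit it

# Point 1: the two no longer compare as identical, as strings or as bytes.
same_text = (with_bom == text)
same_bytes = (with_bom.encode("utf-16-be") == text.encode("utf-16-be"))

# Point 3: a round trip through a legacy encoding that cannot represent U+FEFF.
# Latin-1 is assumed here; errors="replace" substitutes '?' for unencodable characters.
legacy = with_bom.encode("latin-1", errors="replace")
round_trip = legacy.decode("latin-1")

print(same_text, same_bytes)
print(legacy)
print(repr(round_trip))
```

The substituted '?' is now ordinary content at the front of the data, so later
processes have no way to tell that it was once a BOM.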

These are some reasons why, if you are using Unicode in an environment where a BOM
might be used (e.g. UTF-16), you need to be aware of that fact. If the file is
tagged as UTF-16BE or UTF-16LE, then the BOM is not used and there is no ambiguity.
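
As a rough illustration of the tagged versus untagged distinction, here is how
Python's standard codecs happen to behave. The sample string is an assumption,
and other libraries may treat the BOM differently.

```python
text = "Kermit"  # hypothetical sample content

# Untagged "utf-16": the encoder writes a BOM in the platform's byte order,
# and the decoder consumes it again.
untagged = text.encode("utf-16")
has_bom = untagged[:2] in (b"\xff\xfe", b"\xfe\xff")
decoded = untagged.decode("utf-16")

# Tagged "utf-16-be": the byte order is known from the tag, so no BOM is written.
tagged_be = text.encode("utf-16-be")

# If a BOM is present anyway, the tagged decoder treats it as an ordinary
# U+FEFF character rather than stripping it.
kept = (b"\xfe\xff" + tagged_be).decode("utf-16-be")

print(has_bom, decoded == text)
print(len(untagged), len(tagged_be))
print(repr(kept))
```

In this sketch the untagged form is two bytes longer than the tagged one, and
the last line shows a leading U+FEFF surviving as content when the tagged
decoder is handed data that starts with a BOM.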

Mark

John Cowan wrote:

> Mark Davis wrote:
> >
> > > Always writing a BOM is a safe choice, because a BOM is semantically
> > > zero-width no-break space, which is essentially a no-op.
> > >
> >
> > This is not quite true: BOM is not quite a NO-OP; it does need to be removed
> > from a file. For example, if I split a file into two, then concatenate, the
> > result should be identical to the original--it isn't unless I remove the BOM.
>
> True. But what effect does the extra ZWNBSP have in such a case?
> Nearly none: the character is zero-width, does not affect breaking,
> etc. (If the file was broken between a base character and its
> combining character(s), then you may have a problem.)
>
> --
> John Cowan http://www.ccil.org/~cowan cowan@ccil.org
> Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
> Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
> -- Coleridge / Politzer


