David Starner <firstname.lastname@example.org>, another brave SCSU
> I'm implementing SCSU, and I was curious about the signature for SCSU.
> The UTR specifies 10 different signatures and then labels 0E FE FF as
> recommended. Is it acceptable for a decoder to interpret an initial 0E
> FE FF as the signature and decode the others as ZWNBSP, or must it
> interpret all of them as signatures?
U+FEFF can only be safely interpreted as a BOM (or signature) if it
is the first character of a file or stream. Otherwise it should be
interpreted as a ZWNBSP (at least unless and until Unicode officially
deprecates the use of U+FEFF as ZWNBSP). This is true regardless of
the encoding scheme, be it SCSU or something else.
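
To make the rule concrete, here is a minimal sketch in Python of the check a decoder could apply after decoding. The function name split_signature is hypothetical and is not taken from the TR or from any existing implementation. It assumes the stream has already been decoded to a string.

```python
from typing import Tuple

ZWNBSP = "\uFEFF"  # U+FEFF: a signature when initial, ZWNBSP otherwise


def split_signature(decoded: str) -> Tuple[bool, str]:
    """Return (had_signature, text) for an already-decoded stream.

    Only a U+FEFF in the very first position is treated as a signature
    and removed. Any later U+FEFF is left in place as a ZWNBSP.
    """
    if decoded.startswith(ZWNBSP):
        return True, decoded[1:]
    return False, decoded
```

For example, split_signature("\uFEFFab\uFEFF") reports a signature and keeps the trailing U+FEFF as ordinary text.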
An SCSU compressor may choose to encode all instances of U+FEFF, not
just the BOM, in the form 0E FE FF. Or it may use another of the
approaches mentioned in the TR. Mine happens to use an SD3 tag (1B A5
FF) for non-initial U+FEFF since dynamic window 3 is the first one I
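
As a rough illustration of why both byte sequences mentioned above yield U+FEFF, here is a small Python sketch. It is not a full SCSU decoder. The function names are hypothetical, and the window offset ranges are my reading of the offset table in the TR, so please check them against the TR before relying on them.

```python
SQU = 0x0E  # "quote Unicode" tag: the next two bytes are one big-endian UTF-16 unit
SD3 = 0x1B  # "define dynamic window 3" tag: the next byte selects the window offset


def window_offset(index: int) -> int:
    """Map an offset byte to a window start (assumed ranges, see note above)."""
    if 0x01 <= index <= 0x67:
        return index * 0x80
    if 0x68 <= index <= 0xA7:
        return index * 0x80 + 0xAC00
    raise ValueError("offset byte outside the ranges handled by this sketch")


def decode_quoted(seq: bytes) -> int:
    """Decode the three-byte form SQU hi lo."""
    if len(seq) != 3 or seq[0] != SQU:
        raise ValueError("expected SQU followed by two bytes")
    return int.from_bytes(seq[1:3], "big")


def decode_sd3(seq: bytes) -> int:
    """Decode SD3, an offset byte, then one byte in 0x80..0xFF from that window."""
    if len(seq) != 3 or seq[0] != SD3 or seq[2] < 0x80:
        raise ValueError("expected SD3, an offset byte, and a byte >= 0x80")
    return window_offset(seq[1]) + (seq[2] - 0x80)
```

Under these assumptions, offset byte A5 places the window at U+FE80, so the following byte FF selects U+FE80 + 7F = U+FEFF, the same character that 0E FE FF quotes directly.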
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT