Re: BOM ambiguity? from Asmus Freytag on 2012-07-13 (Unicode Mail List Archive)

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Fri, 13 Jul 2012 14:42:38 -0700

A) Treating NUL as ignorable is deep legacy behavior. It is no longer
appropriate for modern data.
B) Many Unicode characters are encoded in UTF-16 and UTF-32 with leading,
trailing, or interior NUL bytes, so UTF-16 and UTF-32 cannot be exchanged
under the assumption that "NUL is ignorable".
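
To illustrate point B, here is a minimal Python sketch. The sample string and the helper name strip_nuls are illustrative assumptions, not anything from the thread. It shows that a process which drops NUL bytes changes UTF-16 and UTF-32 data into different text rather than leaving it intact.

```python
# Hypothetical sample text: U+0041 followed by U+0100.
text = "A\u0100"

utf16 = text.encode("utf-16-le")  # 41 00 | 00 01
utf32 = text.encode("utf-32-le")  # 41 00 00 00 | 00 01 00 00


def strip_nuls(data: bytes) -> bytes:
    """Model a legacy channel that treats NUL bytes as ignorable (assumption)."""
    return data.replace(b"\x00", b"")


stripped16 = strip_nuls(utf16)  # 41 01
stripped32 = strip_nuls(utf32)  # 41 01

# Both encodings collapse to the same two bytes, which no longer carry
# the original text. Read as UTF-16LE they are the single unit 0x0141.
damaged = stripped16.decode("utf-16-le")
print(utf16.hex(" "), "->", stripped16.hex(" "), "->", repr(damaged))
```

After stripping, the two bytes decode as U+0141 instead of the original two characters, so the loss is silent: the decoder raises no error.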

A./

On 7/13/2012 2:16 PM, Philippe Verdy wrote:
> Null characters are almost always avoided in interchanged plain texts.
> This is not a practical problem. The use of nulls as significant
> characters is extremely exceptional, as they almost always require an
> envelope format to specify data lengths. This envelope format is in a
> file that is not plain-text by itself.
>
> 2012/7/13 Stephan Stiller <stephan.stiller_at_gmail.com>:
>> As an aside to the BOM discussion - something I've always been meaning to
>> ask.
>>
>> So there is a BOM-ambiguity when a file starts with
>> FF FE
>> and then a couple of U+0000 characters, yes? Because this could be either
>> UTF-16 or UTF-32 under little-endianness. Has this been pointed out and
>> discussed beforehand?
>>
>> The ambiguity arises because the set of BOMs in the different encodings
>> doesn't constitute a prefix-free code.
>>
>> Stephan
>>
>>
>
Received on Fri Jul 13 2012 - 16:44:07 CDT