From: Doug Ewell (firstname.lastname@example.org)
Date: Wed Dec 15 2004 - 10:38:55 CST
Marcin 'Qrczak' Kowalczyk <qrczak at knm dot org dot pl> wrote:
>> OBSERVATION - Requirement (4) is not met absolutely, however,
>> the probability of the UTF-8 encoding of this sequence occurring
>> "accidentally" at an arbitrary offset in an arbitrary octet stream
>> is approximately one in 2^384;
> Assuming that the distribution of sequences of characters is uniform.
> But it's not! As soon as you start using this encoding somewhere,
> the probability of this sequence appearing rises dramatically.
> If you convert UTF-8 -> UTF-32 using modified rules, and UTF-32 ->
> UTF-8 using standard rules, then you get this sequence without waiting
> 2^340 years.
Well, of course. Any sequence of events, chosen for a special purpose
on the basis that it is unlikely to occur naturally, will now occur
"naturally" much more often (under the new definition of "naturally").
One of the early rationales for the U+FEFF signature/BOM in UTF-16 was
that the sequences <0xFF, 0xFE> and <0xFE, 0xFF> were both considered
unlikely to occur in "normal" text. Of course, now they occur in lots
of "normal" Unicode text, but they are still doing their job. The
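
As a rough illustration of the two octet sequences mentioned above, here is a minimal Python sketch showing how U+FEFF serializes in each UTF-16 byte order, and how a reader might use the leading octets as a signature. The helper name sniff_utf16_byte_order is hypothetical, made up for this example; it only checks the first two octets and is not a complete encoding detector.

```python
import codecs

# U+FEFF encoded explicitly in each UTF-16 byte order.
BOM = "\ufeff"
bom_be = BOM.encode("utf-16-be")  # octets FE FF
bom_le = BOM.encode("utf-16-le")  # octets FF FE


def sniff_utf16_byte_order(data: bytes) -> str:
    """Hypothetical helper: guess UTF-16 byte order from the first two octets.

    Assumption: the input is either UTF-16 with a leading U+FEFF or
    something else entirely. Other encodings are not examined.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"


if __name__ == "__main__":
    print(bom_be.hex(), bom_le.hex())
    sample = (BOM + "text").encode("utf-16-le")
    print(sniff_utf16_byte_order(sample))
```

In Latin-1 the same octets would read as the letter pairs "þÿ" and "ÿþ", which is one way to see why they were considered unlikely at the start of ordinary text.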