Re: UTF-8N?

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Jun 22 2000 - 06:30:50 EDT


John Cowan wrote:
>
> Now suppose we have a character sequence beginning with U+FEFF U+0020.
> This would be encoded as follows:
>
> US-ASCII: (not possible)
> UTF-16: 0xFE 0xFF 0xFE 0xFF 0x00 0x20 ...
> UTF-16: 0xFF 0xFE 0xFF 0xFE 0x20 0x00 ...
> UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
> UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
> UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
> UTF-8B: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20 ...

There is something I must have missed.

It was my understanding that U+FEFF, when received as the first character,
should be seen as a BOM and not as a character, and handled accordingly.

So I expected:
  US-ASCII: 0x20
  UTF-16: 0xFE 0xFF 0x00 0x20 ...
  UTF-16: 0xFF 0xFE 0x20 0x00 ...
  UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
  UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
  UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
  UTF-8B: 0xEF 0xBB 0xBF 0x20 ...
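
To make the two readings concrete, here is a minimal sketch of how one
existing decoder treats these byte sequences. It uses Python's standard
codecs purely as an illustration; it is not a ruling on which reading is
correct. The mapping of "UTF-8N" to Python's "utf-8" codec and of "UTF-8B"
to "utf-8-sig" is an assumption made for this example (neither name is a
Python codec name), and decode_as_codepoints is a hypothetical helper.

```python
def decode_as_codepoints(data, codec):
    """Decode bytes with the given codec and list the resulting code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in data.decode(codec))


# (label, bytes, Python codec) triples built from the byte lists in this thread.
# Assumption: "utf-8" stands in for UTF-8N and "utf-8-sig" for UTF-8B.
SAMPLES = [
    ("UTF-16, one FE FF",         b"\xfe\xff\x00\x20",             "utf-16"),
    ("UTF-16, two FE FF",         b"\xfe\xff\xfe\xff\x00\x20",     "utf-16"),
    ("UTF-16BE, one FE FF",       b"\xfe\xff\x00\x20",             "utf-16-be"),
    ("UTF-16LE, one FF FE",       b"\xff\xfe\x20\x00",             "utf-16-le"),
    ("UTF-8N, one EF BB BF",      b"\xef\xbb\xbf\x20",             "utf-8"),
    ("UTF-8B, one EF BB BF",      b"\xef\xbb\xbf\x20",             "utf-8-sig"),
    ("UTF-8B, two EF BB BF",      b"\xef\xbb\xbf\xef\xbb\xbf\x20", "utf-8-sig"),
]

for label, data, codec in SAMPLES:
    # The BOM-aware codecs ("utf-16", "utf-8-sig") consume one leading
    # signature; the others hand U+FEFF back as an ordinary character.
    print(f"{label:22} -> {decode_as_codepoints(data, codec)}")
```

With these codecs, a single leading signature disappears under "utf-16" and
"utf-8-sig", so a U+FEFF that is meant to survive as a character has to be
written twice. Under "utf-16-be", "utf-16-le" and "utf-8" nothing is
stripped, and the leading U+FEFF comes back as a character.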

Antoine


