John Cowan wrote:
>
> Now suppose we have a character sequence beginning with U+FEFF U+0020.
> This would be encoded as follows:
>
> US-ASCII: (not possible)
> UTF-16: 0xFE 0xFF 0xFE 0xFF 0x00 0x20 ...
> UTF-16: 0xFF 0xFE 0xFF 0xFE 0x20 0x00 ...
> UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
> UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
> UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
> UTF-8B: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20 ...
There is something I must have missed.
It was my understanding that U+FEFF, when received as the first character,
should be treated as a BOM rather than as a character, and handled accordingly.
So I expected:
US-ASCII: 0x20
UTF-16: 0xFE 0xFF 0x00 0x20 ...
UTF-16: 0xFF 0xFE 0x20 0x00 ...
UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
UTF-8B: 0xEF 0xBB 0xBF 0x20 ...
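
To make the two readings concrete, here is a small Python sketch of
what one implementation does with this sequence. It is an illustration
under assumptions, not a statement of what the standard requires: I am
assuming that Python's "utf-8" codec corresponds to UTF-8N and its
"utf-8-sig" codec to UTF-8B, and I build the BOM-prefixed UTF-16 form
by hand because Python's own "utf-16" encoder uses the native byte
order. The names hex_bytes, utf16_with_bom and so on are made up for
this example.

```python
import codecs

# The character sequence from the quoted message: U+FEFF then U+0020.
text = "\ufeff\u0020"


def hex_bytes(data: bytes) -> str:
    """Format bytes the way the lists above do, e.g. '0xFE 0xFF'."""
    return " ".join(f"0x{b:02X}" for b in data)


# Fixed-byte-order encoders add no BOM of their own, so the only
# FE FF / FF FE in the output is the U+FEFF character itself.
utf16be = text.encode("utf-16-be")
utf16le = text.encode("utf-16-le")
utf8_plain = text.encode("utf-8")        # assumed to model UTF-8N

# Assumption: BOM-prefixed UTF-16 modelled as explicit BOM + BE data.
utf16_with_bom = codecs.BOM_UTF16_BE + utf16be
utf8_sig = text.encode("utf-8-sig")      # assumed to model UTF-8B

# US-ASCII cannot represent U+FEFF at all.
try:
    text.encode("ascii")
    ascii_possible = True
except UnicodeEncodeError:
    ascii_possible = False

# Decoding shows where the "BOM or character?" decision is made.
kept = utf16be.decode("utf-16-be")       # leading U+FEFF kept as a character
stripped = utf16be.decode("utf-16")      # leading FE FF consumed as a BOM

for label, data in [
    ("UTF-16 (BOM + BE)", utf16_with_bom),
    ("UTF-16BE", utf16be),
    ("UTF-16LE", utf16le),
    ("utf-8", utf8_plain),
    ("utf-8-sig", utf8_sig),
]:
    print(f"{label}: {hex_bytes(data)}")
print("US-ASCII possible:", ascii_possible)
print("decoded as utf-16-be:", [hex(ord(c)) for c in kept])
print("decoded as utf-16:   ", [hex(ord(c)) for c in stripped])
```

In this implementation the encoder side matches the quoted list (the
character U+FEFF is always written out, and a BOM is added in front of
it where the encoding calls for one), while the decoder for the
unmarked "utf-16" label removes one leading FE FF, which is the
behaviour I had in mind for the receiving side.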
Antoine
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT