Re: UTF-8N?

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Jun 22 2000 - 06:30:50 EDT

Next message: Michael Kaplan (Trigeminal Inc.): "RE: Bengali: variants of same conjunct"
Previous message: Parvinder Singh(EHPT): "Chinese characters in Java Applet"
Maybe in reply to: Masahiko Maedera: "UTF-8N?"
Next in thread: Peter_Constable@sil.org: "Re: UTF-8N?"

John Cowan wrote:
>
> Now suppose we have a character sequence beginning with U+FEFF U+0020.
> This would be encoded as follows:
>
> US-ASCII: (not possible)
> UTF-16: 0xFE 0xFF 0xFE 0xFF 0x00 0x20 ...
> UTF-16: 0xFF 0xFE 0xFF 0xFE 0x20 0x00 ...
> UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
> UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
> UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
> UTF-8B: 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20 ...

There is something I must have missed.

It was my understanding that U+FEFF, when received as the first character,
should be treated as a BOM and not as a character, and handled accordingly.
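
To make the receiving side concrete, here is a minimal sketch using Python's standard codecs. It only shows how one particular decoder family handles a leading U+FEFF; it is not a statement of what the Unicode Standard requires. The variable names are illustrative only.

```python
# Bytes for "U+FEFF U+0020" in big-endian UTF-16 and in UTF-8.
utf16_be_bytes = b"\xfe\xff\x00\x20"
utf8_bytes = b"\xef\xbb\xbf\x20"

# "utf-16" uses a leading U+FEFF to detect byte order and then drops it.
as_utf16 = utf16_be_bytes.decode("utf-16")

# "utf-16-be" has a fixed byte order, so U+FEFF is kept as a character.
as_utf16_be = utf16_be_bytes.decode("utf-16-be")

# Plain "utf-8" keeps U+FEFF; "utf-8-sig" treats it as a signature and drops it.
as_utf8 = utf8_bytes.decode("utf-8")
as_utf8_sig = utf8_bytes.decode("utf-8-sig")

for label, text in [
    ("utf-16", as_utf16),
    ("utf-16-be", as_utf16_be),
    ("utf-8", as_utf8),
    ("utf-8-sig", as_utf8_sig),
]:
    # Show the decoded code points so the presence of U+FEFF is visible.
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

In this sketch the same bytes yield either one character or two, depending only on whether the chosen decoder treats the initial U+FEFF as a BOM.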

So I expected:
  US-ASCII: 0x20
  UTF-16: 0xFE 0xFF 0x00 0x20 ...
  UTF-16: 0xFF 0xFE 0x20 0x00 ...
  UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
  UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
  UTF-8N: 0xEF 0xBB 0xBF 0x20 ...
  UTF-8B: 0xEF 0xBB 0xBF 0x20 ...
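
The difference between the two tables can be sketched on the sending side as well. The helper below is hypothetical (its name and its flag are assumptions made for illustration) and covers only the big-endian UTF-16 row: it always writes a BOM, and the flag decides whether a leading U+FEFF in the text is taken to be that BOM or a separate character.

```python
import codecs


def encode_utf16_be_with_bom(text: str, leading_feff_is_bom: bool) -> bytes:
    """Hypothetical helper: encode text as big-endian UTF-16 preceded by a BOM.

    If leading_feff_is_bom is True, a leading U+FEFF in the text is assumed
    to be the BOM itself and is not encoded a second time.
    """
    if leading_feff_is_bom and text.startswith("\ufeff"):
        text = text[1:]
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


sample = "\ufeff\u0020"

# U+FEFF counted as a character: the BOM is followed by an encoded U+FEFF.
as_character = encode_utf16_be_with_bom(sample, leading_feff_is_bom=False)

# U+FEFF counted as the BOM: it appears only once in the output.
as_signature = encode_utf16_be_with_bom(sample, leading_feff_is_bom=True)

print(as_character.hex(" "))
print(as_signature.hex(" "))
```

The first result corresponds to the big-endian UTF-16 line in the quoted table, the second to the one in the table I expected.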

Antoine

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT