Re: UTF-8N?

From: Mark Davis (markdavis@ispchannel.com)
Date: Tue Jun 20 2000 - 11:09:09 EDT


I want to make sure that people are not misled by that paper. There is a note below that section that reads:

"Note: The italicized names are not yet registered, but are useful for reference."

and "UTF-8N" is italicized. It is not a registered name, and should not be used outside of a closed system.

The reason I make that notational distinction in the text is that there is currently a danger with UTF-8: a BOM can be used with it, and some people do use one. Unlike the case of UTF-16 / UTF-16BE / UTF-16LE, there is no way to distinguish between implementations that allow a BOM and those that don't, so the situation is slightly unstable: if you find EF BB BF at the start of a UTF-8 file, you don't know whether to delete it or not.
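
To make the ambiguity concrete, here is a minimal Python sketch. The helper names split_bom and decode_utf8 are made up for illustration, and the strip_bom flag stands in for whatever out-of-band knowledge a closed system might have about its own data. Nothing in the bytes themselves says which setting is right, which is exactly the problem.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def split_bom(data):
    """Return (has_bom, rest): whether data starts with EF BB BF, and the bytes after it."""
    if data.startswith(UTF8_BOM):
        return True, data[len(UTF8_BOM):]
    return False, data


def decode_utf8(data, strip_bom):
    """Decode UTF-8 bytes; strip_bom is the caller's assumption, not something the data reveals."""
    has_bom, rest = split_bom(data)
    payload = rest if (strip_bom and has_bom) else data
    return payload.decode("utf-8")


if __name__ == "__main__":
    sample = b"\xef\xbb\xbfabc"
    # Treated as a signature, the three bytes are dropped.
    print(repr(decode_utf8(sample, strip_bom=True)))
    # Treated as content, they decode to U+FEFF and stay in the text.
    print(repr(decode_utf8(sample, strip_bom=False)))
```

The same input yields two different strings depending on a choice the receiver has to make on its own. A label such as the unregistered "UTF-8N" would only move that choice into the charset name.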

In XML, this situation does not arise, since it specifies the exact usage of the BOM, but it can arise in other circumstances.

Mark

Masahiko Maedera wrote:

> I found UTF-8N in the following URL.
>
> www-4.ibm.com/software/developer/library/utfencodingforms/index.html
>
> I have understood the meaning and the format of UTF-8N.
> But I am not sure how it will be treated in the future.
>
> Does anyone have a plan to register a new charset UTF-8N,
> or any other information about it?
>
> Thank you in advance.
>
> --
> Masahiko Maedera.


