Re: Conformance

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Fri Jan 21 2005 - 19:17:14 CST

Next message: Lars Kristan: "RE: Subject: Re: 32'nd bit & UTF-8"

Previous message: Gregg Reynolds: "Re: Subject: Re: 32'nd bit & UTF-8"
In reply to: Arcane Jill: "Conformance (was UTF, BOM, etc)"
Next in thread: Lars Kristan: "RE: Conformance (was UTF, BOM, etc)"

"Arcane Jill" <arcanejill@ramonsky.com> writes:

> D41: UTF-16LE encoding scheme: The Unicode encoding scheme that serializes a
> UTF-16 code unit sequence as a byte sequence in little-endian format.
> * In UTF-16LE, the UTF-16 code unit sequence <004D 0430 4E8C D800 DF02> is
> serialized as <4D 00 30 04 8C 4E 00 D8 02 DF>.
> * In UTF-16LE, an initial byte sequence <FF FE> is interpreted as U+FEFF ZERO
> WIDTH NO-BREAK SPACE.
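A minimal Python sketch of the D41 serialization, assuming the code units are given as plain integers (the helper name `serialize_utf16le` is hypothetical, not from the standard). Each 16-bit code unit is written low byte first, which reproduces the byte sequence in the first quoted example:

```python
import struct

def serialize_utf16le(code_units):
    """Serialize 16-bit UTF-16 code units as little-endian bytes.

    Surrogates are treated as ordinary code units; no validation is done.
    """
    return b"".join(struct.pack("<H", cu) for cu in code_units)

units = [0x004D, 0x0430, 0x4E8C, 0xD800, 0xDF02]
data = serialize_utf16le(units)
print(data.hex(" ").upper())  # 4D 00 30 04 8C 4E 00 D8 02 DF

# Cross-check against Python's own codec: D800 DF02 is a surrogate pair
# that decodes to the single code point U+10302.
text = data.decode("utf-16-le")
assert [hex(ord(c)) for c in text] == ["0x4d", "0x430", "0x4e8c", "0x10302"]
assert text.encode("utf-16-le") == data
```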

(Below I talk about encoding schemes.)

In UTF-16LE and UTF-16BE there is no BOM, while in UTF-16 an optional
initial FEFF is a BOM.
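The difference is easy to see with Python's built-in codecs (used here only as an illustration): `utf-16-le` treats an initial FF FE as ordinary content, U+FEFF, while `utf-16` consumes it as a BOM.

```python
# The bytes FF FE followed by "Hi" as little-endian code units.
data = b"\xff\xfe" + "Hi".encode("utf-16-le")

# UTF-16LE: no BOM handling, so FF FE decodes to U+FEFF ZERO WIDTH NO-BREAK SPACE.
as_le = data.decode("utf-16-le")

# UTF-16: an initial FEFF is a BOM; it selects the byte order and is dropped.
as_utf16 = data.decode("utf-16")

print(repr(as_le))     # '\ufeffHi'
print(repr(as_utf16))  # 'Hi'
```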

Why is there only one kind of UTF-8, then? It would be fair if it had
variants the way UTF-16 and UTF-32 do: one without special BOM handling,
analogous to UTF-16LE and UTF-16BE (obviously only one such flavor is
needed, because byte order issues don't apply), and one with it,
analogous to UTF-16.
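For comparison, Python's standard library happens to ship roughly this pair for UTF-8: `utf-8`, which does no BOM processing, and `utf-8-sig`, which drops an initial EF BB BF when decoding and writes one when encoding. This is Python's own convention, not an encoding scheme defined by Unicode.

```python
# EF BB BF is U+FEFF encoded in UTF-8.
data = b"\xef\xbb\xbf" + "Hi".encode("utf-8")

plain = data.decode("utf-8")         # no BOM handling: U+FEFF stays in the text
with_sig = data.decode("utf-8-sig")  # an initial EF BB BF is treated as a BOM and dropped

assert plain == "\ufeffHi"
assert with_sig == "Hi"

# Without a leading signature, utf-8-sig decodes exactly like utf-8.
assert "Hi".encode("utf-8").decode("utf-8-sig") == "Hi"

# On encoding, utf-8-sig always writes the signature.
assert "Hi".encode("utf-8-sig") == b"\xef\xbb\xbfHi"
```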

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 19:18:21 CST