Re: Conformance

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Fri Jan 21 2005 - 19:17:14 CST


    "Arcane Jill" <arcanejill@ramonsky.com> writes:

    > D41: UTF-16LE encoding scheme: The Unicode encoding scheme that serializes a
    > UTF-16 code unit sequence as a byte sequence in little-endian format.
    > * In UTF-16LE, the UTF-16 code unit sequence <004D 0430 4E8C D800 DF02> is
    > serialized as <4D 00 30 04 8C 4E 00 D8 02 DF>.
    > * In UTF-16LE, an initial byte sequence <FF FE> is interpreted as U+FEFF ZERO
    > WIDTH NO-BREAK SPACE.
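
    The serialization in the quoted D41 example can be checked with a short
    Python sketch. The helper name serialize_utf16le is made up for this
    illustration; it is not part of any standard or library.

```python
import struct

# Code unit sequence from the quoted D41 example.
code_units = [0x004D, 0x0430, 0x4E8C, 0xD800, 0xDF02]

def serialize_utf16le(units):
    """Hypothetical helper: write each 16-bit code unit low byte first."""
    return b"".join(struct.pack("<H", unit) for unit in units)

serialized = serialize_utf16le(code_units)

# Cross-check against Python's own codec. The surrogate pair D800 DF02
# encodes the single code point U+10302.
via_codec = "M\u0430\u4e8c\U00010302".encode("utf-16-le")

print(serialized.hex(" ").upper())
print(serialized == via_codec)
```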

    (Below I talk about encoding schemes.)

    In UTF-16LE and UTF-16BE there is no BOM (an initial FEFF is an
    ordinary character), while in UTF-16 an optional initial FEFF is a BOM.
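
    As a rough illustration of that difference, Python's "utf-16-le" and
    "utf-16" codecs happen to behave this way. The codec names are Python's
    own, and this is a sketch of one implementation's behavior, not a
    statement of what the standard requires of every decoder.

```python
import codecs

# Bytes FF FE 41 00: an initial FF FE followed by "A" in little-endian order.
data = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")

# UTF-16LE: no BOM handling, so the initial FF FE decodes to U+FEFF
# and stays in the text as a character.
as_utf16le = data.decode("utf-16-le")

# UTF-16: the initial FF FE is taken as a BOM, selects little-endian
# byte order, and is not part of the decoded text.
as_utf16 = data.decode("utf-16")

print([hex(ord(c)) for c in as_utf16le])
print([hex(ord(c)) for c in as_utf16])
```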

    Why is there only one kind of UTF-8, then? It would be fair if it had
    variants like UTF-16 and UTF-32: a variant which doesn't include
    special BOM handling, analogous to UTF-16LE and UTF-16BE (obviously
    only one flavor is needed, because byte order issues don't apply)
    and a variant which does, analogous to UTF-16.
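
    For comparison, Python ships two codecs that roughly correspond to the
    two variants asked about here: "utf-8", with no special BOM handling,
    and "utf-8-sig", which treats an initial EF BB BF as an optional
    signature. These are Python codec names, not encoding schemes defined
    by the Unicode Standard, so take this only as a sketch of the idea.

```python
import codecs

# Bytes EF BB BF 41: the UTF-8 form of U+FEFF followed by "A".
data = codecs.BOM_UTF8 + b"A"

# No special BOM handling: U+FEFF is kept as an ordinary character.
plain = data.decode("utf-8")

# Signature-aware variant: an initial U+FEFF is dropped on decoding...
with_sig = data.decode("utf-8-sig")

# ...it is optional on input...
without_bom = b"A".decode("utf-8-sig")

# ...and it is written on encoding.
encoded = "A".encode("utf-8-sig")

print(repr(plain), repr(with_sig), repr(without_bom), encoded)
```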

    -- 
       __("<         Marcin Kowalczyk
       \__/       qrczak@knm.org.pl
        ^^     http://qrnik.knm.org.pl/~qrczak/
    

