Re: Does "endian-ness" apply to UTF-8 characters that use multiple bytes?

From: James Tauber via Unicode <unicode_at_unicode.org>
Date: Mon, 4 Feb 2019 15:27:00 -0500

Endian-ness only affects ordering of bytes within a code unit.

Because UTF-8 has single-byte code units, endian-ness does not affect it.
The order of the bytes in a multi-byte sequence is fixed by the UTF-8 bit
mapping itself, so é is always C3 A9 and never A9 C3.
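
As a quick check, here is a minimal Python sketch (the variable names are
just illustrative) that encodes é three ways. Only the UTF-16 forms depend
on the BE/LE choice; the UTF-8 form has no such variants.

```python
# Encode U+00E9 (é) in UTF-8 and in both UTF-16 byte orders.
text = "\u00e9"

utf8 = text.encode("utf-8")          # single-byte code units: C3 A9
utf16_be = text.encode("utf-16-be")  # one 16-bit code unit, high byte first
utf16_le = text.encode("utf-16-le")  # same code unit, low byte first

print("UTF-8   :", utf8.hex(" "))      # c3 a9
print("UTF-16BE:", utf16_be.hex(" "))  # 00 e9
print("UTF-16LE:", utf16_le.hex(" "))  # e9 00
```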

Note also that endian-ness only affects individual 16-bit code units in
UTF-16. If you have a surrogate pair, endian-ness doesn't affect the
order of the two 16-bit units that make up the pair (the high surrogate
always comes first), only the order of the two bytes within each unit.
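
To illustrate the surrogate-pair case, here is a small Python sketch using
U+1F600, a character outside the BMP that I picked as an example. The
variable names are illustrative only.

```python
import struct

# U+1F600 needs a surrogate pair in UTF-16: D83D (high) + DE00 (low).
text = "\U0001F600"

be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

print("UTF-16BE:", be.hex(" "))  # d8 3d de 00
print("UTF-16LE:", le.hex(" "))  # 3d d8 00 de

# Read each byte string back as two 16-bit code units in its own byte order.
units_be = struct.unpack(">2H", be)
units_le = struct.unpack("<2H", le)

# Both give the same units in the same order: high surrogate, then low.
print([hex(u) for u in units_be])  # ['0xd83d', '0xde00']
print([hex(u) for u in units_le])  # ['0xd83d', '0xde00']
```

The bytes inside each unit swap between BE and LE, but the high surrogate
stays ahead of the low surrogate in both.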

James

On Mon, Feb 4, 2019 at 2:25 PM Costello, Roger L. via Unicode <
unicode_at_unicode.org> wrote:

> Hello Unicode Experts!
>
> As I understand it, endian-ness applies to multi-byte words.
>
> Endian-ness does not apply to ASCII characters because each character is a
> single byte.
>
> Endian-ness does apply to UTF-16BE (Big-Endian), UTF-16LE (Little-Endian),
> UTF-32BE and UTF-32LE because each character uses multiple bytes.
>
> Clearly endian-ness does not apply to single-byte UTF-8 characters. But
> what about UTF-8 characters that use multiple bytes, such as the character
> é, which uses two bytes C3 and A9; does endian-ness apply? For example, if
> a file is in Little Endian would the character é appear in a hex editor as
> A9 C3 whereas if the file is in Big Endian the character é would appear in
> a hex editor as C3 A9?
>
> /Roger
>
>

-- 
*James Tauber*
Eldarion <https://eldarion.com/> | jktauber.com (Greek Linguistics)
<https://jktauber.com/> | Modelling Music
<https://modelling-music.com/> | Digital
Tolkien <https://digitaltolkien.com/>
Received on Mon Feb 04 2019 - 14:27:31 CST
