UTF-8 endianness

From: Elliotte Rusty Harold (elharo@metalab.unc.edu)
Date: Tue May 18 1999 - 10:49:15 EDT


My reading of the UTF-8 spec in Appendix A of the Unicode specification is
that UTF-8 is defined as a sequence of bytes in a particular order. A C
program that writes correct UTF-8 data produces the same output on
big-endian and little-endian architectures, for example. Is this accurate?
Or can the bytes get swapped anywhere on different platforms?
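
To make the question concrete, here is a minimal sketch (my own code, not
anything from the spec) of the sort of C program I have in mind. Each byte
of the encoding is computed by shifting and masking the code point, and the
bytes are written out one at a time, so as far as I can tell the host byte
order never comes into play:

#include <stdio.h>

/*
 * Sketch only: encode one Unicode scalar value (here limited to
 * U+10FFFF) as UTF-8 into buf, returning the number of bytes written,
 * or 0 if the value is out of range.  Every byte is computed by
 * shifting and masking the code point, and the bytes go into the
 * buffer one at a time, so the host's endianness is never involved.
 */
static int utf8_encode(unsigned long c, unsigned char buf[4])
{
    if (c < 0x80UL) {                 /* 1 byte:  0xxxxxxx */
        buf[0] = (unsigned char) c;
        return 1;
    }
    else if (c < 0x800UL) {           /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (c >> 6));
        buf[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    else if (c < 0x10000UL) {         /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xE0 | (c >> 12));
        buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (c & 0x3F));
        return 3;
    }
    else if (c < 0x110000UL) {        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xF0 | (c >> 18));
        buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}

int main(void)
{
    unsigned char buf[4];
    int i;
    int n = utf8_encode(0x00E9UL, buf);       /* U+00E9, e with acute accent */

    for (i = 0; i < n; i++)
        printf("%02X ", (unsigned) buf[i]);   /* should print "C3 A9" everywhere */
    printf("\n");
    return 0;
}

On a big-endian machine like a SPARC and a little-endian machine like a
Pentium this ought to print the same two bytes, C3 A9, which is what leads
me to believe the byte order is fixed by the encoding itself rather than by
the platform.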

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|               Java I/O (O'Reilly & Associates, 1999)               |
|            http://metalab.unc.edu/javafaq/books/javaio/            |
|   http://www.amazon.com/exec/obidos/ISBN=1565924851/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java news: http://metalab.unc.edu/javafaq/  |
|  Read Cafe con Leche for XML news: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


