UTF-8 endianness

From: Elliotte Rusty Harold (elharo@metalab.unc.edu)
Date: Tue May 18 1999 - 10:49:15 EDT


My reading of the UTF-8 spec in Appendix A of the Unicode specification is
that UTF-8 is defined as a sequence of bytes in a particular order. A C
program that writes correct UTF-8 data produces the same output on
big-endian and little-endian architectures, for example. Is this accurate?
Or can the bytes get swapped anywhere on different platforms?
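
To make the question concrete, here is a minimal sketch (my own code, not
anything from the spec) of the sort of C program I have in mind. Each byte
of the encoding is computed by shifting and masking the code point, and the
bytes are written out one at a time, so as far as I can tell the host byte
order never comes into play:

#include <stdio.h>

/*
 * Sketch only: encode one Unicode scalar value (here limited to
 * U+10FFFF) as UTF-8 into buf, returning the number of bytes written,
 * or 0 if the value is out of range.  Every byte is computed by
 * shifting and masking the code point, and the bytes go into the
 * buffer one at a time, so the host's endianness is never involved.
 */
static int utf8_encode(unsigned long c, unsigned char buf[4])
{
    if (c < 0x80UL) {                 /* 1 byte:  0xxxxxxx */
        buf[0] = (unsigned char) c;
        return 1;
    }
    else if (c < 0x800UL) {           /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (c >> 6));
        buf[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    else if (c < 0x10000UL) {         /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xE0 | (c >> 12));
        buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (c & 0x3F));
        return 3;
    }
    else if (c < 0x110000UL) {        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xF0 | (c >> 18));
        buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}

int main(void)
{
    unsigned char buf[4];
    int i;
    int n = utf8_encode(0x00E9UL, buf);       /* U+00E9, e with acute accent */

    for (i = 0; i < n; i++)
        printf("%02X ", (unsigned) buf[i]);   /* should print "C3 A9" everywhere */
    printf("\n");
    return 0;
}

On a big-endian machine like a SPARC and a little-endian machine like a
Pentium this ought to print the same two bytes, C3 A9, which is what leads
me to believe the byte order is fixed by the encoding itself rather than by
the platform.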

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|               Java I/O (O'Reilly & Associates, 1999)               |
|            http://metalab.unc.edu/javafaq/books/javaio/            |
|   http://www.amazon.com/exec/obidos/ISBN=1565924851/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java news: http://metalab.unc.edu/javafaq/  |
|  Read Cafe con Leche for XML news: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


