Strange UTF-8 in Java

From: Elliotte Rusty Harold (elharo@sunsite.unc.edu)
Date: Sun Sep 27 1998 - 10:25:56 EDT

As you may or may not know, Java's UTF-8 encodes the null character, ASCII
0, in two bytes rather than one as it should according to the UTF-8
specification. The standard two-byte decoding algorithm should handle this
case anyway. Nonetheless I'm wary since it does violate the "Be
conservative in what you write, be liberal in what you read" principle. So
my question is threefold:

1. Will using Java's UTF-8 format produce problems for any software
anyone's aware of?

2. In general, is it always acceptable to encode a one-byte character in
two or three bytes? Or a two-byte character in three bytes?

3. Does anyone know why Java does not want to encode the 0 character as a
single byte? In other words, is there any reason why a stream should not
contain embedded nulls?
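
For anyone who wants to see the bytes in question, here is a minimal
sketch. It assumes the format meant above is the one written by
DataOutputStream.writeUTF, which puts a two-byte length in front of the
encoded characters. The class name ModifiedUtf8Demo and its helper methods
are made up for illustration, and decodeTwoByte is only a naive decoder
with no check for the shortest form, not any particular library's
algorithm.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {

    // Returns exactly what DataOutputStream.writeUTF emits for s:
    // a two-byte big-endian length, then the encoded characters.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Naive two-byte decoder: 110xxxxx 10yyyyyy -> xxxxxyyyyyy.
    // It does not reject non-shortest forms such as C0 80.
    static int decodeTwoByte(int b1, int b2) {
        return ((b1 & 0x1F) << 6) | (b2 & 0x3F);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Java's writeUTF form of U+0000: length prefix 00 02, then C0 80.
        System.out.println(hex(writeUtfBytes("\u0000")));   // 00 02 C0 80
        // Standard UTF-8 form of U+0000: a single zero byte.
        System.out.println(hex("\u0000".getBytes(StandardCharsets.UTF_8))); // 00
        // The naive decoder maps C0 80 back to code point 0.
        System.out.println(decodeTwoByte(0xC0, 0x80));      // 0
    }
}
```

The last line is the case behind question 2: a decoder that just masks and
shifts reads C0 80 as U+0000, while a decoder that insists on the shortest
encoding would reject it.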

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@sunsite.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| XML: Extensible Markup Language (IDG Books 1998) |
| http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java news: http://sunsite.unc.edu/javafaq/ |
| Read Cafe con Leche for XML news: http://sunsite.unc.edu/xml/ |
+----------------------------------+---------------------------------+

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT