From: Yung-Fong Tang (email@example.com)
Date: Fri Feb 28 2003 - 13:21:01 EST
Kenneth Whistler wrote:
>Think of it this way. Does anyone expect the ASCII standard to tell,
>in detail, what a process should or should not do if it receives
>data which purports to be ASCII, but which contains an 0x80 byte
>in it? All the ASCII standard can really do is tell you that
>0x80 is not defined in ASCII, and a conformant process shall not
>interpret 0x80 as an ASCII character. Beyond that, it is up to
>the software engineers to figure out who goofed up in mislabelling
>or corrupting the data, and what the process receiving the bad data
>should do about it.
That is not a good comparison. ASCII is a single-byte character code
standard. When I get a 0x80 in an ASCII string, I know where the
boundary is: the whole 8 bits of that 0x80 are bad. The
scope is not the first 3 bits or 9 bits, but the 8 bits of data. I cannot
tell whether the rest of the data is good or bad, but I know an ASCII
character is 8 bits and 8 bits only.
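To make the point concrete, here is a minimal sketch in Python, assuming the data arrives as a byte string that is labelled as ASCII. The function name find_bad_ascii is made up for this illustration. Each offending byte is reported as its own one-byte unit, independent of its neighbours.

```python
def find_bad_ascii(data: bytes) -> list[int]:
    """Return the offsets of bytes that are not defined in ASCII (0x80 and above)."""
    # Every byte is checked on its own: the damage never extends past one byte.
    return [offset for offset, byte in enumerate(data) if byte >= 0x80]


sample = b"abc\x80def"
print(find_bad_ascii(sample))  # prints [3]
```

The sketch only locates the bad byte. What to do about it (drop it, replace it, reject the message) is still up to the receiving process.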
The same goes for JIS X 0208 (a TWO and only TWO byte character set, not a
variable-length character set). If I am processing an ISO-2022-JP message
in JIS X 0208 mode and I get 0x24 0xa8, I know the boundary of
that problem is 16 bits, not 8 bits or 32 bits.
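A rough sketch of the same idea for the two-byte case, in Python. It assumes the payload is already in JIS X 0208 mode (escape sequences are not handled here) and that both bytes of a pair must fall in the 7-bit range 0x21 to 0x7E. The function name find_bad_jis_pairs is made up for this illustration, and the check does not consult the table of assigned characters.

```python
def find_bad_jis_pairs(payload: bytes) -> list[int]:
    """Return the start offsets of two-byte units that are out of range."""
    bad = []
    # Walk the payload in fixed steps of two bytes: the unit size never changes.
    for offset in range(0, len(payload) - 1, 2):
        first, second = payload[offset], payload[offset + 1]
        if not (0x21 <= first <= 0x7E and 0x21 <= second <= 0x7E):
            bad.append(offset)
    # A dangling final byte cannot form a pair, so report it too.
    if len(payload) % 2:
        bad.append(len(payload) - 1)
    return bad


sample = bytes([0x24, 0x22, 0x24, 0xA8, 0x24, 0x24])
print(find_bad_jis_pairs(sample))  # prints [2]
```

The pair 0x24 0xa8 is flagged as one 16-bit unit, and the pairs before and after it are still read on their normal boundaries.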
When you deal with encodings that need state (ISO-2022, ISO-2022-JP,
etc.) or variable-length encodings (Shift_JIS, Big5, UTF-8), the
situation is different.
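One way to see the difference, sketched in Python for UTF-8 only. The lead-byte ranges follow the current four-byte definition of UTF-8, which is an assumption on top of this message, and the function names are made up for this illustration. The lead byte announces one length, the bytes that follow may support a shorter one, and the two answers give two different candidates for the boundary of the bad data.

```python
def utf8_announced_length(lead: int) -> int:
    """Length announced by a UTF-8 lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def bad_span_candidates(data: bytes, start: int) -> tuple[int, int]:
    """Return (announced, matched) for the sequence beginning at start.

    announced is what the lead byte promises; matched counts the lead byte
    plus the continuation bytes (0x80..0xBF) that actually follow it.
    """
    announced = utf8_announced_length(data[start])
    matched = 1
    while (matched < announced
           and start + matched < len(data)
           and 0x80 <= data[start + matched] <= 0xBF):
        matched += 1
    return announced, matched


# 0xE3 announces three bytes, but the third byte is the ASCII letter "A".
print(bad_span_candidates(b"\xe3\x81A", 0))  # prints (3, 2)
```

Here the bad span could be taken as 3 bytes (what the lead byte announced) or 2 bytes (what actually matched), and the choice decides whether the "A" survives. With ASCII or JIS X 0208 there is no such choice to make.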