RE: latin1 decoder implementation from Whistler, Ken on 2012-11-16 (Unicode Mail List Archive)

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Fri, 16 Nov 2012 22:59:54 +0000

No, Unicode doesn't. But yes, it *does* follow that decoding C0/C1 control codes produces a Unicode code point of equal value. RTFM. TUS 6.2, p. 544:

"There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes defined in the ISO/IEC 2022 framework. ... The Unicode Standard provides for the intact interchange of these code points, neither adding to nor subtracting from their semantics. The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992."

--Ken

> latin1 explicitly doesn't define characters (or control codes) in those ranges, but unicode does.
> It doesn't directly follow that decoding a byte in those undefined ranges produces a unicode-point of equal value.
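
For anyone who wants to check the "code point of equal value" claim against a real decoder, here is a minimal sketch. It assumes Python 3 and its built-in "latin-1" codec, which is only one implementation and says nothing about what other decoders do. The helper name decode_latin1_byte is made up for illustration. It decodes every byte in the C0 range, DEL, and the C1 range, and compares the resulting code point with the byte value.

```python
import unicodedata

# The 65 control positions: C0 (0x00-0x1F), DEL (0x7F), and C1 (0x80-0x9F).
control_bytes = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))


def decode_latin1_byte(b: int) -> int:
    """Decode one byte with Python's 'latin-1' codec and return its code point.

    Hypothetical helper, named here only for illustration.
    """
    return ord(bytes([b]).decode("latin-1"))


# Bytes whose decoded code point differs from the byte value.
mismatches = [b for b in control_bytes if decode_latin1_byte(b) != b]

# Unicode general categories of the corresponding code points.
categories = {unicodedata.category(chr(b)) for b in control_bytes}

print(len(control_bytes), mismatches, categories)
```

With this codec the mismatch list comes out empty and every one of the 65 code points has general category Cc (control), which matches the count given in the TUS passage quoted above.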
Received on Fri Nov 16 2012 - 17:00:46 CST