RE: Can a single text document use multiple character encodings?

From: Michel Suignard <michel_at_suignard.com>
Date: Thu, 29 Aug 2013 06:57:43 +0000

For interested parties, ISO/IEC 10646:2012[1] clauses 11 and 12 contain lengthy descriptions of how to identify the various UCS encoding schemes in ISO/IEC 2022, including the padding required when using encoding schemes other than UTF-8. So it is in theory possible to mix UTF-8 and UTF-16 or even UTF-32 schemes in a single 'plain text' file by using the appropriate padding, the designation sequence, and the return to the coding system of ISO/IEC 2022 when transitioning between encoding schemes.
However, I don't think that ISO/IEC 2022 is that popular in environments that convey Unicode/UCS data. Using it correctly is not trivial. It is interesting to note that this somewhat obscure part of the standard was revisited recently through ballot comments, due to translation issues in Japanese.
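
To make the mechanism a little more concrete, here is a rough sketch in Python of what such a mixed stream might look like. It is illustrative only. The escape sequences are the ISO/IEC 2022 "designate other coding system" sequences as I recall them (ESC % G for UTF-8, ESC % / L for UTF-16, ESC % @ for the return), and the padding rule (each octet of the return sequence widened to one code unit with leading zero octets) as well as the big-endian byte order are assumptions that should be checked against clauses 11 and 12 before relying on them. The helper names (pad, utf8_segment, utf16_segment) are made up for this example.

```python
ESC = b"\x1b"

# Assumed escape sequences; verify against ISO/IEC 10646:2012 clause 12.
TO_UTF8 = ESC + b"%G"          # ESC 02/05 04/07: switch to UTF-8
TO_UTF16 = ESC + b"%/L"        # ESC 02/05 02/15 04/12: switch to UTF-16 (assumed)
RETURN_TO_2022 = ESC + b"%@"   # ESC 02/05 04/00: return to ISO/IEC 2022


def pad(seq: bytes, unit_size: int) -> bytes:
    """Widen each octet of an escape sequence to one code unit of
    unit_size octets by prefixing zero octets (assumed padding rule)."""
    return b"".join(b"\x00" * (unit_size - 1) + bytes([octet]) for octet in seq)


def utf8_segment(text: str) -> bytes:
    """Designation sequence, UTF-8 data, then the unpadded return."""
    return TO_UTF8 + text.encode("utf-8") + RETURN_TO_2022


def utf16_segment(text: str) -> bytes:
    """Designation sequence, big-endian UTF-16 data (byte order assumed),
    then the return sequence padded to 16-bit code units."""
    return TO_UTF16 + text.encode("utf-16-be") + pad(RETURN_TO_2022, 2)


# One 'plain text' stream that switches encoding scheme part-way through.
stream = utf8_segment("UTF-8 part") + utf16_segment("UTF-16 part")
```

The designation sequence itself is left unpadded because it is issued while the stream is still in the ISO/IEC 2022 coding system; only the return, which occurs inside the UTF-16 data, is padded in this sketch.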

Michel
[1] available at http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
Received on Thu Aug 29 2013 - 02:01:00 CDT