Re: UTC Agenda Item : UTF-8S

From: Markus Scherer (
Date: Tue May 15 2001 - 18:51:42 EDT


1. Binary order of UTF-16 strings compatible with binary order of UTF-8/32 is easily achieved using the "fix-up" described in my article on developerWorks (there is currently a problem with that site).

Essentially, one rotates the 16-bit values so that the surrogates get to the top of the range.
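
To make the rotation concrete, here is a minimal sketch in Python. It is not taken from the article: the function names are made up for illustration, and the constants assume the usual rotation in which code units 0xe000..0xffff move down by 0x800 and the surrogates 0xd800..0xdfff move up by 0x2000. Comparing the adjusted units then gives the same order as comparing code points.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fixup(unit):
    """Rotate one 16-bit code unit so that surrogates land at the top of the range."""
    if unit >= 0xD800:
        if unit >= 0xE000:
            unit -= 0x800    # 0xe000..0xffff -> 0xd800..0xf7ff
        else:
            unit += 0x2000   # 0xd800..0xdfff -> 0xf800..0xffff
    return unit


def utf16_key_codepoint_order(s):
    """Sort key: adjusted UTF-16 units, which compare in code point order."""
    return [fixup(u) for u in utf16_units(s)]


samples = ["\uff5e", "\U00010000", "a", "\ue000", "\U0010ffff"]
by_fixup = sorted(samples, key=utf16_key_codepoint_order)
```

Without the adjustment, U+FF5E (unit 0xff5e) would sort after U+10000 (lead surrogate 0xd800); with it, the supplementary character sorts last, as it does in UTF-8 and UTF-32.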

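2. Binary order of correct UTF-8 strings compatible with binary order of UTF-16 is even more easily achieved. Here we do not need to rotate values but just shift any single byte with a value of 0xee..0xef to 0xfe..0xff. All other byte values stay the same. This works because 0xee and 0xef are the lead bytes for U+E000..U+FFFF, which UTF-16 sorts above the surrogate pairs, while the supplementary characters use the lead bytes 0xf0..0xf4.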

3. I am not aware of any IBM software implementing UTF-EBCDIC. I do know that OS/390 is planning to use, or is already using, straight UTF-x instead.


