> Maybe one should make a transmission safe UTF that left C1 alone?
Remember this? --
From: Markus Scherer <email@example.com>
To: "Unicode List" <firstname.lastname@example.org>
Date: Mon, 10 Apr 2000 15:23:53 -0800 (GMT-0800)
Subject: What if UTF-8 had been defined after UTF-16?
What if UTF-8 had been defined just for the code point range 0..0x10ffff?
What if UTF-8 had been designed to be not just "File-System-Safe" but also
UTF-8 could have had all the nice features that it has now, plus:
- C1 control codes (0x80..0x9f) passed through as single bytes
- no sequences longer than 4 bytes, BMP still covered with 3 bytes
- no checking for code points > 0x10ffff because
  it could have been designed just for that range
- no minimum-length problem -> no security concerns
- all byte values used for some encoding
It would have been possible. Interested? See
Note: This is _not_ an approved UTF. I am _not_ proposing this as a new
UTF. This is _not_ compatible with any existing UTF or other Unicode
implementation. It is just a play with bits and bytes, a "what if", a
Just to share a thought -
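
For comparison, the properties of standard UTF-8 that the list in the message contrasts against can be checked with a short Python sketch. It exercises only today's UTF-8 through Python's built-in codec and does not implement the encoding described in the message. The helper names (encoded_length, is_rejected, unused_byte_values) are made up for this illustration.

```python
def encoded_length(code_point: int) -> int:
    """Number of bytes standard UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


def is_rejected(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def unused_byte_values() -> list:
    """Byte values that never occur in any well-formed UTF-8 sequence."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        used.update(chr(cp).encode("utf-8"))
    return [b for b in range(256) if b not in used]


if __name__ == "__main__":
    # A C1 control code (here U+0085) is not a single byte in UTF-8.
    print("C1 control U+0085:", chr(0x85).encode("utf-8").hex(), encoded_length(0x85))

    # The BMP tops out at 3 bytes, the full range 0..0x10ffff at 4 bytes.
    print("U+FFFF:", encoded_length(0xFFFF), "U+10FFFF:", encoded_length(0x10FFFF))

    # Non-shortest ("overlong") form of '/' and a value above 0x10ffff:
    # a decoder has to check for both and reject them.
    print("overlong rejected:", is_rejected(b"\xc0\xaf"))
    print("above 0x10ffff rejected:", is_rejected(b"\xf4\x90\x80\x80"))

    # Byte values with no role in well-formed UTF-8.
    print("unused bytes:", [hex(b) for b in unused_byte_values()])
```

The last function walks every encodable code point, so it takes a moment to run. It is written that way to make the result an observation of the codec rather than a table copied from a specification.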