I recently discovered Unicode and I must say that it is great! I found
out that the first 256 Unicode code points are identical to ISO 8859-1
(Latin-1). Thus, if the high byte of a 16-bit character is zero, we
would not really have to transmit it in messages. UTF-8 and UTF-7 do
the trick for the old 7-bit ASCII set, but they render Latin-1
characters that have the high bit set unreadable to presentation
programs that are not Unicode-aware. UTF-8 and UTF-7 also require me to
convert all my ISO Latin-1 texts to UTF. This is not satisfactory for a
European who has produced lots of text in Latin-1 and who depends on
software that is Latin-1 aware but UTF-7/8 unaware. I wonder whether
there is an encoding like UTF-7 that would allow all eight bits of a
byte to be used.
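
A minimal Python sketch of the concern above. Python's built-in
"latin-1", "utf-8" and "utf-7" codecs are used here; the sample
character and the helper misread_as_latin1 are assumptions made for
illustration only, not part of any standard.

```python
# Illustration only: how one Latin-1 character with the high bit set
# is serialized by Latin-1, UTF-8 and UTF-7.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, byte 0xE9 in Latin-1

latin1_bytes = text.encode("latin-1")  # a single byte, 0xE9
utf8_bytes = text.encode("utf-8")      # two bytes, 0xC3 0xA9
utf7_bytes = text.encode("utf-7")      # 7-bit only, introduced by '+'


def misread_as_latin1(data: bytes) -> str:
    """Hypothetical helper: what a Latin-1-only viewer would display."""
    return data.decode("latin-1")


if __name__ == "__main__":
    samples = [
        ("Latin-1", latin1_bytes),
        ("UTF-8", utf8_bytes),
        ("UTF-7", utf7_bytes),
    ]
    for name, data in samples:
        shown = misread_as_latin1(data)
        print(f"{name:8} bytes={data.hex()} Latin-1 viewer shows {shown!r}")
```

Only the Latin-1 bytes display correctly in a Latin-1-only program.
The UTF-8 form appears as two unrelated accented characters, and the
UTF-7 form appears as a short run of ASCII characters.
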
I see that this isn't possible with UTF-8, because there the high bit
marks the bytes of a multi-byte character sequence. But in UTF-7 it
would have been perfectly possible, because UTF-7 uses a full escape
character rather than the high bit.
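
The role of the high bit in UTF-8 can be sketched as follows. The
function names utf8_byte_role and is_valid_utf8 are hypothetical, and
the classification looks only at the leading bit pattern of a byte as
UTF-8 is defined today (sequences of up to four bytes); it does not
check every validity rule.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte (0..255) by its leading bit pattern in UTF-8."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete 7-bit character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: inside a multi-byte sequence
    if b < 0xE0:
        return "lead-2"        # 110xxxxx: starts a 2-byte sequence
    if b < 0xF0:
        return "lead-3"        # 1110xxxx: starts a 3-byte sequence
    if b < 0xF8:
        return "lead-4"        # 11110xxx: starts a 4-byte sequence
    return "invalid"           # not used by current UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    latin1_text = "caf\u00e9".encode("latin-1")  # ends in the lone byte 0xE9
    print(utf8_byte_role(0xE9))        # the byte reads as a lead byte
    print(is_valid_utf8(latin1_text))  # no continuation bytes follow it
```

Every byte with the high bit set already has a meaning inside UTF-8,
so a Latin-1 byte such as 0xE9 cannot be passed through unchanged: on
its own it reads as the start of a sequence whose continuation bytes
are missing.
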
I would like to know (1) whether others share the concern that one UTF
is missing, (2) whether any proposals already exist, and (3) whether
such a proposal (much like UTF-7) would have a chance of being accepted
by whoever is in charge of the UTF series (Unicode org? ISO?).
PS: How can I subscribe to the Unicode mailing list?
Gunther Schadow ----------------------------------- http://aurora.rg.iupui.edu
Regenstrief Institute for Health Care
1001 W 10th Street RG5, Indianapolis IN 46202, Phone: (317) 630 7960
firstname.lastname@example.org ---------------------- #include <usual/disclaimer>