Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Dan Oscarsson (Dan.Oscarsson@trab.se)
Date: Fri Aug 21 1998 - 10:27:50 EDT


Yung-Fong Tang wrote:

>Requesting a UTF that is compatible with ISO-8859-1 is like requesting that the JPEG
>working group make a JPEG compatible with GIF, or asking for a VCD compatible with the
>LaserDisc format. In the case of VCD and LaserDisc, instead of making the VCD compatible
>with the LaserDisc, a vendor may make a *VCD/LaserDisc player* (but not the disc,
>the data) that can play both discs. It is the job of the player designer to solve the
>compatibility issue, not the job of the disc designer. The same thing applies
>here.

JPEG is not related to GIF (except that both are image formats), but ASCII and
ISO 8859-1 are true subsets of UCS, and UTF-8 is a way to encode UCS.
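
As a rough illustration of the subset relation, here is a small Python sketch.
The function names are made up for this note. It checks that every ISO 8859-1
byte value equals its UCS code point, and that ASCII bytes are unchanged by
UTF-8, while bytes above 127 are not.

```python
def latin1_matches_ucs() -> bool:
    """True if each ISO 8859-1 byte value equals the UCS code point it decodes to."""
    for byte_value in range(256):
        ch = bytes([byte_value]).decode("latin-1")
        if ord(ch) != byte_value:
            return False
    return True


def ascii_survives_utf8() -> bool:
    """True if each ASCII character encodes to the same single byte in UTF-8."""
    for byte_value in range(128):
        if chr(byte_value).encode("utf-8") != bytes([byte_value]):
            return False
    return True


text = "Malm\u00f6"                    # o with diaeresis is U+00F6
latin1_bytes = text.encode("latin-1")  # b'Malm\xf6'
utf8_bytes = text.encode("utf-8")      # b'Malm\xc3\xb6'

print(latin1_matches_ucs(), ascii_survives_utf8())
print(latin1_bytes, utf8_bytes)
```

The code points agree, but the UTF-8 byte sequence for U+00F6 is two bytes, so
the files on disk differ as soon as a character above 127 appears.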

>
>One of the reasons you request this is that, in your head, there is only one
>important charset to you: ISO-8859-1. However, my company cares about many
>charsets: ISO-8859-1, ISO-8859-2, ISO-8859-5, ISO-8859-7, ISO-8859-9, KOI8-R,
>Shift_JIS, Big5, GB2312, EUC-KR, etc. If your request is reasonable, then I would
>like to ask someone to design a UTF compatible with Big5 and GB2312, and
>Shift_JIS, and KOI8-R. (Just joking.)

No, the only important character set for me is UCS. And currently I use only the
first 256 codes of UCS as they are all I need, for the moment. Those codes happen
to be the same as ISO 8859-1.
To allow code values from UCS beyond the first 256, I need a way to add them
without making all the software I have today obsolete, and the new
software must be able to read all existing texts.
UTF-8 will not work unless it can read and write files compatible with what
I have today.
You who use non-Latin characters will also need something to mix old and new,
but your character sets are not true subsets of UCS and cannot be handled as
easily as ISO 8859-1.
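
One way new software could read both old and new files is a fallback decoder.
The Python sketch below is only a heuristic, not something proposed in this
thread, and the function name is hypothetical. It tries strict UTF-8 first and
treats any file that fails as ISO 8859-1.

```python
def decode_legacy_or_utf8(data: bytes) -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, otherwise as ISO 8859-1.

    Heuristic only: some ISO 8859-1 byte sequences (for example 0xC3 0xB6)
    are also valid UTF-8 and would be decoded as UTF-8.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO 8859-1, so this cannot fail.
        return data.decode("latin-1")


old_file = b"Malm\xf6"       # ISO 8859-1 bytes
new_file = b"Malm\xc3\xb6"   # UTF-8 bytes for the same text
print(decode_legacy_or_utf8(old_file) == decode_legacy_or_utf8(new_file))
```

This helps only the new software. Old 8-bit tools still cannot read the
UTF-8 file correctly.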

>
>> Data storage that needs special software to access the storage device
>> (like databases) can have any encoding they like internally, it is always
>> accessed through the special software. A normal file can be accessed and
>> written by many tools and must then be in a standard format that most programs
>> can handle.
>
>And such a *STANDARD FORMAT* cannot be ISO-8859-1. Why? Because ISO-8859-1 cannot
>encode Japanese, Korean, Chinese, or even Eastern European languages. That is THE
>REASON why people proposed UTF-8. UTF-8 may not be the BEST choice we
>could have, but ISO 8859-1 is definitely worse.
>
I doubt UTF-8 is the right choice for Chinese; UCS-2 would be better. For transport
between places, UTF-8 would be fine.
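
The size difference can be checked with a short Python sketch. The function
name is hypothetical, and since Python has no codec called UCS-2, the sketch
assumes "utf-16-be", which gives the same bytes as UCS-2 for characters in the
basic plane.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UCS-2 (via utf-16-be)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Assumption: all characters are in the basic plane, so UTF-16
        # big-endian without a byte order mark matches UCS-2.
        "ucs-2": len(text.encode("utf-16-be")),
    }


print(encoded_sizes("\u4e2d\u6587"))  # two Chinese characters
print(encoded_sizes("Dan"))           # three ASCII characters
```

Each of the two Chinese characters takes three bytes in UTF-8 and two in UCS-2,
while ASCII text takes one byte per character in UTF-8 and two in UCS-2.
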
But most tools I have on my computer can only read 8-bit bytes, and my files are in
ISO 8859-1. As UTF-8 is not compatible with current usage on my system, I cannot
expect software vendors to fix my software any time soon, and new software using
UTF-8 cannot read my old files, UTF-8 has no use on my system.
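
The mismatch in both directions can be shown with a small Python sketch. The
helper name is made up, and the sketch assumes the new software uses a strict
UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_file = "Malm\u00f6".encode("latin-1")  # what my old files contain
utf8_file = "Malm\u00f6".encode("utf-8")      # what new software would write

# A strict UTF-8 reader rejects the old file.
print(is_valid_utf8(latin1_file))

# An old 8-bit tool reads the new file as two wrong characters, not one.
misread = utf8_file.decode("latin-1")
print(len(misread), [hex(ord(c)) for c in misread[4:]])
```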

   Dan

--
Dan Oscarsson
Telia Prosoft AB                       Email: Dan.Oscarsson@trab.se
Box 85
201 20  Malmo, Sweden


