Is any work under way to review the POSIX.1 and POSIX.2 standards
systematically and add proper UTF-8 support?
I don't think much has to be done, but there are a few crucial bits. For
instance, the terminal driver can be put into a "cooked" mode, in which a
simple line-editing mechanism is applied before a line is passed to an
application. The implementation of the erase function there has to know
how many bytes to remove when a character is erased, and that number
differs between UTF-8 and, for instance, ISO 8859-1. There should be a
standard way to tell the terminal driver that it is in UTF-8 mode and has
to perform character erase actions accordingly.
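To make the erase problem concrete, here is a minimal sketch in C of how a cooked-mode line editor might compute the number of bytes to drop. The function names erase_len_8859_1 and erase_len_utf8 are hypothetical and not part of any termios interface. The sketch assumes the line buffer holds well-formed UTF-8 and simply steps back over continuation bytes (bit pattern 10xxxxxx) to the lead byte of the last character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes an erase should remove from the end of
   buf[0..len-1] when the line is ISO 8859-1 (one byte per character). */
size_t erase_len_8859_1(const unsigned char *buf, size_t len)
{
    (void)buf;                      /* content is irrelevant here */
    return len > 0 ? 1 : 0;
}

/* Hypothetical helper: bytes an erase should remove from the end of
   buf[0..len-1] when the line is UTF-8. Assumes well-formed input. */
size_t erase_len_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;
    size_t i = len - 1;             /* index of the last byte in the line */
    /* Continuation bytes look like 10xxxxxx; walk back to the lead byte. */
    while (i > 0 && (buf[i] & 0xC0) == 0x80)
        i--;
    return len - i;                 /* lead byte plus its continuations */
}
```

With ISO 8859-1 the answer is always one byte, while erasing "é" (C3 A9) from a UTF-8 buffer must remove two bytes and erasing "€" (E2 82 AC) three. Malformed input such as stray continuation bytes would need a policy of its own, which this sketch does not attempt.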
Also, the syntax of the entire locale database mechanism was really
designed for small 8-bit character sets and becomes rather horrible when
applied to UTF-8. I get the impression that wchar_t <-> UTF-8 conversion
is supposed to be done by table lookup of UTF-8 byte sequences, as
opposed to the obvious conversion algorithm. UTF-8 certainly deserves
some special treatment here as a recognized encoding in the locale
system.
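For comparison with table lookup, the obvious conversion algorithm is just a few shifts and masks. The following is a sketch in C, not the interface of any particular C library: the names ucs_to_utf8 and utf8_to_ucs are hypothetical, uint32_t stands in for wchar_t (whose width is implementation-defined), and the range checks follow the later RFC 3629 limits (at most four bytes, up to U+10FFFF, no surrogates) rather than the original six-byte form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: encode code point c as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if c is not encodable. */
size_t ucs_to_utf8(uint32_t c, unsigned char out[4])
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;                   /* UTF-16 surrogates are not encodable */
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                       /* beyond U+10FFFF */
}

/* Hypothetical: decode one character from s[0..n-1] into *c.
   Returns the number of bytes consumed, or 0 on malformed input. */
size_t utf8_to_ucs(const unsigned char *s, size_t n, uint32_t *c)
{
    assert(s != NULL && c != NULL);
    if (n == 0)
        return 0;
    unsigned char b = s[0];
    size_t len;
    uint32_t v, min;
    if (b < 0x80) { *c = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { len = 2; v = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { len = 3; v = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { len = 4; v = b & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead */
    if (n < len)
        return 0;                   /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* expected a 10xxxxxx byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;                   /* overlong, out of range, or surrogate */
    *c = v;
    return len;
}
```

The lead byte alone determines the sequence length, so no table is needed; rejecting overlong forms such as C0 80 keeps each character to a single valid encoding.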
Does anyone know the current status of UTF-8 in POSIX?
Markus
-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT