Re: UTF-8 and POSIX

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Wed Jun 23 1999 - 13:40:29 EDT


On Wed, Jun 23, 1999 at 07:37:15AM -0700, Markus Kuhn wrote:
> Is there any work going on to review the POSIX.1 and POSIX.2 standards
> systematically to add proper UTF-8 support?
>
> I don't think much has to be done, but there are a few crucial bits. For
> instance, the terminal driver can be set into a "cooked" mode where a
> single-line editing mechanism is applied before sending a line to an
> application, and the implementation of the erase function there has to
> know how many bytes to remove when a character is erased, which makes a
> difference between UTF-8 and ISO 8859-1 for instance. There should be a
> standard way to tell the terminal that it is in UTF-8 mode and has to
> perform character erase actions accordingly.

Hmm, why should UTF-8 support differ here from, say, EUC support?
EUC is a multibyte encoding too, so the support should be there already.
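
To make the erase problem concrete, here is a minimal sketch in C of how
a line editor could decide how many bytes to drop for one character
erase. The function names utf8_erase_len and latin1_erase_len are
hypothetical and not part of any POSIX interface; the sketch assumes the
edit buffer holds well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a POSIX interface): number of bytes to remove
 * from the end of buf to erase one character, assuming well-formed UTF-8.
 * If the buffer consists only of continuation bytes (malformed input),
 * the whole buffer is reported. */
size_t utf8_erase_len(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t n = 0;
    while (n < len) {
        unsigned char c = buf[len - 1 - n];
        n++;
        /* Continuation bytes look like 10xxxxxx; anything else is an
         * ASCII byte or the lead byte that starts the character. */
        if ((c & 0xC0) != 0x80)
            break;
    }
    return n;
}

/* In a single-byte encoding such as ISO 8859-1, erase is always one byte. */
size_t latin1_erase_len(size_t len)
{
    return len > 0 ? 1 : 0;
}
```

For the buffer "a" followed by the two bytes C3 B8 (the letter ø), the
UTF-8 version reports two bytes to erase where the ISO 8859-1 version
would report one, which is the difference the driver has to know about.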

> Also the syntax for the entire locale database mechanisms was really
> designed for small 8-bit character sets and becomes rather horrible when
> applied to UTF-8. I get the impression that wchar_t <-> UTF-8 conversion
> is supposed to be done by table lookup of UTF-8 byte sequences as
> opposed to the obvious conversion algorithm. UTF-8 would certainly
> deserve some special treatment here as a recognized encoding in the
> locale system.

In WG20 we have enhanced the locale syntax so that it can cater for
ISO 10646 in the forthcoming ISO/IEC 14652 TR.

UTF-8 does not need to be implemented as a charmap; it could be
implemented as a special case.
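
As an illustration of the algorithmic conversion mentioned in the quoted
text, here is a minimal sketch in C that needs no table lookup. The names
utf8_encode and utf8_decode are hypothetical, uint32_t stands in for
wchar_t to sidestep platform differences in its width, and the range is
limited to U+0000..U+10FFFF, which is an assumption of this sketch rather
than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point into out.
 * Returns the number of bytes written (1..4), or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: decode one code point from s (len bytes available).
 * Returns the number of bytes consumed, or 0 on malformed or short input. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;
    unsigned char b = s[0];
    size_t need;
    uint32_t v, min;
    if (b < 0x80) { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { need = 2; v = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { need = 3; v = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { need = 4; v = b & 0x07; min = 0x10000; }
    else return 0;                          /* stray continuation or bad lead */
    if (len < need)
        return 0;
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* expected 10xxxxxx */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;
    *cp = v;
    return need;
}
```

The whole mapping is a handful of shifts and masks, which is why treating
UTF-8 as a recognized encoding rather than a table of byte sequences is
attractive.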

> Does anyone know the current status of UTF-8 and POSIX?

I wrote a paper on 10646 support for WG15, which is now
included in the current draft of TR 14766. Its basic idea was to use
UTF-8 as a standard in all POSIX standards.

-- 
Keld Jørn Simonsen, keld@dkuug.dk        
DKUUG, Fruebjergvej 3, DK-2100 København Ø, Danmark


