From: Dominikus Scherkl (Dominikus.Scherkl@glueckkanja.com)
Date: Wed Oct 30 2002 - 13:48:30 EST
Markus Scherer wrote:
> Dominikus Scherkl wrote:
> > My other suggestion (and the main reason to call the proposed
> > character "source failure indicator symbol" (SFIS)) was intended
> > especially for malformed UTF-8 input that has overlong encodings.
> This is a special, custom form of error handling - why assign
> a character for it?
Converting from and to UTF-8 is an everyday task, very important
for all applications that handle Unicode. So it is a special
case, but a very common one.
Therefore it would be nice to have a standardized,
application-independent error handling for it. It is also a mechanism
useful for many other charsets being converted to Unicode.
> You could just use an existing character or non-character for
> this, e.g., U+303E or U+FFFF or U+FDEF or similar.
This is what I do in the meantime. But it's uncomfortable, because
most editors display all noncharacters, unassigned characters,
and characters not in the font the same way, which hides
the INDICATION. The SFIS should be displayed distinctly, to remind the reader
that only THIS is an SFIS, unlike all the other empty squares in the
Additionally, I think we should have a standardized way to display
old UTF-8 text without losing information (overlong UTF-8 was
allowed for years): gyphing is not a good solution, and simply
decoding the overlong forms is not allowed. This is a self-made
problem, so Unicode should provide an inherent way to solve it.
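To make "overlong" concrete: a sequence is overlong when the code point it carries could have been encoded in fewer bytes. The following is a minimal sketch, with hypothetical function names, that decodes a single 2- to 4-byte sequence without the shortest-form check and then compares the result against the smallest code point that genuinely needs that length. It deliberately skips the surrogate and > U+10FFFF checks a full validator would also need.

```python
# Smallest code point that legitimately requires a sequence of each length.
_MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_sequence(seq: bytes) -> int:
    """Decode one 2-4 byte UTF-8-style sequence, ignoring the shortest-form rule."""
    n = len(seq)
    lead = seq[0]
    if n == 2 and lead & 0xE0 == 0xC0:
        cp = lead & 0x1F          # 110xxxxx
    elif n == 3 and lead & 0xF0 == 0xE0:
        cp = lead & 0x0F          # 1110xxxx
    elif n == 4 and lead & 0xF8 == 0xF0:
        cp = lead & 0x07          # 11110xxx
    else:
        raise ValueError("not a 2-4 byte sequence")
    for b in seq[1:]:
        if b & 0xC0 != 0x80:      # continuation bytes must be 10xxxxxx
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp


def is_overlong(seq: bytes) -> bool:
    """True if the sequence encodes a code point that fits in fewer bytes."""
    return decode_sequence(seq) < _MIN_FOR_LENGTH[len(seq)]


if __name__ == "__main__":
    print(hex(decode_sequence(b"\xc0\xaf")), is_overlong(b"\xc0\xaf"))
```

Such a check recovers what old text meant (here U+002F), which is the information a strict decoder discards; how that should then be displayed is the open question of this thread.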
-- Dominikus Scherkl email@example.com
This archive was generated by hypermail 2.1.5 : Wed Oct 30 2002 - 14:36:14 EST