From: Mark Davis (firstname.lastname@example.org)
Date: Wed Oct 30 2002 - 09:46:21 EST
We had thought of something similar, but one that would provide more
information in interfaces.
Reserve a space of 256 code points, with names:
During a conversion process, if some bytes (say from corrupt UTF-8) cannot
be correctly converted into code points, then a sequence of the above is
generated. This doesn't preserve the original text -- you would never
convert back from these code points to anything; it is really only useful
ephemerally, in the process of doing a conversion where something goes
wrong. It is really only a slightly more verbose U+FFFD REPLACEMENT
CHARACTER, but would be handy in certain conversion APIs, especially in
single-code-point-at-a-time APIs like getChar().
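
As a rough illustration of the idea, here is a minimal Python sketch. It
assumes each of the 256 reserved code points stands for one unconvertible
byte value. The base U+F700 (a Private Use Area value) and the names
BAD_BYTE_BASE, bad_byte_handler and decode_with_markers are hypothetical
and chosen only for this example; no actual range is implied.

```python
import codecs

# Hypothetical base of the 256 reserved code points. This is a Private Use
# Area value picked only for illustration, not a proposed assignment.
BAD_BYTE_BASE = 0xF700


def bad_byte_handler(err):
    """Map each undecodable byte to BAD_BYTE_BASE + byte value."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    markers = "".join(chr(BAD_BYTE_BASE + b) for b in bad)
    # Resume decoding right after the bytes that were reported as bad.
    return markers, err.end


codecs.register_error("badbyte", bad_byte_handler)


def decode_with_markers(data: bytes) -> str:
    """Decode UTF-8, emitting one marker code point per bad byte."""
    return data.decode("utf-8", errors="badbyte")
```

A caller stepping through the result one code point at a time can then
tell from the marker which byte failed, where U+FFFD would only say that
something did. As noted above, the markers are not meant to be converted
back into bytes.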
► “Eppur si muove” ◄
----- Original Message -----
From: "Dominikus Scherkl" <Dominikus.Scherkl@glueckkanja.com>
Sent: Wednesday, October 30, 2002 03:49
Subject: New Charakter Proposal
> I would like to have a "source failure indicator symbol" (SFIS)
> character in Unicode, which a charset-conversion unit may
> insert into a text (suggested position: U+FFF8).
> Several charsets have undefined codepoints which were
> defined in a former or later version (e.g. overlong
> UTF-8 encodings or the $ symbol (0x24) in the INVARIANT
> A converter can replace such symbols by U+FFFD (which is
> correct but loses the information), or simply use the
> character which is most likely intended (which hides the error).
> Neither is very good.
> The SFIS would allow the reader to see that an error occurred
> and therefore the following character may be incorrect, but
> maintain the readability if the right conversion is made anyway
> (or at least give a hint which character may be intended -
> e.g. the $ sign could have been any other currency symbol
> if a national 7-bit charset was changed to INVARIANT by
> previous conversions).
> Of course a converter can still use U+FFFD if it has no
> idea which character is intended or if Unicode doesn't contain
> the character.
> The whole "character identities" discussion gave me another
> reason to introduce such an SFIS character:
> A font renderer may show the SFIS before a character which
> is replaced by another one because the correct one is not
> contained in the font (e.g. it may render an "a with
> superscript e above" as SFIS + "a umlaut" to indicate the
> error and show a probably fitting replacement, which is
> much better than showing an empty square).
> In short:
> The SFIS may indicate a kind of compatibility decomposition
> of the following character.
> (This is not necessarily the standard compatibility decomposition.)
> I'd like to hear if my suggestion is completely weird or
> if anybody else thinks it might be useful.
> Best Regards.
> Dominikus Scherkl