RE: New Charakter Proposal

From: Dominikus Scherkl (Dominikus.Scherkl@glueckkanja.com)
Date: Wed Oct 30 2002 - 10:46:36 EST

John Cowan wrote:
> This sounds basically like an extension of U+303E IDEOGRAPHIC
> VARIATION INDICATOR (whose semantic is: "The following character
> is not what I want, but it's the best approximation I can get")
> to non-ideographs.
>
> I have no problem with this idea.

So you mean: use U+303E + 'ä' to indicate that you would prefer
the Old German form of that character if the font contains it?

But that's no solution, because then you could directly use
"a with superscript e above".

What I had in mind was a mechanism for displaying a character that is
not found in the font by means of another character, together with an
indicator that shows the reader that it is not the real character but
a replacement.
But that is font technology and therefore off topic.
Please forget about that.

My other suggestion (and the main reason for calling the proposed
character "source failure indicator symbol" (SFIS)) was intended
especially for malformed UTF-8 input that contains overlong encodings.

In this special case a converter knows exactly which character is
intended, but needs to signal an error to avoid ambiguities.
At present it MUST replace the overlong sequence with U+FFFD
(or even abort the conversion!).
But I think SFIS + intended character is a far better approach,
because it
1) warns the reader AND keeps the text readable
2) distinguishes overlong encodings from illegal character sequences.

The second point in particular is of security interest, because
overlong sequences are unlikely to occur unless introduced
intentionally (by an old and buggy encoder or by an attack), while
illegal sequences are almost always accidental (a cut stream, a bit
error, or input that is not UTF-8 at all).
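
To make the difference between the two cases concrete, here is a rough
sketch of a converter that behaves as described above. It is only an
illustration under some assumptions: SFIS has no code point, so the
private-use value U+E000 stands in for it, and the function name
decode_with_sfis and the table MIN_FOR_LENGTH are made up for this
example. Five- and six-byte forms are simply treated as illegal.

```python
SFIS = "\uE000"         # assumption: placeholder only, the proposal assigns no code point
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER

# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_with_sfis(data: bytes) -> str:
    """Decode UTF-8, flagging overlong forms with SFIS instead of U+FFFD."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                       # plain ASCII
            out.append(chr(lead))
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        else:                                 # stray continuation or invalid lead byte
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + length]
        if len(tail) < length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            out.append(REPLACEMENT)           # truncated or broken sequence
            i += 1
            continue
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)
        i += length
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)           # not a Unicode scalar value
        elif cp < MIN_FOR_LENGTH[length]:
            out.append(SFIS + chr(cp))        # overlong: warn, but keep the intended character
        else:
            out.append(chr(cp))
    return "".join(out)
```

With this sketch, the overlong form C0 AF of "/" comes out as SFIS
followed by "/", while a stray FF byte comes out as U+FFFD, so the
reader (or a security filter) can tell the two cases apart.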

For other source charsets this might also be useful, but it may
cause problems - I have not really thought this through in detail.
But I think there are charsets that differ from others
only in that they left several code points undefined, while newer
versions define them (e.g. the euro symbol).
If there is a high probability that a specific character is
intended, the SFIS mechanism is advantageous, I think.
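
A very rough sketch of how that could look for an 8-bit source
charset. Everything here is assumed for illustration: U+E000 again
stands in for SFIS, the table LATER_DEFINED and the choice of byte
0x80 for the euro symbol are hypothetical, and Latin-1 is used only
as a stand-in for the older charset's mapping of all other bytes.

```python
SFIS = "\uE000"  # assumption: placeholder only, the proposal assigns no code point

# Hypothetical table: bytes that the declared (older) charset version
# leaves undefined, mapped to the character a newer version assigns.
LATER_DEFINED = {0x80: "\u20AC"}  # assumed: 0x80 later became the euro symbol


def decode_legacy(data: bytes) -> str:
    """Decode byte by byte, flagging probably-intended characters with SFIS."""
    out = []
    for b in data:
        if b in LATER_DEFINED:
            # Undefined in the declared charset, but very likely meant this way.
            out.append(SFIS + LATER_DEFINED[b])
        else:
            # Stand-in for the older charset's own mapping table.
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)
```

Whether this is safe depends on how high the probability of the
guessed character really is for the charset in question.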

Best Regards.

-- 
Dominikus Scherkl
dominikus.scherkl@glueckkanja.com

This archive was generated by hypermail 2.1.5 : Wed Oct 30 2002 - 11:26:44 EST