Date: Wed Oct 30 2002 - 18:13:53 EST

    Dominikus Scherkl wrote:
    > Converting from and to utf-8 is an all-day topic, very important
    > for all applications handling with unicode. So it is a special

    Converting text to/from UTF-8 is indeed common and important.

    Converting text that claims to be UTF-8 - but isn't - is different: it may be a spoofing attempt,
    bytes may have been lost, or the text may not be UTF-8 at all. How a from-UTF-8 converter should
    handle such text seems to be a judgement call, and application-specific.
    (How does the converter know _why_ there is an illegal sequence?)
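
    To illustrate why this is application-specific, here is a minimal sketch in Python. The
    function name decode_utf8 and the sample input are made up for illustration; only the
    built-in bytes.decode() and its standard "strict", "replace" and "ignore" error handlers
    are real. The same damaged input gives three different results, and the converter cannot
    tell which one the application wants.

```python
def decode_utf8(data: bytes, policy: str = "strict") -> str:
    """Decode bytes that claim to be UTF-8 under a caller-chosen error policy.

    `policy` is a hypothetical parameter for this sketch; it is passed
    straight through to the standard `errors` argument of bytes.decode().
    """
    if policy not in ("strict", "replace", "ignore"):
        raise ValueError("unknown policy: " + policy)
    return data.decode("utf-8", errors=policy)


# A two-byte sequence whose trail byte was lost, followed by ASCII text.
truncated = b"\xc3" + b"abc"

replaced = decode_utf8(truncated, "replace")  # lone lead byte becomes U+FFFD
dropped = decode_utf8(truncated, "ignore")    # lone lead byte silently vanishes

try:
    decode_utf8(truncated, "strict")
    rejected = False
except UnicodeDecodeError:
    rejected = True                           # strict mode refuses the input
```

    A security-sensitive parser would probably want the strict behaviour, while a text viewer
    might prefer the replacement character. Silently dropping bytes is the riskiest choice,
    since it can change what the text appears to say.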

    > Additional I think we should have a standardized way to display
    > old utf-8 text without losing information (overlong utf-8 was
    > allowed for years) ...

    Neither ISO 10646 nor the RFC ever allowed generating overlong UTF-8. Unicode at least used to say
    "should not" for generation (but allowed decoding). Chances are nearly 100% that overlong UTF-8 was
    a spoofing attempt, or the result of something other than a UTF-8 encoder.
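
    For readers unfamiliar with the term, the following Python sketch shows what "overlong"
    means. The helper decode_one is hypothetical and deliberately permissive: it decodes a
    single structurally well-formed sequence and reports whether the code point would have
    fit in a shorter form. It does not check surrogates or the upper range limit, so it is
    not a conformant decoder.

```python
from typing import Tuple

# Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
MIN_FOR_LENGTH = (0x0, 0x80, 0x800, 0x10000)


def decode_one(seq: bytes) -> Tuple[int, bool]:
    """Return (code_point, is_overlong) for exactly one UTF-8-shaped sequence."""
    if not seq:
        raise ValueError("empty sequence")
    first = seq[0]
    if first < 0x80:
        length, cp = 1, first
    elif first & 0xE0 == 0xC0:
        length, cp = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:
        length, cp = 3, first & 0x0F
    elif first & 0xF8 == 0xF0:
        length, cp = 4, first & 0x07
    else:
        raise ValueError("not a lead byte")
    if len(seq) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("not a trail byte")
        cp = (cp << 6) | (b & 0x3F)
    # Overlong: the value is below the minimum that requires this length.
    return cp, cp < MIN_FOR_LENGTH[length - 1]


slash = decode_one(b"/")               # (0x2F, False): the normal one-byte form
overlong2 = decode_one(b"\xc0\xaf")    # (0x2F, True): "/" padded to two bytes
overlong3 = decode_one(b"\xe0\x80\xaf")  # (0x2F, True): "/" padded to three bytes
```

    The spoofing risk is that a filter scanning the raw bytes for "/" (0x2F) never sees it in
    the overlong forms, while a permissive decoder downstream still produces it. Python's own
    strict decoder rejects such input: b"\xc0\xaf".decode("utf-8") raises UnicodeDecodeError.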

