RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Wed Dec 15 2004 - 08:04:02 CST

    Ooooops, correction:

    In response to Marcin 'Qrczak' Kowalczyk
    >> Question: should a new programming language which uses Unicode for
    >> string representation allow non-characters in strings? Argument for
    >> allowing them: otherwise they are completely useless, except
    >> U+FFFE for BOM detection. Argument for disallowing them: they make
    >> UTF-n inappropriate for serialization of arbitrary strings, and thus
    >> non-standard extensions of UTF-n must be used for serialization.

    I wrote:

    My opinion:
    > It should allow them and process them usefully. Furthermore, this
    > 'usefully' should not be up to developers to discover. It should be
    > researched, described, well, in the end even standardized. IMHO, UTC
    > should consider leading this process, even if it does not end with
    > anything standardized in Unicode standard.
    >
    > Validation should be completely separated from processing. IMHO.

    I wasn't paying attention to what Marcin wrote, namely the term
    "non-characters". What I wrote applies to invalid sequences and
    surrogates, not to non-characters.
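
    The distinction can be made concrete with a minimal Python sketch.
    The helper names is_noncharacter and classify_utf8 are hypothetical,
    and using Python's "surrogateescape" error handler to carry invalid
    bytes through a string is only one possible approach, not something
    proposed in this thread. The sketch keeps validation (classifying
    the input) separate from processing (decoding and re-encoding).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 code points Unicode designates as noncharacters."""
    # U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    in_arabic_block = 0xFDD0 <= cp <= 0xFDEF
    plane_end = 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
    return in_arabic_block or plane_end


def classify_utf8(data: bytes) -> str:
    """Validation only: report what kind of input this is, without altering it."""
    try:
        # Python's strict UTF-8 decoder rejects malformed sequences and
        # encoded surrogates (e.g. ED A0 80), but accepts noncharacters.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"
    if any(is_noncharacter(ord(ch)) for ch in text):
        return "noncharacter"
    return "valid"


def roundtrip(data: bytes) -> bytes:
    """Processing: carry arbitrary bytes through a str and back, unchanged.

    Assumption for illustration: each undecodable byte is mapped to a lone
    low surrogate (U+DC80..U+DCFF) by the "surrogateescape" error handler
    and restored to the original byte on encoding.
    """
    text = data.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")
```

    Here U+FFFE encoded as EF BF BE is well-formed UTF-8 and decodes
    normally, while a stray FF byte or an encoded surrogate is an
    invalid sequence and survives only through the escaping path.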

    Lars


