RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Sat Dec 11 2004 - 11:32:19 CST

    Marcin 'Qrczak' Kowalczyk wrote:

    > > Roundtrip for valid data is of course essential and needs to be
    > > preserved.
    >
    > Your proposal does not do this.
    All assigned codepoints do roundtrip, even in my concept. But unassigned
    codepoints are not valid data. Perhaps that should be worded differently,
    but there shouldn't be any in your data, right?

    Furthermore, I proposed that this concept be used, but not
    unconditionally. So you can, and possibly even should, keep using whatever
    you are using.

    >
    > > If a user encounters corrupt data and cannot process it with your
    > > program, she ("she" is 'politically correct', but in this case can
    > > be seen as sexism) will blame it on the program, not the data.
    >
    > I don't care.
    If you don't, then the guy trying to sell your program will. Eventually, you
    will, too.

    >
    > > This has been discussed mails back. UNIX filenames are already
    > > 'submitted'. Once you set your locale to UTF-8, you have labelled
    > > them all as UTF-8. Suggestions?
    >
    > Convert them to be valid UTF-8 (as long as locales used in the system
    > use UTF-8 as the encoding, that is, otherwise keep them in the locale's
    > encoding).
    Perhaps I can convert mine, but I cannot convert all filenames on a user's
    system. Other suggestions?

    Lars
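
The filename problem discussed above can be illustrated with a small sketch. It is not the scheme proposed in this thread; it assumes Python's built-in `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF so that the original bytes can be recovered. The helper names (`is_valid_utf8`, `to_text`, `to_bytes`) and the sample filename are hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_text(raw: bytes) -> str:
    """Decode as UTF-8, keeping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: restore the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical legacy filename: Latin-1 bytes under a UTF-8 locale.
legacy_name = b"caf\xe9.txt"
# A filename that is already valid UTF-8.
utf8_name = b"caf\xc3\xa9.txt"

legacy_text = to_text(legacy_name)
utf8_text = to_text(utf8_name)
```

Under these assumptions, both names survive a decode and encode cycle unchanged, and only the legacy name is flagged as invalid by `is_valid_utf8`. The escaped string is not itself valid Unicode text for interchange, which is the trade-off this thread argues about.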


