RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Dec 16 2004 - 05:05:47 CST

Next message: Lars Kristan: "RE: Roundtripping Solved"

Previous message: Arcane Jill: "RE: Roundtripping Solved"
Maybe in reply to: Lars Kristan: "RE: Roundtripping in Unicode"
Next in thread: Lars Kristan: "RE: RE: Roundtripping in Unicode"
Maybe reply: Lars Kristan: "RE: RE: Roundtripping in Unicode"
Maybe reply: Philippe VERDY: "Re: RE: Roundtripping in Unicode"

Marcin 'Qrczak' Kowalczyk wrote:
> Yes, IMHO all general-purpose languages should support processing
> arrays of bytes, in addition to Unicode strings.

C is likely to retain the behavior of the str functions. However, that puts a
lot of burden on developers, who must identify all opaque strings and handle
them with those functions throughout the application (or, even worse,
throughout a suite of applications not necessarily written by the same company).

Newer languages are often designed on the assumption that all you need is a
good class for Unicode strings. Instead of making them change that
assumption, we could consider finding a way to make it true.

If a solution that doesn't break anything in Unicode cannot be found, then
consider a solution that does break something, but check what the broken
part really affects. For example, we assume it MUST be possible to represent
a valid Unicode string in any UTF stream and get it back. Suppose you find a
solution that retains that capability for all Unicode codepoints except for
128 of them. If you know that those 128 will ONLY be used for a particular
purpose, you might be willing to accept that those who use them will deal
with the problem, while for those who don't, the rules haven't really
changed. What I am saying is that we need to preserve the intention of the
existing rules, not the rules themselves.

But again, that would matter only if I were proposing that everybody start
using my conversion everywhere, which at this point I am not.
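
To make the "all codepoints except for 128" trade-off concrete, here is a
minimal sketch. It is NOT the conversion discussed in this thread; it uses
Python's built-in "surrogateescape" error handler, a later mechanism that maps
each undecodable byte 0x80..0xFF to one of the 128 code points U+DC80..U+DCFF.
The function names are hypothetical and chosen only for illustration.

```python
# Sketch only: Python's "surrogateescape" handler stands in for the idea of
# reserving 128 code points for bytes that are not valid UTF-8.

def to_text(raw: bytes) -> str:
    # Valid UTF-8 decodes normally; each invalid byte 0xNN becomes U+DCNN.
    return raw.decode("utf-8", errors="surrogateescape")

def to_bytes(text: str) -> bytes:
    # Reverse mapping: U+DCNN is written back as the single byte 0xNN.
    return text.encode("utf-8", errors="surrogateescape")

def strict_utf8_ok(text: str) -> bool:
    # True if the string can be written to a strict UTF-8 stream.
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

name = b"caf\xe9.txt"            # Latin-1 bytes, not valid UTF-8
text = to_text(name)
assert to_bytes(text) == name    # the opaque byte string round-trips
assert not strict_utf8_ok(text)  # but strict UTF-8 refuses the escape code point
assert strict_utf8_ok("plain")   # strings without escapes are unaffected
```

Only strings that actually contain one of the 128 escape code points lose the
ability to pass through a strict UTF stream; for every other string the rules
are unchanged.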

>
> It's not clear however how the API of filenames should look like,
> especially if they wish to be portable to Windows.

I intend to bring up the issue in the near future, after letting everyone
catch their breath.

> or delimit the filename with "\0", or prefix it with
> the length, or something like this.

I don't see why that would be necessary or useful.

> A backup software should do this
> and not pay attention to the locale. But for end-user software like
> an image viewer, processing arbitrary filenames is less important.

You have to pay attention to the locale eventually. You need to report which
file failed to be backed up (or is infected with a virus). And you should be
able to let the user restore a single file. If you don't interpret the
filename according to the locale (possibly UTF-8), the user won't know how to
select what she wants. It is even worse if she wants to enter the filename
manually. All this CAN be done within the application, but it is very
cumbersome. It gets worse if you want to pass some information to another
program, since that program may not have an interface that accepts opaque
strings. If it does, its convention may differ. This is why I am saying that
something should be standardized. Of course, standardizing a poor solution is
not a good idea. We should do our best to find a good one.
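
As an illustration of how this can be done within one application, and why it
is cumbersome, here is a minimal sketch. It assumes a hypothetical backup tool
that keeps the raw bytes as the file's identity and derives a separate,
display-only label for reports and for restore selection. All names are
invented for the example, and the backslash escaping is just one possible
private convention, which is exactly the kind of thing another program would
not know about.

```python
# Sketch only: raw bytes are the identity, the label is for humans.

def display_label(raw_name: bytes, encoding: str = "utf-8") -> str:
    # Undecodable bytes are shown as \xNN escapes instead of raising.
    return raw_name.decode(encoding, errors="backslashreplace")

def build_catalog(raw_names):
    # Pair every stored name with the label shown to the user.
    return [(display_label(raw), raw) for raw in raw_names]

def select(catalog, label: str):
    # Map what the user picked (or typed) back to the stored raw names.
    # More than one raw name may share a label, so a list is returned.
    return [raw for shown, raw in catalog if shown == label]

catalog = build_catalog([b"report.txt", b"caf\xe9.txt"])
assert [shown for shown, _ in catalog] == ["report.txt", "caf\\xe9.txt"]
assert select(catalog, "caf\\xe9.txt") == [b"caf\xe9.txt"]
```

The label is lossy, so the application has to carry both values everywhere,
and any other program receiving only the label cannot recover the original
bytes without agreeing on the same escaping.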

> Technically they are binary (command line arguments must not contain
> zero bytes). Users are expecting stdin and stdout to be treated as
> text or binary depending on the program, while command line arguments
> are generally interpreted as text or filenames.

So, an application outputting filenames has a binary stdout and no text
application is guaranteed to process this output.
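
A minimal sketch of that consequence, under stated assumptions: a producer
writes one filename per line as raw bytes, and a consumer treats the stream as
strict UTF-8 text. An in-memory buffer stands in for stdout, and the function
names are hypothetical.

```python
import io

# Sketch only: a filename producer and a strict text consumer.

def write_names(stream, raw_names):
    # The producer can only pass filenames through as bytes.
    for raw in raw_names:
        stream.write(raw + b"\n")

def text_consumer_accepts(data: bytes) -> bool:
    # A text application that insists on valid UTF-8 input.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

clean = io.BytesIO()
write_names(clean, [b"report.txt"])
assert text_consumer_accepts(clean.getvalue())

mixed = io.BytesIO()
write_names(mixed, [b"report.txt", b"caf\xe9.txt"])
assert not text_consumer_accepts(mixed.getvalue())
```

A single filename that is not valid in the expected encoding is enough to make
the whole output unusable for the strict text consumer.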

Lars

This archive was generated by hypermail 2.1.5 : Thu Dec 16 2004 - 05:09:20 CST