Re: Unicode in source code. WHY?

From: G. Adam Stanislav (adam@whizkidtech.net)
Date: Wed Jul 21 1999 - 17:34:10 EDT


On Wed, Jul 21, 1999 at 02:06:58PM -0700, Kenneth Whistler wrote:
> The problem comes from canonical equivalences. If you do not take this
> into account, you could have two pieces of program text that from
> a user's point of view ought both to be valid, but one would match
> an identifier and compile correctly, while the other would not match
> an identifier and cause compilation errors. This sort of problem should
> not be shoved off to people with the suggestion that they look at and
> fix their program text with a hex code editor to find the differences
> in canonically equivalent sequences that otherwise appear the same.

I see. I would certainly not expect a compiler to have to deal with that.
Or a linker. Nor would I expect programmers to have to use a hex editor.

On the other hand, I would expect a text editor to give me the same
Unicode sequence when I type the same thing on my keyboard.

I can see the problem when using several different editors, though.

But it would not be too hard to write a text conversion program that
rewrites canonically equivalent sequences to a single form, so that such
identifiers match. The source could be run through such a program before
it is compiled. Or, possibly, compilers could use it as a pre-processor.
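
A minimal sketch of such a conversion step, assuming Python and its
standard unicodedata module. The name unify_source is made up for
illustration, and picking NFC as the target form is my assumption; any
one normalization form applied consistently would do.

```python
import unicodedata


def unify_source(text: str) -> str:
    """Rewrite every canonically equivalent sequence to its NFC form."""
    return unicodedata.normalize("NFC", text)


# The same visible identifier, typed by two hypothetical editors.
composed = "caf\u00e9 = 1"      # e-acute as one code point, U+00E9
decomposed = "cafe\u0301 = 1"   # 'e' followed by combining acute, U+0301

print(composed == decomposed)                              # False
print(unify_source(composed) == unify_source(decomposed))  # True
```

Run over the whole file before compilation, this makes the two
spellings of the identifier compare equal.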

The conversion program could even be made intelligent: it would not
convert anything between two double quotes, or anything between two
single quotes, and so on (different languages have different ways of
writing string literals).
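
Here is a rough sketch of that quote-aware variant, again in Python. It
assumes a made-up literal syntax: "..." or '...' on a single line, with
backslash escapes. A real language needs its own rules, and an
apostrophe inside a comment would confuse this version. The function
name is hypothetical.

```python
import re
import unicodedata

# Assumed literal syntax: "..." or '...' on one line, backslash escapes.
_LITERAL = re.compile(
    r'"(?:\\.|[^"\\\n])*"'    # double-quoted literal
    r"|'(?:\\.|[^'\\\n])*'"   # single-quoted literal
)


def unify_outside_literals(text: str) -> str:
    """Normalize to NFC everywhere except inside quoted literals."""
    out = []
    pos = 0
    for m in _LITERAL.finditer(text):
        # Text between the previous literal and this one gets normalized.
        out.append(unicodedata.normalize("NFC", text[pos:m.start()]))
        # The literal itself is copied through unchanged.
        out.append(m.group(0))
        pos = m.end()
    # Whatever follows the last literal is normalized too.
    out.append(unicodedata.normalize("NFC", text[pos:]))
    return "".join(out)


src = 'cafe\u0301 = "cafe\u0301"'
result = unify_outside_literals(src)
```

In this example the identifier on the left is rewritten to the composed
form, while the string on the right keeps its original code points.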

I think this is both doable and well worth the effort. Thanks
for the idea for my next project. :-)

Adam


