Re: Unicode in source code. WHY?

From: Geoffrey Waigh (anzu@home.com)
Date: Wed Jul 21 1999 - 18:49:25 EDT


"G. Adam Stanislav" wrote:
>
> On Wed, Jul 21, 1999 at 02:06:58PM -0700, Kenneth Whistler wrote:
> > The problem comes from canonical equivalences. If you do not take this
> > into account, you could have two pieces of program text that from
> > a user's point of view ought both to be valid, but one would match
> > an identifier and compile correctly, while the other would not match
> > an identifier and cause compilation errors.
>
> I see. I would certainly not expect a compiler to have to deal with that.
> Or a linker. Nor would I expect programmers to have to use a hex editor.
>
> On the other hand, I would expect a text editor to give me the same
> Unicode sequence when I type the same thing on my keyboard.
>
> I can see the problem when using several different editors, though.
>
> But it would not be too hard to write a text conversion program that
> unifies such identifiers. The source could be run through such a program
> before it is compiled. Or possibly, compilers could use it as a pre-processor.

People seem to be missing an element of the software development
cycle here. Given that one purpose of Unicode is to allow data
gathered from around the globe to be processed and presented to
people around the globe (linguistically, not network-wise;) it
doesn't seem far-fetched that people would use code written
around the world. A number of organizations work with development
groups of more than one person. They have changing staffs over time.
They are even known to link against a variety of different third-
party libraries that were built with different tool-chains.

It may not be the place of the Unicode Consortium to give guidance
on this, but if the appropriate standards bodies are not encouraged
to follow Unicode string comparison semantics (i.e. canonical equivalence)
when using Unicode data - whether it is inside compilers, linkers or
elsewhere - composable character sequences will still be ghettoized
by lazy implementations.
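
To make the point concrete, here is a minimal sketch of comparing
identifiers under canonical equivalence rather than code point by code
point. It assumes Python and its standard unicodedata module; the helper
name canonically_equal and the sample identifier are hypothetical, and
normalizing both sides to NFC is only one possible way to implement the
comparison, not something any particular compiler or linker is known to do.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Return True if a and b are canonically equivalent.

    Both strings are normalized to NFC before comparison, so a
    precomposed character and its base + combining-mark spelling
    compare equal. (Hypothetical helper, for illustration only.)
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same identifier as two different editors might store it.
precomposed = "\u00e5ngstr\u00f6m"      # U+00E5 and U+00F6 as single code points
decomposed = "a\u030angstro\u0308m"     # a + U+030A, o + U+0308

# A naive code-point comparison treats these as different identifiers.
print(precomposed == decomposed)

# A comparison that respects canonical equivalence treats them as the same.
print(canonically_equal(precomposed, decomposed))
```

A symbol table or linker that keyed its lookups on the normalized form
would resolve both spellings to the same identifier, whichever editor or
tool-chain produced them.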

Geoffrey, living in a warren with a U+030A next door.


