Re: Swift from Jeff Senn on 2014-06-05 (Unicode Mail List Archive)

From: Jeff Senn <senn_at_maya.com>
Date: Thu, 5 Jun 2014 14:47:28 -0400

On Jun 5, 2014, at 2:22 PM, J. Leslie Turriff <jlturriff_at_centurylink.net> wrote:

> On Thursday 05 June 2014 12:24:12 Jeff Senn wrote:
>> On Jun 5, 2014, at 12:41 PM, Hans Aberg <haberg-1_at_telia.com> wrote:
>>> On 5 Jun 2014, at 17:46, Jeff Senn <senn_at_maya.com> wrote:
>>>> That is: are identifiers merely sequences of characters or intended to
>>>> be comparable as “Unicode strings” (under some sort of compatibility
>>>> rule)?
>>>
>>> In computer languages, identifiers are normally compared only for
>>> equality, as it reduces lookup time complexity.
>>
>> Well in this case we are talking about parsing a source file and generating
>> internal symbols, so the complexity of the comparison operation is a red
>> herring.
>>
>> The real question is how does the source identifier get mapped into a
>> (compiled) symbol. (e.g. in C++ this is not an obvious operation)
>>
>> If your implication is that there should be no canonicalization (the string
>> from the source is used as a sequence of characters only directly mapped to
>> a symbol), then I predict sticky problems in the future. The most obvious
>> of which is that in some cases I will be able to change the semantics of
>> the compiled program by (accidentally) canonicalizing the source text (an
>> operation, I will point out, that is invisible to the user in many (most?)
>> Unicode aware editors).
> So if programmer A uses editor X to write code, and programmer B uses editor
> Y to modify the code, suddenly the compiler might start generating multiple
> symbols for some identifiers, causing compiles to fail for no obvious reason.
> It seems to me that "the complexity of the comparison operation is a red
> herring" is perhaps a naive view; this would produce a really high
> astonishment factor.
>
> Leslie

I think we are agreeing (and miscommunicating) — the comparison operator
ON SYMBOLS is incredibly important. Of course symbols must be unique!

Comparing sequences of characters in the SOURCE for equality is almost a non-issue.
(Consider macros, case-insensitivity in some languages, context in languages such as C++, etc.)

You illustrate the problem in your example. If I write this code (4 characters of source, because my editor uses
decomposed characters):

å=1
'a' '<combining ring above>' '=' '1'

And you look at it and think you are going to write code to access that value
(and your editor uses composed characters - so you have 3 characters):

å = 2
'<a-with-ring-above>' '=' '2'

Then we have astonishment.
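
To make that concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the language under discussion). The two symbol-mapping functions, symbol_raw and symbol_nfc, are hypothetical names invented for this example, and NFC is just one possible canonical form a compiler might pick. The identifiers are spelled with explicit escapes so that no editor can silently recompose them.

```python
import unicodedata

# The "same" identifier as typed in two different editors.
decomposed = "a\u030a"  # 'a' + COMBINING RING ABOVE: 2 code points
composed = "\u00e5"     # LATIN SMALL LETTER A WITH RING ABOVE: 1 code point


def symbol_raw(identifier: str) -> str:
    """Hypothetical mapping: the source code points are used as-is."""
    return identifier


def symbol_nfc(identifier: str) -> str:
    """Hypothetical mapping: canonicalize to NFC before creating the symbol."""
    return unicodedata.normalize("NFC", identifier)


# Without canonicalization the two assignments create two distinct symbols.
raw_table = {}
raw_table[symbol_raw(decomposed)] = 1
raw_table[symbol_raw(composed)] = 2

# With canonicalization both spellings map to one symbol,
# so the second assignment overwrites the first.
nfc_table = {}
nfc_table[symbol_nfc(decomposed)] = 1
nfc_table[symbol_nfc(composed)] = 2

print(len(raw_table), len(nfc_table))
```

In the raw table the two spellings stay separate, which is the astonishing case; in the NFC table they collapse into a single entry. Whether a language should use NFC, NFKC, or something else for this step is a separate design question that the sketch does not try to answer.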

>
> --
> "Disobedience is the true foundation of liberty. The obedient must be
> slaves." --Henry David Thoreau
>
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode

Received on Thu Jun 05 2014 - 13:48:29 CDT
