Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Dec 11 2003 - 13:16:15 EST

Next message: Benjamin Peterson: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Previous message: jon@hackcraft.net: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
In reply to: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

> Mark, don't patronise me. I'm not talking about levels of enlightenment.
> I'm not talking about levels in the sense you just used when you
> mentioned "higher-level issues". I'm talking about the well-known
> concept of levels or layers of programming and of communication protocols.

My apologies; I had intended a lighter tone, not patronization.

> Here I disagree. As an application programmer writing for example some
> kind of linguistic application, it is totally irrelevant to me how much
> actual storage a string takes. Such things should be hidden away from me
> by several levels of system software and compilers. An application
> programmer doesn't even need to know what this concept means! Seriously!
> Beginners, even young children, can be taught simple programming and
> string handling without knowing anything about bits and bytes, certainly
> without having to know whether the e acute they just typed is stored as
> one byte or two. Just as people can and do learn to drive cars without
> knowing anything about the nuts and bolts or how the engine works.

A nice dream, but it doesn't really match anything I know about. Programmers will
always need to know storage counts in strings, at least in intermediate
processing. In C, of course, it is crucial. Even in a language with String
objects, like Java, just getting the last bit of a string uses a length.

a.substring(pos, a.length())
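
As a minimal sketch of this point (the class and method names are made up for illustration, not taken from any real codebase): Java's length() and substring() count UTF-16 code units, so an e acute has a length of one or two depending on whether it is stored precomposed or decomposed.

```java
// Hypothetical illustration: String.length() reports UTF-16 code units,
// not user-perceived characters.
class CodeUnitLength {
    // Returns the tail of s starting at code-unit offset pos.
    static String tail(String s, int pos) {
        return s.substring(pos, s.length()); // equivalent to s.substring(pos)
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // e acute as a single code point
        String decomposed = "e\u0301"; // e followed by a combining acute accent

        System.out.println(precomposed.length()); // 1
        System.out.println(decomposed.length());  // 2
        System.out.println(tail("canonical", 5)); // ical
    }
}
```

The two strings are canonically equivalent, yet any code that slices them has to know which length it is dealing with.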

Indexing within strings always uses storage units, for good reason. Take
a typical operation: I do a match on a string, and find out that the position of
what I was searching for was <9,15> (in code units). I then do some other
operations using that data, e.g. extracting a substring, or replacing the
contents. Those all reference the indexes that I determined earlier. All of
these processes are much faster if the indexing is always done in code units.
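
A rough sketch of that workflow, assuming Java's java.util.regex package; the class name, method name, and sample text are invented for illustration. The matcher reports code-unit offsets, and the later substring and replace steps reuse them directly without rescanning the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of reusing match offsets in later operations.
class MatchIndexes {
    // Finds the first match of regex in text and replaces it with a literal
    // replacement, reusing the code-unit offsets reported by the matcher.
    static String replaceFirstMatch(String text, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return text; // nothing matched, nothing to replace
        }
        int start = m.start(); // code-unit offset where the match begins
        int end = m.end();     // code-unit offset just past the match
        StringBuilder sb = new StringBuilder(text);
        sb.replace(start, end, replacement);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Coloured accent marks";
        Matcher m = Pattern.compile("accent").matcher(text);
        if (m.find()) {
            // The match sits at <9,15> in code units.
            System.out.println(m.start() + "," + m.end());
            System.out.println(text.substring(m.start(), m.end()));
        }
        System.out.println(replaceFirstMatch(text, "accent", "tone"));
    }
}
```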

You are right that higher-level tools make it less necessary to get into some of
the guts here. Rather than have to deal with indexes, I can use a split function
to produce an array of strings, or a regex function to search and replace. But I
don't see how you can always get away from the need to index.
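
For comparison, a small sketch of the index-free style, using only String.split and String.replaceAll from the standard library. The class and method names are hypothetical.

```java
// Hypothetical illustration: higher-level calls that hide the indexes.
class NoIndexes {
    // Splits a string into whitespace-separated words.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Replaces every whole-word occurrence of "Colored" with "Coloured".
    static String respell(String s) {
        return s.replaceAll("\\bColored\\b", "Coloured");
    }

    public static void main(String[] args) {
        for (String w : words("Text Editors  and Canonical Equivalence")) {
            System.out.println(w);
        }
        System.out.println(respell("Colored diacritics"));
    }
}
```

Neither call exposes an offset to the caller, although the library still works in code units underneath.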

One could, of course, design a programming language that always indexed and
counted by some other entity, say, default grapheme clusters. Such a language
would be unable to deal with pieces that didn't constitute a complete
cluster, and would have to deal with issues such as the fact that the number of
entities in the concatenation of two strings is not the same as the sum of the
numbers of entities in each of the strings, so indexing gets pretty
tricky. I don't know of any programming language that has tried to do this, and
I don't think it would be of particular value -- and it would be exceedingly slow.
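
The concatenation problem can be sketched as follows. This uses java.text.BreakIterator as a rough stand-in for default grapheme clusters, which is an assumption: its segmentation may differ in detail from the Unicode definition depending on the JDK version. The class and method names are hypothetical.

```java
import java.text.BreakIterator;

// Hypothetical illustration: cluster counts are not additive under
// concatenation, unlike code-unit counts.
class ClusterCount {
    // Counts the segments reported by the JDK's character BreakIterator,
    // used here as an approximation of default grapheme clusters.
    static int clusters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String a = "e";      // a base letter
        String b = "\u0301"; // a lone combining acute accent

        // Code-unit lengths always add up.
        System.out.println(a.length() + b.length() == (a + b).length());

        // Cluster counts need not: the accent attaches to the preceding
        // base letter once the two strings are joined.
        System.out.println(clusters(a) + clusters(b));
        System.out.println(clusters(a + b));
    }
}
```

An index counted in clusters in either input string therefore cannot simply be carried over into the concatenated result.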

To take your analogy of the car, programmers are really much more like the
mechanics than the drivers. A casual driver doesn't really need to know
anything, although they will still need to know some measurement of gas. (Maybe
that isn't true of SUV owners -- they'd really rather not know the cumulative
effects on their pocketbook, the environment, or international politics.) But
the mechanics still have to know how to measure physical things. As their
diagnostic computers get better, their tools help to alleviate a lot of the
work they used to do by hand, but they still need to be able to fasten a bolt
with a certain measure of torque.



This archive was generated by hypermail 2.1.5 : Thu Dec 11 2003 - 14:11:32 EST