Re: (TC304WG1.241) data for collation tests

From: John Clews (
Date: Sun Feb 09 1997 - 18:59:31 EST

In message <9702081203.AA20355@Unicode.ORG> unicode@Unicode.ORG writes:

[Alain La Bonté]:

> >But in TC37 terminology, the expressions "word by word" or "character by
> >character" terminology is wrong as far as actual understanding of what is
> >going on is taken care of. I can affirm you that what they call "word by
> >word" is more "character by character" than the other method.

As Professor Joad would have said "It all depends on what you mean by a
word." (Here I am indicating my Anglocentricity and my age).

In the early days of ISO/IEC JTC1/SC2/WG3 (when bits of it were still - I
think - ISO/TC97/SC2/WG3 and even WG7?), Peter Fenwick came up with simple
useful definitions of character, word, paragraph, text, etc.

Word was defined - for text handling purposes - as a group of characters
bounded by a space (which would also include various white space
characters). At the time that seemed fairly uncontroversial.
In text handling in computers this is fine. Computers don't need to know
about complications like meaning - that's what humans are for.

[Michael Everson]:

> The use of the terms word-by-word and letter-by-letter (John Clews can
> probably tell us) goes back a long way in libraries with paper cards, I
> suspect. Certainly it did not originate with TC37 or Gavaré.

[John Clews]:

John Clews can definitely tell you, and can confirm Michael's point
exactly. In library school training, the term "word by word" was used to
describe sorting where space was significant (greater than NULL) and
"letter by letter" to describe sorting where space was equivalenced to
NULL.

Library catalogues were always word by word, and telephone directories
always letter by letter. Librarians could easily cope with telephone
directories, and telephone users also used libraries easily enough.

Both are valid methods, and either convention was very easy to explain to
end-users (and for end-users to understand).
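
For anyone who wants to see the difference on a screen rather than in a
card drawer, here is a minimal sketch in Python. It assumes plain
lower-cased ASCII comparison, which is far simpler than any real collation
standard; the function names and the place names are made up for
illustration and do not come from any library or from the test data under
discussion.

```python
def word_by_word_key(entry: str) -> str:
    # Space is significant: runs of white space collapse to one space,
    # and a space compares lower than any letter, so a shorter first
    # word always files ahead of a longer one.
    return " ".join(entry.lower().split())


def letter_by_letter_key(entry: str) -> str:
    # Space is ignored: the entry is compared as one unbroken string
    # of letters.
    return "".join(entry.lower().split())


entries = ["Newark", "New York", "Newton", "New Haven"]

word_by_word = sorted(entries, key=word_by_word_key)
letter_by_letter = sorted(entries, key=letter_by_letter_key)

print(word_by_word)      # library catalogue order
print(letter_by_letter)  # telephone directory order
```

Word by word, both "New ..." entries file ahead of "Newark"; letter by
letter, "Newark" comes first and "New York" ends up last.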

Some years after I was taught these things, computers came along in a big
way in libraries. All computerised catalogues and their users did exactly
what card catalogues did previously: sorted word by word (the easiest way
to explain it), and everybody in libraries (staff and users) continued to
live happily ever after.

John Clews

John Clews (Character Set Development)     tel: +44 (0) 1423 888 432
SESAME Computer Projects, 8 Avenue Road    
Harrogate, HG2 7PG, United Kingdom         email:
