Re: Definition of character

From: Ken Whistler
Date: Wed, 13 Jul 2011 15:49:45 -0700

On 7/13/2011 1:23 PM, Jukka K. Korpela wrote:
> I don’t see that biologists use the word “life” in any confusing
> manner comparable to the Unicode confusion around “character.” “Life”
> isn’t really a central concept in biology, and its use in biology
> hardly differs much from everyday use. Defining “life” might be a
> problem to philosophers, politicians, etc., but not that much in biology.

Actually, I picked that example advisedly. First of all, to say that
"life" isn't really a central concept in biology is more than a little
misleading. Biology is usually defined as "the science concerned with
the study of life." The fact that "life" is hard to pin down exactly and
cannot really be used as an axiomatic definitional basis is part of the
problem.

And to say it isn't used in "any confusing manner" is also
problematical. In fact, the definition of what is and isn't life is a
serious issue for virologists and exobiologists, at least. If you use
the usual criteria for defining living organisms (metabolism,
homeostasis, growth, response to stimuli, reproduction, and adaptation
through natural selection), viruses and viroids fail on several of
those criteria.

So a virus is to life, kind of like a control code is to a character. ;-)

> You might try “species” instead.

Nah. The point was about "What is it about?"

Character encoding is about characters. But if one tries to force too
clean a definition on "character", one gets into trouble. As Asmus was
at pains to point out, the character encoders are essentially engaged in
an operational discovery process of "what characters there are". That in
turn leads to a definition by enumeration: What characters are consists
of the list of what characters there are.

One can then go back over the list looking for recurring attributes, in
an attempt to organize and classify the resulting zoo. But the results
tend not to make any axiomatic sense, both because of the complexity of
writing systems through history (which Asmus alluded to) and because all
kinds of technical artifacts got added to the zoo, many of which have
little or nothing to do with writing systems.
Early on, the Unicode Standard tried to provide a clear scope by
enumeration of some of the characteroids that we didn't consider for
encoding as characters. But that list has been whittled away, as musical
symbols were added, for example, and more recently large sets of
pictographs and the first set of symbols for a shorthand system. What it
really comes down to is that "characters" will end up being those
"things" that somebody persistent wants to embed as units in a digital
plain text string, and which the gatekeepers in the character encoding
committees consider not too crazy to standardize.

> But to get a more reasonable comparison, consider “force” and “energy”
> in physics. They are surely very different from the everyday meanings.
> When an ad says that some drink is “low energy,” it hardly makes much
> sense physically without clarification. But in physics, people need
> not worry about such issues. Physics does not deal much with things
> where the varying everyday meanings of “force” and “energy” could be
> confused with the physical meanings.

I considered physics for a comparison, too. For that matter: "matter",
"space", and "time". One could argue the comparison, but it doesn't work
as well, IMO. The formal definitions are all embedded in mathematics in
a way that "character" is not.

Although the mindboggling things that happen to "force", "energy", "matter",
"space" and "time" at the Planck scale do bring to mind the Unicode concept
of a "noncharacter". ;-)

> But in the Unicode Standard, in the discussion around it, and in
> applying it, uses of “character” in everyday sense are common and
> essential.

Yep, no quarrel there.

