From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Jan 23 2006 - 02:36:23 CST
On Sun, 22 Jan 2006, Asmus Freytag wrote:
> Therefore, constructing the ellipsis as a compatibility character in that
> strict sense is fraught with problems. I think what we have here is a
> compatibility decomposable character which, on closer inspection, turns out
> not to be a compatibility character.
The problem with this way of thinking is that the Unicode standard defines
the term "compatibility character" so that all compatibility decomposable
characters are included. Apparently, this definition needs some tuning.
Since the term "compatibility character" is not rigorously defined
and is not used in a crucial way in other definitions, it can be tuned and
even changed, I suppose. It could even be dispensed with, in favor of
other, less problematic terms.
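To make the formal side of this concrete, here is a minimal sketch using Python's standard unicodedata module. It only inspects the decomposition mappings recorded in the Unicode Character Database; the helper names decomposition_tag and is_compatibility_decomposable are my own, chosen for illustration, and the results reflect whatever database version ships with the interpreter.

```python
import unicodedata
from typing import Optional


def decomposition_tag(ch: str) -> Optional[str]:
    """Return the tag of a compatibility decomposition (e.g. 'compat'), else None.

    Canonical decompositions carry no <tag>, so they also yield None here.
    """
    mapping = unicodedata.decomposition(ch)
    if mapping.startswith("<"):
        return mapping[1:mapping.index(">")]
    return None


def is_compatibility_decomposable(ch: str) -> bool:
    """True if the character has a tagged (compatibility) decomposition."""
    return decomposition_tag(ch) is not None


ELLIPSIS = "\u2026"       # HORIZONTAL ELLIPSIS
QUAD_INTEGRAL = "\u2A0C"  # QUADRUPLE INTEGRAL OPERATOR
ALEF_FINAL = "\uFE8E"     # ARABIC LETTER ALEF FINAL FORM

for ch in (ELLIPSIS, QUAD_INTEGRAL, ALEF_FINAL):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")

# Canonical normalization keeps the ellipsis; compatibility normalization
# replaces it with three full stops.
print(unicodedata.normalize("NFC", ELLIPSIS) == ELLIPSIS)
print(unicodedata.normalize("NFKC", ELLIPSIS) == "...")
```

All three characters are compatibility decomposable in the formal sense, although only the tag differs between the ellipsis and the Arabic final form. The data alone does not say which of them is a "compatibility character" in the stricter sense discussed here.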
> Overall, there exist in the
> standard vestiges of the original, necessarily somewhat simplistic model that
> the original designers of the Unicode standard brought to their task. For
> them, a compatibility character was a classification that existed in perfect
> black-and-white clarity.
The present-day problem with this is that as people learn about Unicode
and start using it, they try to find simplicity and exactness - and they
pay attention to remnants of the black-and-white clarity. They may even
become more papal than the pope, if you allow the expression. And as
information is disseminated from people who have actually read the Unicode
standard to people who read second- or third-hand information about it,
things tend to get simplistic, that is, simpler at the cost of correctness.
Therefore, at least the standard should describe the current position
a bit better.
> Because of this, the actual nature of a character as
> compatibility character became at once dependent on the type of text in which
> it is used, and no longer well-aligned with the formal definition of
> compatibility decomposable character.
Perhaps the concept "compatibility character" could some day be declared
historical. Instead, the Unicode standard, or other standards, could
declare characters as not recommended for use in particular contexts or
for particular purposes.
> [Just so no-one misconstrues my position: the status of many compatibility
> characters, such as the Arabic positional forms, or vertical forms, is
> definitely *not* in question.]
They might be declared as not recommended in general, in some suitable
formulation that is sufficiently far from declaring them as deprecated and
sufficiently strong to be relevant.
> The characters that represent repetitions of the same base element, whether
> the ellipsis, or the quadruple integral, take a special position. By
> providing the multigraph character, Unicode allows the author to
> unambiguously state his or her intention of grouping.
On similar grounds, a four-dot ellipsis character might be justifiable.
> PS: I noticed that in this entire discussion, the fact that Unicode must
> support not only English usage, but other styles and conventions as well
> seemed to have been forgotten as everyone rushed to take a stance on the
> arcana of American English (academic) usage.
I previously mentioned that some languages may use unspaced points.
In theory, a formatting program might use metainformation about language
to decide whether a horizontal ellipsis character (or a sequence of three
full stop characters) should be rendered as spaced or unspaced (and with
which spacing). But in practice, that would put too much burden on the
higher levels when a simple distinction can easily be expressed at the
character level.
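For illustration only, the higher-level approach might look roughly like the sketch below. Everything in it is an assumption: the function render_ellipsis, the table SPACED_ELLIPSIS_BY_LANG, and the language tags "xx" and "yy" are hypothetical placeholders, not statements about any real language or any existing formatting program. Text substitution stands in for what would really be a spacing decision at rendering time.

```python
import re

# Hypothetical convention table: language tag -> whether ellipsis dots are spaced.
# "xx" and "yy" are placeholders, not claims about real languages.
SPACED_ELLIPSIS_BY_LANG = {"xx": True, "yy": False}

# Matches U+2026 or a run of exactly three full stops (not part of a longer run).
ELLIPSIS_PATTERN = re.compile("\u2026" + r"|(?<!\.)\.{3}(?!\.)")


def render_ellipsis(text: str, lang: str, default_spaced: bool = False) -> str:
    """Replace each ellipsis with spaced or unspaced dots, chosen by language tag."""
    spaced = SPACED_ELLIPSIS_BY_LANG.get(lang, default_spaced)
    dots = ". . ." if spaced else "..."
    return ELLIPSIS_PATTERN.sub(dots, text)


print(render_ellipsis("Wait\u2026", "xx"))
print(render_ellipsis("Wait\u2026", "yy"))
```

Even this toy version needs a language tag on every run of text and a maintained table of conventions, which is the burden I mean.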
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/