Re: Ellipsis

From: Asmus Freytag (
Date: Sun Jan 22 2006 - 02:53:40 CST


    On 1/21/2006 12:40 AM, Jukka K. Korpela wrote:

    > The Unicode standard says (in 2.3):
    > "Compatibility characters are those that would not have been encoded
    > except for compatibility and round-trip convertibility with other
    > standards."
    > We can take this position strictly and therefore regard HORIZONTAL
    > ELLIPSIS as included for such reasons only. But that would be a
    > somewhat inconvenient position, since there is seldom any good way to
    > create spaced ellipsis dots except by using the HORIZONTAL ELLIPSIS
    > character.
    You can take this position, and with lots of intelligence in your layout
    system you might parse and recognize sequences of periods for one
    purpose or the other.

    However, as you note yourself, it would be inconvenient. But to
    understand how inconvenient, you need to consider the whole range of
    applications for this character. For example, the Japanese ellipsis is
    unified with the horizontal ellipsis. When displayed, it is raised
    compared to the ordinary one, and in practice, it has proven quite
    difficult to recognize when the ellipsis character should be shown in
    one style or the other. Therefore I am not at all hopeful that any
    position that puts a further burden on layout systems, viz. to
    recognize which in a series of periods are intended to represent an
    ellipsis, would be of any more benefit to the user of the standard.

    Therefore, construing the ellipsis as a compatibility character in
    that strict sense is fraught with problems. I think what we have here is
    a compatibility decomposable character which, on closer inspection,
    turns out not to be a compatibility character.
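
    The gap between the formal property and actual usage can be seen in the
    character data itself. The following is only a minimal sketch, assuming
    Python's standard unicodedata module (the variable names are
    illustrative): the ellipsis carries a <compat> decomposition to three
    full stops, so compatibility normalization replaces it with periods,
    while canonical normalization leaves it untouched.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# The formal property: a compatibility decomposition to three full stops.
decomposition = unicodedata.decomposition(ELLIPSIS)

# Canonical normalization (NFC) leaves the character alone.
nfc = unicodedata.normalize("NFC", ELLIPSIS)

# Compatibility normalization (NFKC) replaces it with three periods,
# discarding the author's choice of a single ellipsis character.
nfkc = unicodedata.normalize("NFKC", ELLIPSIS)

print(decomposition)     # the raw decomposition field from the character data
print(nfc == ELLIPSIS)   # whether NFC preserved the character
print(nfkc)              # the NFKC result
```

    Whether the NFKC result is an acceptable substitute depends on the kind
    of text being processed, which is the point at issue.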

    A similar issue exists with the one-dot leader, which is unified in the
    standard with an Armenian punctuation character, belying any attempt to
    classify it as "only encoded for compatibility". Overall, there exist in
    the standard vestiges of the original, necessarily somewhat simplistic
    model that the original designers of the Unicode standard brought to
    their task. For them, a compatibility character was a classification
    that existed in perfect black-and-white clarity.

    Over time, Unicode grew to encompass mathematical notation, phonetic
    characters, and a number of other things, while at the same time
    freezing the definition of compatibility decomposable characters (by
    fixing the decompositions). Because of this, the actual nature of a
    character as a compatibility character became at once dependent on the
    type of text in which it is used, and no longer well-aligned with the
    formal definition of a compatibility decomposable character.

    [Just so no one misconstrues my position: the status of many
    compatibility characters, such as the Arabic positional forms, or
    vertical forms, is definitely *not* in question.]

    The characters that represent repetitions of the same base element,
    whether the ellipsis or the quadruple integral, take a special
    position. By providing the multigraph character, Unicode allows the
    author to unambiguously state his or her intention of grouping. At the
    same time, the sequence of elements serves both as a fallback
    representation and as a natural way to input some of them. I think
    that's a strength of the standard, but to use it, it's necessary to
    recognize that some 'compatibility characters' are in fact widely used
    as if they were ordinary characters, and no longer owe their
    existence solely to 'legacy' mappings.
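
    The relation between a multigraph character and its fallback sequence
    can be sketched as follows. This is an illustration only, assuming
    Python's standard unicodedata module; the helper name fallback_sequence
    is hypothetical. The multigraph and the plain sequence are distinct
    strings, so the grouping is recorded in the text, yet the sequence can
    always be derived from the multigraph when a fallback is needed.

```python
import unicodedata


def fallback_sequence(ch: str) -> str:
    """Return the compatibility (NFKD) fallback for a single character.

    Hypothetical helper for illustration; it simply wraps normalize().
    """
    return unicodedata.normalize("NFKD", ch)


# Two characters that represent repetitions of the same base element.
multigraphs = ["\u2026", "\u2A0C"]  # ellipsis, quadruple integral operator

for ch in multigraphs:
    seq = fallback_sequence(ch)
    # The multigraph is one code point; the fallback repeats a base element.
    print(unicodedata.name(ch), "->", len(seq), "x", unicodedata.name(seq[0]))

# The grouped form and the plain sequence are different strings ...
grouped = "\u2A0C"
ungrouped = "\u222B" * 4  # four INTEGRAL characters typed one by one
distinct_as_text = grouped != ungrouped

# ... but they coincide once the fallback is applied.
same_after_fallback = fallback_sequence(grouped) == ungrouped
```

    Going the other way, from four separate integrals back to the grouped
    operator, would require guessing the author's intent, which no
    normalization form attempts.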


    PS: I noticed that in this entire discussion, the fact that Unicode must
    support not only English usage, but other styles and conventions as well
    seemed to have been forgotten as everyone rushed to take a stance on the
    arcana of American English (academic) usage. Rather than relying on a
    single authority to "know" which dot in a four-dot sequence is the
    period, having an ellipsis character and a full stop as separate
    characters can support both conventions, even though only one of
    them might be preferable in English. (We simply do not know all the
    alternative conventions.)
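
    To make the four-dot case concrete, here is a small sketch, again
    assuming Python's standard unicodedata module. The two sample strings
    and their names are invented for illustration and do not claim to
    represent any particular style guide. With separate characters, the
    position of the period is explicit in the text; after compatibility
    normalization both conventions collapse into the same four periods.

```python
import unicodedata

FULL_STOP = "\u002E"
ELLIPSIS = "\u2026"

# Two hypothetical conventions for a four-dot sequence:
period_first = "end" + FULL_STOP + ELLIPSIS  # sentence ends, then omission
period_last = "end" + ELLIPSIS + FULL_STOP   # omission, then sentence ends

# As encoded, the two conventions are distinguishable.
distinguishable = period_first != period_last

# After NFKC, both become "end" followed by four identical periods,
# and nothing in the text says which dot was the sentence-final one.
flat_first = unicodedata.normalize("NFKC", period_first)
flat_last = unicodedata.normalize("NFKC", period_last)
collapsed = flat_first == flat_last

print(distinguishable, collapsed, flat_first)
```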
