Re: Proposal to encode an EXTERNAL LINK symbol in the BMP

From: Jukka K. Korpela
Date: Mon Jul 24 2006 - 03:24:42 CDT

    On Sat, 22 Jul 2006, Curtis Clark wrote:

    > I think it's a good idea.

    The idea of creating a character that can be used as a marker for external
    links is interesting, but there are many problems with it.

    The description of existing practices shows that some web authors use
    images of a certain type to mark external links as external. This does not
    demonstrate existing usage of a _character_, since authors use rather
    varying designs. Picking up just one class of designs seems premature.
    Most web pages do not distinguish external links from internal links using
    any markers.

    The distinction between external and internal links can be important,
    though often it's important just to site management, not to users, who
    surf around the web and don't pay much attention to the "site" concept.
    As some examples in the draft proposal indicate, the distinction is
    often made on web pages as a formality, more or less: indicating a link as
    external is a disclaimer of a kind, and mostly pointless. Sites that wish
    to continue such practices will hardly be interested in using any
    "standard character" as a marker, especially since font support for the
    character will be virtually nonexistent for a fairly long time. (What
    matters is support in fonts that web users have in their computers, and
    such things change slowly, even if good fonts were available for free.)

    The concept "external" is somewhat vague in this context. Apparently it
    means "external to the current site", but what is a "site", really? If we
    expect that the new character will become a widespread, or even "standard",
    marker, it should perhaps have a more definite meaning.

    The most important question, however, seems to be whether it is even
    desirable for the internal vs. external distinction to be made at the
    document level, and specifically at the character level within a document.
    Using a marker - whether an image or a character - makes the distinction
    part of the document's content in a manner that prescribes a particular
    visual rendering. This is contrary to the modern structured and
    device-independent approach. The distinction, if relevant, is primarily a
    metadata issue, and an attribute (in hypertext markup, e.g. HTML or XML)
    could be used for the purpose. This would leave it to user agents to
    render the distinction in a manner suitable for a particular browsing
    situation. If a visual marker is used, it would most appropriately be an
    image specific to the browser, i.e. part of the browser's user interface.
    Thereby, it would perhaps not be suitable to treat it as a _character_,
    even though it may appear in the midst of text. (External links could also
    be indicated by the use of colors, for example, or they might look similar
    to other links until the cursor is moved over the link.)
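
    To illustrate the attribute-based approach, here is a minimal sketch in
    Python. It assumes, purely for illustration, that externality is carried
    by an "external" token in the rel attribute of a link; the names
    LinkCollector, collect_links and describe are hypothetical. The point is
    only that the document carries the metadata and each "user agent"
    chooses its own rendering.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, is_external) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Assumption: an "external" token in rel marks the link as external.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "external" in rel_tokens))


def collect_links(markup):
    """Parse the markup and return the list of (href, is_external) pairs."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


def describe(href, is_external, mode="visual"):
    """Render a link the way a given (hypothetical) user agent might."""
    if mode == "speech":
        # A speech-based agent could announce the distinction in words.
        return ("external link " if is_external else "link ") + href
    # A visual agent could append its own marker, or use color instead.
    return href + (" [external]" if is_external else "")


if __name__ == "__main__":
    sample = '<a href="/about">About</a> <a rel="external" href="http://example.org/">Example</a>'
    for href, is_external in collect_links(sample):
        print(describe(href, is_external, "visual"))
        print(describe(href, is_external, "speech"))
```

    Nothing in the document itself prescribes the "[external]" text or the
    spoken phrase; both are choices made at rendering time.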

    I think it basically belongs to the scope of the World Wide Web Consortium
    to discuss whether a uniform, universal symbol is a desirable way to
    indicate a link as external and whether the symbol should be part of a
    document or part of a user agent's interface. Only after resolving that
    could we adequately discuss whether that symbol should be encoded as a

    > One quibble: it is "web page" or "World Wide Web page", not "Internet page".

    The difference between external and internal links can be relevant on
    intranet pages, too, and in "standalone" documents in formats such as
    HTML, XML, Word, or PDF. By "standalone" I mean that the document is
    primarily for offline viewing but might also be used in an environment
    where external (web) links would work. In such situations, the indication
    of external links can be much more important than in normal web usage.

    Basically, the distinction relates to any data format where a link
    (hyperlink) concept is meaningful and a link may refer to something

    > 1. It will take a while for such a character to find its way into ubiquitous
    > fonts, so web developers will need to use the graphic for a while longer. I
    > don't see this as an argument against; *without* the character, they will
    > have to wait forever.

    I'm afraid that if the character were introduced, it would only be used
    by a small minority of web authors, among the minority that marks external
    links as external in the first place. In effect, it would be yet another
    (and rarely used) symbol acting as external link marker, rather than a
    "standard" marker.

    As an aside, I think the name EXTERNAL LINK would not be quite adequate.
    The name would suggest that the character _is_ a link (as it might
    actually be, though more often it would be either part of the link text or
    be more descriptive. Perhaps even HYPERLINK instead of LINK, since the
    word "link" as such is rather polysemic.

    > 2. A graphic can have alternate text, such as "external link" for users who
    > can't view images.

    We have the unfortunate situation that in HTML, an image can have
    alternate text but there is no corresponding construct for a character.
    There is no way of specifying that if a particular character cannot be
    displayed (or otherwise rendered) by a user agent, then a particular
    replacement string (which would presumably contain "safe" characters only)
    be rendered instead. This is one reason why authors so often use images
    for symbols that actually exist in Unicode as characters, such as simple

    > It will take a while for screen readers to be programmed
    > to have a pronunciation of the new character (I'm not sure how JAWS, the
    > commonest screen reader in the United States, deals with symbol characters).

    That's really an understatement. I'm afraid that speech-based user agents
    usually deal with a fairly limited character repertoire (such as Basic
    Latin, i.e. ASCII, perhaps with Latin-1 Supplement or some other
    addition). If they were expected to deal with a considerably wider
    repertoire, then the only sensible approach, for most characters, would be
    to spell out the name of the character. However, using the Unicode name as
    such is not generally a good idea, though perhaps the only possible way at
    present. First, we know that some of the names are misleading. Second, the
    name should appear in the language of the document, so localized names
    would be needed, for each language supported by the program. Third, even
    the localized name might be misleading in a particular context, since it
    would relate to the character in general, not to its particular usage.
    (For newly introduced characters with fairly simple semantics, like the
    proposed one, this wouldn't be much of a problem, but I wanted to point
    out this general problem.)
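
    As a rough sketch of the "spell out the name" approach, the following
    Python fragment falls back to the Unicode character name unless a
    localized name is available. The function spoken_name and the table
    LOCALIZED_NAMES are hypothetical, and the single Finnish entry is a
    made-up illustration, not taken from any real screen reader.

```python
import unicodedata

# Hypothetical per-language overrides, keyed by (language code, character).
# A real program would need such a table for every language it supports.
LOCALIZED_NAMES = {
    ("fi", "\u2192"): "nuoli oikealle",
}


def spoken_name(char, language="en"):
    """Return a phrase a speech-based agent might read for one character."""
    localized = LOCALIZED_NAMES.get((language, char))
    if localized is not None:
        return localized
    # Fallback: the Unicode name as such, which may be misleading and is
    # not in the language of the document.
    return unicodedata.name(char, "unknown character").lower()


if __name__ == "__main__":
    print(spoken_name("\u2192"))        # falls back to the Unicode name
    print(spoken_name("\u2192", "fi"))  # uses the localized override
```

    Note that neither the fallback nor the override knows anything about
    the context in which the character is used, which is the third problem
    mentioned above.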

    > But again, this would eventually happen, and during the period after the
    > availability of fonts, and before updates to screen readers, web developers
    > could use the "title" attribute to identify the character.

    The "title" attribute is an unreliable method. Although many speech-based
    user agents are able to read its value, they are typically configured not
    to do that by default. On visual user agents, the "title" attribute does
    not affect the normal rendering at all; its value may be displayed as a
    "tooltip" on mouseover, so it _may_ solve the puzzle _if_ the user sees a
    suitable symbol of a missing glyph or unrecognized character _and_ the
    user suspects that moving the pointer over the mystery may reveal

    Jukka "Yucca" Korpela
