Re: Non-breaking hyphens and web browsers

From: Otto Stolz
Date: Tue Nov 09 1999 - 11:02:27 EST

On 1999-11-06 at 19:41, Ben Yenko-Martinka wrote:
> While the en-dash "–" [...] appears to work fine in Netscape,
> it allows wrapping in Microsoft Internet Explorer.

This is a character from a proprietary codepage, cf.
Most probably, it will be understood only in MS-Windows systems --
not in Unix boxes, not on Macs, probably not even on PCs running DOS,
Linux, or OS/2 (I haven't tried these latter, though).

In HTML, up to Version 3.2, CP 1252 (including the character you were
trying to use) is not legal; there is no way to tell the receiving
side about the intended meaning of the byte 96 (hex). The only legal
code page for transmitting HTML (in those versions) is Latin-1, cf.

This means that you are not allowed to use, in HTML (pre 4.0) sources,
bytes 128 (80, in hex: the Euro currency symbol) through 159 (9F, in
hex: the uppercase Y with dieresis) from your Windows codepage.
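
To illustrate the ambiguity, here is a minimal Python sketch. It assumes
Python's built-in "cp1252" and "latin-1" codecs; the sample bytes and the
helper name find_c1_bytes are made up for this example. It decodes the same
byte 96 (hex) under both code pages and lists the bytes that fall into the
80..9F (hex) range:

```python
# The same byte sequence, as it might be saved by a Windows editor.
raw = b"pages 10\x9620"

as_cp1252 = raw.decode("cp1252")   # 0x96 is read as U+2013 EN DASH
as_latin1 = raw.decode("latin-1")  # 0x96 is read as U+0096, a C1 control code


def find_c1_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes in the range 0x80..0x9F.

    Hypothetical helper: these are the byte values that CP 1252 uses for
    extra characters but that are not printable characters in Latin-1.
    """
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


print(hex(ord(as_cp1252[8])))  # code point seen by a CP 1252 reader
print(hex(ord(as_latin1[8])))  # code point seen by a Latin-1 reader
print(find_c1_bytes(raw))
```

A check of this kind could be run over a page before publishing it as
Latin-1; any offset it reports marks a character the receiving side cannot
interpret reliably.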

The remaining ranges of CP 1252 are compatible with Latin-1, aka ISO 8859-1.

Since HTML 4.0, you may include any UCS character in your HTML source,
provided you announce the encoding properly, cf.
- recent contributions to the Unicode List,
  Subject: "How to use Unicode on XML/HTML pages"
- standard: <>,
- example: <>,
  which contains a dash (rather than the hyphen you were looking for),
  in its H1 header.

> Furthermore, while Homesite (my web design tool) makes this code
> available for en-dash, it appears in the range of unassigned codes in
> Homesite's own "complete" Character Entity Reference for ISO Latin-1.
> In that reference, the en-dash is listed with two allowable codes:
> "&#8211;" and "&ndash;".

A web design tool must
- accept input in the local encoding (otherwise you would not be able
  to enter text via your keyboard, or other standard means),
- produce legal HTML (otherwise your pages would not be understood

In your example,
- you enter your text in CP 1252, where the n-dash has the encoding 96 (hex),
- the HTML 4.0 source may contain numerical character references based
  on UCS code positions (i. e. "&#8211;" or "&#x2013;", for the n-dash),
  or standard character entities (e. g. "&ndash;"), as discussed in the
  reference given above.
Hence, what you report is perfectly sensible behaviour for a web design
tool running on a (western) MS-Windows system; just the documentation may
be misleading, as the n-dash does not belong under the heading of "ISO
Latin-1".
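
The conversion described above can be sketched in a few lines of Python.
This is only an illustration of the mapping, not of how Homesite works
internally; it assumes the standard library modules html and html.entities,
and the variable names are my own:

```python
from html import unescape
from html.entities import codepoint2name

# Keyboard input arrives in the local code page: byte 0x96 in CP 1252.
typed = b"\x96".decode("cp1252")
cp = ord(typed)  # UCS code position of the n-dash

decimal_ref = f"&#{cp};"               # decimal numeric character reference
hex_ref = f"&#x{cp:X};"                # hexadecimal numeric character reference
named_ref = f"&{codepoint2name[cp]};"  # HTML 4 character entity

for ref in (decimal_ref, hex_ref, named_ref):
    # Each form must resolve back to the same UCS character.
    assert unescape(ref) == typed
    print(ref)
```

All three forms consist of ASCII characters only, so they survive in a page
transmitted as Latin-1.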

> These, however, are recognized by Microsoft IE but not by Netscape (at
> least not in the Arial font which they are in IE).

Netscape (up to version 4.51, at least) has a bug: while, according to the
standard (cf. reference given above), an HTML 4.0 source in any encoding
(e. g. Latin-1) may contain arbitrary numeric character references,
Netscape can display the full UCS character set *only if* UTF-8 is chosen
as the transfer encoding. This may be the reason for your problem, though
my copy of Netscape can display an m-dash on the PC, even in a Latin-1
encoded page <>.
If so, you should tell your web design tool to produce UTF-8, rather
than Latin-1, encoded HTML.
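
For comparison, a minimal Python sketch of the two ways to ship the same
text. The sample string and the helper meta_charset are assumptions for
this example; the META element shown is the usual HTML 4 way to announce
the encoding inside the document, and the same charset can also be
announced in the HTTP Content-Type header:

```python
text = "1999\u20132001"  # sample text containing an n-dash (U+2013)

# Latin-1 has no n-dash, so the character has to travel as a numeric
# character reference; Python's "xmlcharrefreplace" handler does that.
latin1_body = text.encode("latin-1", errors="xmlcharrefreplace")

# UTF-8 can carry the character itself, here as three bytes.
utf8_body = text.encode("utf-8")


def meta_charset(charset: str) -> str:
    """Hypothetical helper: build the META element announcing the encoding."""
    return f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'


print(meta_charset("ISO-8859-1"), latin1_body)
print(meta_charset("UTF-8"), utf8_body)
```

The announced charset has to match the bytes actually sent; a UTF-8 body
labelled as Latin-1 would show three stray characters in place of the dash.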

Another reason for your problem may be that Netscape is not aware of the
characters available in the various Arial fonts you have installed on your
system. You need fonts comprising the UCS encoding tables. These come
with recent MS-Windows versions, or can be downloaded from Microsoft, cf.


Best wishes,
   Otto Stolz
