From: Philippe Verdy (firstname.lastname@example.org)
Date: Fri Aug 08 2003 - 11:54:36 EDT
On Tuesday, August 05, 2003 1:52 AM, Kenneth Whistler <email@example.com> wrote:
> > > The carrier for a combining mark that is to display in isolation
> > > without a base character is U+0020 SPACE. If you want to also
> > > indicate the absence of a line break opportunity, then the
> > > carrier is U+00A0 NO-BREAK SPACE (NBSP).
> > >
> > Neither of these is appropriate to the case I have in mind
> > (described in greater detail below) as they are not zero width and
> > therefore give an unwanted indent at the start of a line.
> Of course, because the whole point of this convention is to display
> a non-spacing mark in isolation, not applied to a base character.
> > U+200B ZERO WIDTH SPACE might be
> > appropriate, but this has the problem that it is a break
> > opportunity, which is not always appropriate.
> U+200B ZERO WIDTH SPACE is not appropriate, for the same reason
> the U+FEFF (or U+2060) is not appropriate: The Standard does
> not specify the display of non-spacing marks on it as a means
> of showing the marks without base characters. And, as you indicate,
> U+200B (but also U+FEFF and U+2060) are implicated in the control
> of line break opportunities. They are certainly not defined
> as glyph display anchors or some such.
Here I disagree: ZWS is a whitespace character, not a format control, and
thus it has a glyphic and semantic identity of its own (unlike ZWNBSP or WJ).
So ZWS clearly qualifies as a base character, and is certainly better
(conceptually and per its breaking properties) than the standard ASCII
space, which has an implied minimum width (which may be too large
to be used as a holder for a tiny diacritic like a dot above, or even an
200B;ZERO WIDTH SPACE;Zs;0;BN;;;;;N;;;;;
When we speak about combining sequences, they are already
supposed to expand the width or height of the base character to
which they apply, so ZWS, despite being zero-width itself, does
not pass this property on to the combining sequence which
For me, the best two candidates for holders of isolated diacritics
are ZWS (if breakable before and after the combining sequence),
or WJ (if not breakable when the isolated diacritic must be used
within the same word without internal break opportunity).
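
As a rough illustration of the candidates above, the following Python sketch pairs each carrier with U+0301 COMBINING ACUTE ACCENT and reports what the standard unicodedata module says about it. The choice of U+0301 and the names CARRIERS, MARK, describe and carrier_report are assumptions made for this example only. The module reflects whatever Unicode version ships with the interpreter, so U+200B will probably be reported as Cf rather than the Zs shown in the record quoted earlier, and line-breaking behaviour is not exposed by this module at all.

```python
import unicodedata

# Candidate carriers discussed in this thread (illustrative selection).
CARRIERS = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "ZERO WIDTH SPACE": "\u200B",
    "WORD JOINER": "\u2060",
}

# The mark to display in isolation; U+0301 is only an example.
MARK = "\u0301"


def describe(ch):
    """Return (code point, general category, combining class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))


def carrier_report():
    """Map each carrier name to its properties as seen by this interpreter."""
    return {name: describe(ch) for name, ch in CARRIERS.items()}


if __name__ == "__main__":
    for name, info in carrier_report().items():
        # repr() shows the carrier + mark sequence as escapes, since it may be invisible.
        print(name, info, repr(CARRIERS[name] + MARK))
```

Whether a given renderer actually shows the mark on a zero-width carrier without a dotted circle is a font and layout-engine question that this sketch cannot answer.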
However, WJ is a format control and does not fit the second
usage well. Could another code point be assigned that has
20CF;ZERO WIDTH SYMBOL;Sk;0;ON;<compat> 0020;;;;N;;;;;
i.e. being considered symbolic, not whitespace, with
combining class 0 (not combining), and used as an
explicit base for an isolated diacritic so that it never shows
with a dotted circle? (Note that U+20CF is just a suggestion, as
it fits at end of the symbolic block used for currency symbols,
just before the "extended" combining characters block, and
because the U+02XX block where other "Sk" spacing
diacritics are defined is full).
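
To make the fields of the suggested record easier to read, here is a minimal Python sketch that splits a UnicodeData-style line into named fields. The field labels in FIELDS and the helper parse_record are my own naming for this example, based on the usual 15-field layout of UnicodeData.txt; the record itself is only the suggestion made above, not an assigned character.

```python
# Labels for the 15 semicolon-separated fields of a UnicodeData.txt record.
# These names are illustrative, not official property identifiers.
FIELDS = (
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
)


def parse_record(line):
    """Split one record into a dict keyed by FIELDS; reject malformed lines."""
    values = line.strip().split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# The hypothetical record suggested in this message.
proposed = parse_record("20CF;ZERO WIDTH SYMBOL;Sk;0;ON;<compat> 0020;;;;N;;;;;")

if __name__ == "__main__":
    for key in ("code", "general_category", "combining_class", "decomposition"):
        print(key, "=", proposed[key])
```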
The compatibility decomposition to a space keeps it
in sync with other compatibly decomposable spacing
The new character would make it possible to represent diacritics that
currently have no spacing counterpart, and to use them as if they were
letter-like. Let's look at a similar diacritic which currently has an existing
"precombined" spacing version:
00B4;ACUTE ACCENT;Sk;0;ON;<compat> 0020 0301;;;;N;SPACING ACUTE;;;;
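
The compatibility mapping in this record can be checked with Python's standard unicodedata module. This is only a small sketch; the variable names are illustrative. It shows that U+00B4 carries a compatibility (not canonical) decomposition to SPACE followed by U+0301, so only the compatibility normalization forms expand it.

```python
import unicodedata

ACUTE = "\u00B4"  # ACUTE ACCENT, the spacing version

# Raw decomposition field, in the same notation as the record above.
decomp = unicodedata.decomposition(ACUTE)

# Compatibility decomposition expands to SPACE + COMBINING ACUTE ACCENT.
nfkd = unicodedata.normalize("NFKD", ACUTE)

# Canonical decomposition leaves the character alone, because the mapping is <compat>.
nfd = unicodedata.normalize("NFD", ACUTE)

if __name__ == "__main__":
    print(decomp)
    print([f"U+{ord(c):04X}" for c in nfkd])
    print([f"U+{ord(c):04X}" for c in nfd])
```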