From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Mar 30 2004 - 11:15:05 EST
At 04:28 PM 3/29/2004, Kenneth Whistler wrote:
> > I will say again as I have said before - but the above (and what I
> > snipped) is extra evidence for it - that what is broke ... is
> > the rule that the isolated (generally spacing) form of a combining mark
> > should be formed by SPACE or NBSP followed by the combining mark.
>
>This has been the *intent* of the standard since its inception in
>1989.
>
> > There
> > are many good reasons for not using SPACE for this, including default
> > behaviour like inserting line breaks immediately after SPACE.
>
>Nope. UAX #14 specifies the following regarding SPACE followed by
>combining marks:
>
>"If U+0020 SPACE is used as a base character, it is treated as AL
>instead of SP."
This is an unfortunate typo in UAX#14. The correct statement is:
"If U+0020 SPACE is used as a base character, it is treated as ID
instead of SP."
See the description of these issues in the rules section of the UAX,
which is quite explicit:
LB 7a In all of the following rules, if a space is the base character for
a combining mark, the space is changed to type ID. In other words, break
before SP CM* in the same cases as one would break before an ID.
Treat SP CM* as if it were ID
As stated in [Unicode], Section 7.7 Combining Marks, combining characters
are shown in isolation by applying them to either U+0020 SPACE (SP) or
U+00A0 NO-BREAK SPACE (NBSP). The visual appearance is the same, but the
line breaking result is different. Correspondingly, if there is no base,
or if the base character is SP, CM* or SP CM* behave like ID.
>This means that a combining character sequence of this type is treated
>as a unit for the purposes of line breaking, and this overrides the
>behavior otherwise of SPACE to be treated as a line break
>opportunity.
There's never a line break opportunity between a SPACE and a combining
mark, but since SP CM* is treated like an ID (the ideographic line breaking
class), there are break opportunities *before* the SP that will not be
there if an NBSP is used.
>Of course UAX #14 only spells out default behavior,
>but then "default behaviour" is what was claimed just above.
>
> > Using NBSP rather than SPACE has several advantages, and has long been
> > specified in Unicode, although not widely implemented. It is less likely
> > to occur accidentally. But it has disadvantages, especially that it will
> > always be a spacing character, whereas for display of isolated Indic
> > vowels no extra spacing is required.
>
>NBSP is not a fixed-width space.
Correct. Somewhere in the standard, we should point out that using a
space/NBSP as base character does not require these spaces to have the
same width as elsewhere in the text, but that they can (and should) be
adjusted to best serve this 'base character' function.
A./
This archive was generated by hypermail 2.1.5 : Tue Mar 30 2004 - 12:07:22 EST