From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Nov 24 2004 - 16:00:58 CST
At 04:36 AM 11/24/2004, Peter Kirk wrote:
>I understand that the proposed INVISIBLE CHARACTER was rejected at the
>recent UTC meeting. I presume that the intention is that NBSP should be
>used instead.
At the moment, NBSP is the only sanctioned base character without 'ink'.
>There are cases of words which start with spacing combining marks, for
>which there are no separate Unicode characters. For example, there are
>some unusual biblical Hebrew word forms (Ketiv consonants with Qere
>vowels, the forms printed in Hebrew Bibles) which start with spacing
>combining marks. For some examples (in fact this is intended to be an
>exhaustive list of such words), see
>http://www.qaya.org/academic/hebrew/Ketiv-Qere-difficult.pdf, the "blended
>forms" column of rows with the note "point before word".
>
>This UTC decision leaves us in a situation in which such words need to be
>represented in Unicode with NBSP and combining marks at the start of a
>word. Does this lead to problems with HTML, XML etc? Are there cases in
>which this word-initial NBSP will be combined with a preceding word space,
>and so the intended word spacing and break opportunity (before the NBSP)
>may be lost?
The sequence SPACE NBSP does *not* allow a break after the SPACE under the
line breaking rules we publish in UAX#14.
The common usage in HTML is to use one or more NBSPs followed by a SPACE to
mark a wider space that allows a break at the end. NBSPs are not coalesced
with other spaces.
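As a rough illustration of that last point, here is a minimal Python sketch of HTML-style whitespace collapsing. It is a simplified model and not a browser's actual algorithm: the function name collapse_spaces and the sample strings are made up for this example, and the only assumption is that collapsing applies to the ASCII whitespace characters (space, tab, LF, CR, FF) and not to U+00A0.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

# Whitespace that HTML-style processing collapses (assumed set: ASCII only).
# An explicit character class is used because Python's \s would also match
# NBSP in str patterns, which is not the behavior being modelled here.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")


def collapse_spaces(text: str) -> str:
    """Collapse each run of collapsible whitespace to a single SPACE."""
    return _COLLAPSIBLE.sub(" ", text)


# Ordinary spaces are coalesced into one.
plain = collapse_spaces("word    next")

# Two NBSPs followed by one SPACE: the NBSPs survive untouched, so the gap
# stays wide, and the trailing SPACE is still there to offer a break.
wide_gap = "word" + NBSP + NBSP + " " + "next"
kept = collapse_spaces(wide_gap)

print(repr(plain))
print(repr(kept))
print(kept == wide_gap)
```

The wide-gap string comes back unchanged, which is the behavior the NBSP-then-SPACE idiom relies on.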
>In the Hebrew case, it is probably necessary to precede the NBSP with RLM
>to ensure that the NBSP and combining mark are taken with the rest of the
>word as right-to-left. Does this inserted RLM affect the situation with
>HTML, XML etc?
You are always free to surround the NBSP with other format characters, such
as RLM or ZWSP, to tailor whatever behavior those format characters affect.
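To make the RLM + NBSP + combining mark sequence concrete, the following Python sketch builds one and prints the relevant character properties from the standard unicodedata module. The particular point and letter (HIRIQ, ALEF) are placeholders chosen for illustration, not one of the forms from the list cited above, and the helper name describe is hypothetical.

```python
import unicodedata

RLM = "\u200f"    # RIGHT-TO-LEFT MARK (format character)
NBSP = "\u00a0"   # NO-BREAK SPACE, the ink-less base character
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ, stand-in for the word-initial point
ALEF = "\u05d0"   # HEBREW LETTER ALEF, stand-in for the rest of the word


def describe(text):
    """Return (code point, name, general category, bidi class) per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
        )
        for ch in text
    ]


# RLM first, then NBSP as the base, then the mark, then the word proper.
word = RLM + NBSP + HIRIQ + ALEF

for row in describe(word):
    print(row)
```

The output shows NBSP with bidi class CS, a weak type, while RLM and the Hebrew letter are strong R. That is the reason a leading RLM can matter for keeping the NBSP and its mark with the right-to-left word.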
A./