L2/99-037

From: Michel Suignard [michelsu@microsoft.com]
Sent: Wednesday, January 27, 1999 1:05 AM
To: Multiple Recipients of Unicore
Subject: FW: Soft space

Result on my query to Maurice Bauhahn about the proposed addition of the soft space for version 3.0. I am finding his explanation reasonably convincing. Make sure to include him as well on your feedback and comments.

Michel

-----Original Message-----
From: Maurice Bauhahn [mailto:bauhahnm@clara.net]
Sent: Tuesday, January 26, 1999 9:58 PM
To: Michel Suignard
Subject: Re: Soft space

Hello again. Thank you very much for forwarding this request and for revealing the misunderstanding that prevails regarding SOFT SPACE.

In languages without distinct word demarcation (such as Khmer) Zero Width Space U+200B has another and distinct use: separating the words without any visual spacing. ZWS is absolutely necessary...and very frequently used.

The Soft Space has a totally different use: Allowing proper phrasing of text when moving justified text between differing column widths. A phrase is usually composed of multiple words. Each of the words within a phrase would be separated by a ZWS. It is not proper form to let an algorithm arbitrarily decide which of those ZWSes should be given visual space...and it is even worse to add a little space to each of them! Just as it would be ridiculous to let a paragraph formatting algorithm insert commas in an English text, it would be ridiculous to let another paragraph formatting algorithm insert visual space [as opposed to widening existing space] in Khmer (or other Indic language) text. We are a long way from having the artificial intelligence necessary to handle that task (especially in such minority languages).

Now, of course, there is a Unicode character SPACE U+0020 which can be used in Indic languages to insert a phrase break. This is frequently used...too frequently!
After many years of experience in this I have learned that this one character is woefully inadequate, for when text is flowed into new column widths...editors spend hours on the text taking out and adding SPACEs (phrase breaks). It would be far better for the originator of text to choose the phrasing priorities, putting a SPACE where a deliberate break in thought would occur and a SOFT SPACE where minimal damage would be done to the phrasing were it to be expanded for justification purposes. A case could even be made for multiple levels of SOFT SPACE (but I'm having problems enough with this one level;-().

Some might object that the justification problem in these contexts could be eliminated by algorithmic microspacing between non-dependent characters. To a minor degree this is true. Languages written without ligatures can have their characters generously microspaced. This is not the case in heavily ligatured scripts (such as Khmer).

In conclusion...SOFT SPACE is VERY much needed. It is distinct from ZWS and SPACE. It facilitates the reflowing of text while preserving the meaning of that text.

Please do get back to me if these points are not sufficiently understandable (or convincing;-)). I would be happy to bring this discussion out into the Unicode mailing list...with your permission.

Cheers,
Maurice Bauhahn

Michel Suignard wrote:
>
> Hello Maurice, remember our conversation in London about the extra space
> character? When presenting this request for the new character in the Unicode
> Technical Committee (UTC) it was brought to my attention that the Zero Width
> Space (U+200B) was designed for that purpose and I couldn't find any
> justification to have both the Soft Space (our new character) and the ZWSP.
> You can read the Unicode book page 6-68 for more explanation about the ZWSP.
> I have put an extract here:
> Zero-width characters can be used in languages that have no visible word
> spacing in order to represent word-breaks, such as in Thai or Japanese.
> There are several varieties of zero-width spaces, the standard one is the
> word-break space U+200B ZERO WIDTH SPACE, used to add soft word breaks in
> language without word spaces....
>
> Could you comment on this and try the difference between what the soft space
> and the ZWSP would be (if any).
> Your comment would be highly appreciated as by default, the US will ask for
> removal of this character.
> Best regards,
>
> Michel
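
The three-way distinction argued for in the reply (ZWSP marks a word boundary and never becomes visible, SPACE is a deliberate phrase break, SOFT SPACE is invisible until justification needs room) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the messages give no code point for SOFT SPACE, so the private-use character U+E000 stands in for it; the function name `justify` is hypothetical; each character is treated as one display cell, which does not hold for real Khmer text; and spreading the slack round-robin over the expandable gaps is an arbitrary policy, not one the messages specify.

```python
ZWSP = "\u200b"        # ZERO WIDTH SPACE: word boundary, never given visible width
SPACE = " "            # deliberate phrase break, always visible, may be widened
SOFT_SPACE = "\ue000"  # HYPOTHETICAL stand-in (private-use code point) for the proposed SOFT SPACE


def justify(line: str, width: int) -> str:
    """Pad `line` to `width` cells by expanding only SPACE and SOFT_SPACE gaps.

    Assumes one display cell per character (a simplification).
    """
    pieces = [""]  # visible text between expandable gaps
    gaps = []      # current width of each gap; gaps[i] sits between pieces[i] and pieces[i + 1]
    for ch in line:
        if ch == ZWSP:
            continue  # word boundary only: the justifier must not add space here
        if ch == SPACE or ch == SOFT_SPACE:
            gaps.append(1 if ch == SPACE else 0)  # SOFT_SPACE starts at zero width
            pieces.append("")
        else:
            pieces[-1] += ch

    slack = width - (sum(len(p) for p in pieces) + sum(gaps))
    i = 0
    while slack > 0 and gaps:
        gaps[i % len(gaps)] += 1  # round-robin distribution (illustrative policy)
        slack -= 1
        i += 1

    out = pieces[0]
    for gap, piece in zip(gaps, pieces[1:]):
        out += " " * gap + piece
    return out


if __name__ == "__main__":
    # Latin letters stand in for Khmer words: two phrases, the first with a soft break inside it.
    sample = "ab" + ZWSP + "cd" + SOFT_SPACE + "ef" + ZWSP + "gh" + SPACE + "ij"
    for target in (11, 12, 13):
        print(repr(justify(sample, target)))
```

In this sketch the ZWSP positions never receive space at any column width, so the author's phrasing marks, not the algorithm, decide where visible gaps may appear when the text is reflowed.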