ISO/IEC JTC 1/SC 2/WG 2  N 2268

                                                                                                                                                                                     Date: 2000-09-15

 

                    Universal Multiple-Octet Coded Character Set

                    International Organization for Standardization

                    Organisation internationale de normalisation

 

 

Document Type: Working Group Document

Title: SOFT HYPHEN and some other characters

Source: Kent Karlsson

Status: Expert Contribution

Action: For consideration by JTC 1/SC 2/WG 2 and JTC 1/SC 2/WG 3

Date: 2000-09-15

 

 

 

The text concerning SOFT HYPHEN in the ISO/IEC 8859 series (and in ISO/IEC 6937) is unclear, and has been misinterpreted as disallowing SOFT HYPHEN unless it is immediately followed by a LINE FEED and/or CARRIAGE RETURN. See e.g. http://www.hut.fi/~jkorpela/shy.html. This misinterpretation has been circulated as “the correct one” on one of the Linux mailing lists.

 

ISO/IEC 8859 says on SOFT HYPHEN: “A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing hyphen, for use when a line break has been established within a word.”

 

The intent here is that the graphic symbol is to be used when “a line break has been established within a word”, and that otherwise no graphic symbol is to be used.

 

The misinterpretation circulated is that the SOFT HYPHEN character itself should be used only if “a line break has been established within a word”, and that this line break is then explicitly represented as a line feed (or carriage return). The text in the 8859 series on SOFT HYPHEN is unclear, and allows for this unintended interpretation. To avoid continued misinterpretation, the text on SOFT HYPHEN should be made clearer.

 

ISO/IEC 10646-1:2000 (annex H) contains no text at all on SOFT HYPHEN, on SPACE (8859 does have text on it), on NO-BREAK SPACE (though it does have text on NARROW NO-BREAK SPACE), or on MONGOLIAN TODO SOFT HYPHEN. Annex F cannot reasonably be extended to cover all characters needing “special handling”, but some are covered by 8859 in an unclear fashion. Below are new texts for some special characters that are not already covered by annex H, or whose treatment is unclear in the context of 8859. These texts are suggested also for the 8859 series, as well as for 6937 (for those of these characters that are included there).

 

 

 

 

 

Suggested new texts:

 

 

SPACE (0020): SPACE (SP) is a graphic character that has a visual representation consisting of the absence of a graphic symbol. It allows an automatic line break to be established just after it if it is not followed by another SPACE.
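
As an illustration only, the break rule suggested for SPACE can be sketched in Python. The function name `space_break_positions` is hypothetical and not part of any standard. It returns the indices at which a new line may start, i.e. the positions just after a SPACE that is not followed by another SPACE. Treating a SPACE at the very end of the string as offering no break opportunity is an assumption of this sketch.

```python
SPACE = "\u0020"

def space_break_positions(text: str) -> list[int]:
    """Return every index i such that a line break is allowed
    between text[i - 1] and text[i] because text[i - 1] is a SPACE
    that is not followed by another SPACE."""
    positions = []
    for i, ch in enumerate(text):
        if ch != SPACE:
            continue
        # Assumption: a trailing SPACE gives no break opportunity,
        # since there is nothing left to move to the next line.
        if i + 1 < len(text) and text[i + 1] != SPACE:
            positions.append(i + 1)
    return positions
```

For the string "a b  c" (one SPACE, then two SPACEs), the sketch allows a break before "b" and before "c", but not between the two adjacent SPACEs.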

 

NO-BREAK SPACE (00A0): NO-BREAK SPACE (NBSP) is a graphic character, the visual representation of which consists of the absence of a graphic symbol, for use when an automatic line break just before or just after it is to be prevented in the text as presented.

 

HYPHEN-MINUS (002D): HYPHEN-MINUS allows an automatic line break to be established just after it only if it is both immediately preceded by a letter and immediately followed by a letter. HYPHEN-MINUS should be imaged by a graphic symbol identical with that representing HYPHEN when immediately preceded or immediately followed by a letter. HYPHEN-MINUS should be imaged by a graphic symbol identical with that representing MINUS otherwise.
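
The two context rules for HYPHEN-MINUS (glyph selection and break permission) can be sketched as follows. This is a minimal sketch under stated assumptions: “letter” is approximated by Python's `str.isalpha()`, the glyph “representing MINUS” is taken to be U+2212 MINUS SIGN, and the function name `classify_hyphen_minus` is hypothetical.

```python
HYPHEN_MINUS = "\u002D"
HYPHEN = "\u2010"
MINUS_SIGN = "\u2212"  # assumption: stands in for the glyph "representing MINUS"

def classify_hyphen_minus(text: str, i: int) -> tuple[str, bool]:
    """For the HYPHEN-MINUS at text[i], return (glyph, break_allowed_after)."""
    if text[i] != HYPHEN_MINUS:
        raise ValueError("text[i] is not HYPHEN-MINUS")
    # Assumption: str.isalpha() approximates "is a letter".
    letter_before = i > 0 and text[i - 1].isalpha()
    letter_after = i + 1 < len(text) and text[i + 1].isalpha()
    # Imaged as HYPHEN if a letter is on either side, as MINUS otherwise.
    glyph = HYPHEN if (letter_before or letter_after) else MINUS_SIGN
    # A break just after it is allowed only with letters on both sides.
    return glyph, (letter_before and letter_after)
```

Under these assumptions, the HYPHEN-MINUS in "well-known" is imaged as HYPHEN and allows a break, the one in "a-1" is imaged as HYPHEN but allows no break, and the one in "x -5" is imaged as MINUS.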

 

HYPHEN (2010): HYPHEN allows an automatic line break to be established just after it.  HYPHEN is imaged by a graphic symbol.

 

NON-BREAKING HYPHEN (2011): NON-BREAKING HYPHEN is a graphic character, the visual representation of which is identical to that of HYPHEN. NON-BREAKING HYPHEN is for use as hyphen when an automatic line break just before or just after it is to be prevented in the text as presented.
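
The effect of NO-BREAK SPACE and NON-BREAKING HYPHEN on line breaking can be expressed as a simple veto on the boundaries next to them. The sketch below is illustrative only; `break_prevented` is a hypothetical helper, and it says nothing about whether a break would otherwise be allowed at that boundary.

```python
NBSP = "\u00A0"
NON_BREAKING_HYPHEN = "\u2011"
NO_BREAK = {NBSP, NON_BREAKING_HYPHEN}

def break_prevented(text: str, i: int) -> bool:
    """Return True if an automatic line break between text[i - 1]
    and text[i] is prevented by an adjacent NO-BREAK SPACE or
    NON-BREAKING HYPHEN."""
    if not 0 < i < len(text):
        raise ValueError("i must be a boundary inside the string")
    # A break is prevented both just before and just after the character.
    return text[i - 1] in NO_BREAK or text[i] in NO_BREAK
```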

 

SOFT HYPHEN (00AD): SOFT HYPHEN (SHY) allows an automatic line break to be established just after it (like ZERO WIDTH SPACE). SOFT HYPHEN is imaged by a graphic symbol identical with that representing HYPHEN when an automatic line break has been established just after it, or if it is directly followed by an explicit line break (including end-of-string). When no automatic line break has been established just after it and it is not followed by an explicit line break, the SOFT HYPHEN is not rendered and has zero width.

 

            Note: In certain combinations, e.g. webb<SHY>besökare, the SOFT HYPHEN can in addition suppress the letter following the SOFT HYPHEN when the SOFT HYPHEN is not rendered (e.g. webbesökare). Such behaviour is similar to automatic ligature formation.
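
The rendering rule suggested for SOFT HYPHEN can be sketched as below. This is a sketch under stated assumptions, not a definitive implementation: the function name `render_soft_hyphens` and the `auto_breaks` parameter (the indices of those SOFT HYPHENs after which a layout engine has chosen to break) are hypothetical, explicit line breaks are assumed to be LINE FEED or CARRIAGE RETURN, and the letter suppression described in the note is not modelled.

```python
SHY = "\u00AD"
HYPHEN = "\u2010"
EXPLICIT_BREAKS = ("\n", "\r")  # assumption: LF and CR count as explicit line breaks

def render_soft_hyphens(text: str, auto_breaks: set[int]) -> str:
    """Return text as it would be presented. auto_breaks holds the
    indices of SOFT HYPHEN characters after which an automatic line
    break has been established."""
    out = []
    for i, ch in enumerate(text):
        if ch != SHY:
            out.append(ch)
            continue
        at_end = i + 1 == len(text)
        before_explicit = not at_end and text[i + 1] in EXPLICIT_BREAKS
        if i in auto_breaks:
            # Automatic break established: show a hyphen, then break the line.
            out.append(HYPHEN + "\n")
        elif at_end or before_explicit:
            # Explicit break (or end-of-string) follows: show a hyphen only.
            out.append(HYPHEN)
        # Otherwise the SOFT HYPHEN is not rendered and has zero width.
    return "".join(out)
```

With no automatic break, "hy<SHY>phen" is presented as "hyphen". With a break established after the SOFT HYPHEN, it is presented as "hy‐" followed by "phen" on the next line. A SOFT HYPHEN that is not followed by a line break is therefore perfectly valid, which is the point of the proposed clarification.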

 

MONGOLIAN TODO SOFT HYPHEN (1806): MONGOLIAN TODO SOFT HYPHEN allows an automatic line break to be established just before it. MONGOLIAN TODO SOFT HYPHEN is imaged by a graphic symbol identical with that representing HYPHEN when an automatic line break has been established just before it, or if it is directly preceded by an explicit line break (including beginning-of-string). When no automatic line break has been established just before it and it is not preceded by an explicit line break, the MONGOLIAN TODO SOFT HYPHEN is not rendered and has zero width.
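
The rule for MONGOLIAN TODO SOFT HYPHEN mirrors the one for SOFT HYPHEN, with the break and the hyphen placed before the character instead of after it. The sketch below is illustrative only: `render_todo_soft_hyphens` and `auto_breaks` are hypothetical names, explicit line breaks are assumed to be LINE FEED or CARRIAGE RETURN, and the glyph is taken to be U+2010 HYPHEN as the suggested text states.

```python
TODO_SHY = "\u1806"
HYPHEN = "\u2010"
EXPLICIT_BREAKS = ("\n", "\r")  # assumption: LF and CR count as explicit line breaks

def render_todo_soft_hyphens(text: str, auto_breaks: set[int]) -> str:
    """Return text as it would be presented. auto_breaks holds the
    indices of MONGOLIAN TODO SOFT HYPHEN characters before which an
    automatic line break has been established."""
    out = []
    for i, ch in enumerate(text):
        if ch != TODO_SHY:
            out.append(ch)
            continue
        at_start = i == 0
        after_explicit = not at_start and text[i - 1] in EXPLICIT_BREAKS
        if i in auto_breaks:
            # Automatic break established: break the line, then show a hyphen.
            out.append("\n" + HYPHEN)
        elif at_start or after_explicit:
            # Explicit break (or beginning-of-string) precedes: show a hyphen only.
            out.append(HYPHEN)
        # Otherwise the character is not rendered and has zero width.
    return "".join(out)
```

The tests below use Latin placeholder letters around the character purely to keep the example readable.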