L2/07-398

Date: October 18, 2007
Source: Mark Davis
Subject: Word/Sentence punctuation property recommendations

----------------------

Document 07-370 discusses certain property issues. This document presents a series of recommendations based on that document and subsequent discussion in the meeting.

1. Remove FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON from the property value Word_Break=MidNum. This is clearly an oversight: we explicitly removed COLON and simply missed its compatibility equivalent.

2. Allow certain characters to "bridge" both numeric and alphabetic words. That is, if one of these characters occurs between two digits or between two alphabetic characters, it does not break the word: the numeric or alphabetic word continues across it.

2A. Add a property value Word_Break=MidNumLet, with the following characters (these will be removed from other property values):

0027 ( ' ) APOSTROPHE
002E ( . ) FULL STOP
2018 ( ‘ ) LEFT SINGLE QUOTATION MARK
2019 ( ’ ) RIGHT SINGLE QUOTATION MARK

2024 ( ․ ) ONE DOT LEADER
FE52 ( ﹒ ) SMALL FULL STOP
FF07 ( ＇ ) FULLWIDTH APOSTROPHE
FF0E ( ． ) FULLWIDTH FULL STOP

In the text of the document, call out the last four characters as an open issue, requesting public feedback. The text would look something like:

Open Issue: The following characters have been tentatively added to MidNumLet for Unicode 5.1. As of Unicode 5.0, MidNum and MidLetter already included compatibility equivalents of some of their characters, but the lists were not complete. These additions supply compatibility equivalents for the characters that "bridge" numeric and alphabetic words. They only have an effect when surrounded by either numbers or alphabetic letters; in particular, this change has no effect when these characters are adjacent to ideographs.
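To illustrate the claim that the four tentative characters are compatibility equivalents of APOSTROPHE and FULL STOP, the following Python sketch checks their NFKC forms with the standard `unicodedata` module. The list names and the `nfkc_base` helper are hypothetical, introduced only for this check.

```python
import unicodedata

# The four characters tentatively added to MidNumLet (last four in item 2A).
TENTATIVE_MID_NUM_LET = ["\u2024", "\uFE52", "\uFF07", "\uFF0E"]

# APOSTROPHE and FULL STOP, which also go into MidNumLet under item 2A.
BASE_MID_NUM_LET = {"\u0027", "\u002E"}


def nfkc_base(ch):
    """Return the NFKC (compatibility-composed) form of a single character."""
    return unicodedata.normalize("NFKC", ch)


for ch in TENTATIVE_MID_NUM_LET:
    base = nfkc_base(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(base):04X} {unicodedata.name(base)}")
```

Each of the four folds to FULL STOP or APOSTROPHE, so adding them keeps MidNumLet closed under compatibility equivalence for these two base characters.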
 

2B. Make the following changes to the rules, to allow these characters to bridge both alphabetic and numeric words:
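As a hedged illustration of the intended effect, assuming the rule changes take the form later adopted in UAX #29 (MidNumLet accepted alongside MidLetter in WB6/WB7, and alongside MidNum in WB11/WB12), the following Python sketch segments a string using only those rules plus WB5, WB8 to WB10, and WB14. The class sets are illustrative subsets drawn from the lists in this document, `word_class`, `is_break`, and `segments` are hypothetical helper names, letters and digits are classified only roughly, and other rules such as WB3 and WB4 are omitted.

```python
import unicodedata

# Illustrative subsets of the Word_Break classes discussed in this document;
# these are NOT the complete property value lists.
MID_NUM_LET = {"'", ".", "\u2018", "\u2019", "\u2024", "\uFE52", "\uFF07", "\uFF0E"}
MID_LETTER = {":", "\u00B7", "\u0387", "\uFE13", "\uFE55", "\uFF1A"}
MID_NUM = {",", ";", "\u066C", "\uFE50", "\uFE54", "\uFF0C", "\uFF1B"}


def word_class(ch):
    """Rough approximation of the Word_Break value of one character."""
    if ch in MID_NUM_LET:
        return "MidNumLet"
    if ch in MID_LETTER:
        return "MidLetter"
    if ch in MID_NUM:
        return "MidNum"
    if "0" <= ch <= "9":
        return "Numeric"  # simplification: ASCII digits only
    if ch.isalpha() and not unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
        return "ALetter"  # simplification: alphabetic, excluding CJK ideographs
    return "Other"


def is_break(classes, i):
    """True if there is a word boundary between positions i-1 and i."""
    before, after = classes[i - 1], classes[i]
    prev2 = classes[i - 2] if i >= 2 else None
    next2 = classes[i + 1] if i + 1 < len(classes) else None
    # WB5, WB8, WB9, WB10: letters and digits stay together.
    if before in ("ALetter", "Numeric") and after in ("ALetter", "Numeric"):
        return False
    # WB6: ALetter x (MidLetter | MidNumLet) ALetter
    if before == "ALetter" and after in ("MidLetter", "MidNumLet") and next2 == "ALetter":
        return False
    # WB7: ALetter (MidLetter | MidNumLet) x ALetter
    if prev2 == "ALetter" and before in ("MidLetter", "MidNumLet") and after == "ALetter":
        return False
    # WB12: Numeric x (MidNum | MidNumLet) Numeric
    if before == "Numeric" and after in ("MidNum", "MidNumLet") and next2 == "Numeric":
        return False
    # WB11: Numeric (MidNum | MidNumLet) x Numeric
    if prev2 == "Numeric" and before in ("MidNum", "MidNumLet") and after == "Numeric":
        return False
    # WB14: otherwise, break.
    return True


def segments(text):
    """Split text at every boundary reported by is_break."""
    if not text:
        return []
    classes = [word_class(ch) for ch in text]
    out = [text[0]]
    for i in range(1, len(text)):
        if is_break(classes, i):
            out.append(text[i])
        else:
            out[-1] += text[i]
    return out
```

With these rules, "can't" and "3.14" each stay whole, "a,b" splits because COMMA only bridges digits, and "1:2" splits because COLON only bridges letters. A FULL STOP between two ideographs still produces a break, in line with the open issue text.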

3. Add the following characters to Word_Break=MidLetter:

0387 ( · ) GREEK ANO TELEIA
FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON
FE55 ( ﹕ ) SMALL COLON
FF1A ( ： ) FULLWIDTH COLON

Add an Open Issue to the document for these characters, along the lines above. Include the point that although 0387 ( · ) GREEK ANO TELEIA, or its canonical equivalent 00B7 ( · ) MIDDLE DOT, may be used as a semicolon in Greek, it is safe, as with COLON, to allow either one within words: when used as a semicolon, it would not be immediately followed by a letter.
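The four characters in item 3 are equivalents of MIDDLE DOT or COLON. The Python sketch below, using the standard `unicodedata` module, distinguishes the canonical case (GREEK ANO TELEIA, which is therefore also compatibility-equivalent) from the pure compatibility cases. The `equivalence` helper is a hypothetical name introduced for illustration.

```python
import unicodedata

COLON = "\u003A"
MIDDLE_DOT = "\u00B7"

# Characters proposed for Word_Break=MidLetter in item 3.
PROPOSED_MID_LETTER = ["\u0387", "\uFE13", "\uFE55", "\uFF1A"]


def equivalence(ch):
    """Classify ch as canonically or compatibility equivalent to something else."""
    nfd = unicodedata.normalize("NFD", ch)
    if nfd != ch:
        return "canonical", nfd
    nfkd = unicodedata.normalize("NFKD", ch)
    if nfkd != ch:
        return "compatibility", nfkd
    return "none", ch


for ch in PROPOSED_MID_LETTER:
    kind, target = equivalence(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {kind} equivalent of {target!r}")
```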

4. Add the following characters to Word_Break=MidNum:
 
066C ( ٬ ) ARABIC THOUSANDS SEPARATOR

FE50 ( ﹐ ) SMALL COMMA
FE54 ( ﹔ ) SMALL SEMICOLON
FF0C ( ， ) FULLWIDTH COMMA
FF1B ( ； ) FULLWIDTH SEMICOLON
 
Add an Open Issue to the document for the last 4 characters, along the lines of the above.
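The split in the list above (066C on its own, then four more) lines up with the decomposition data: the last four are small or fullwidth variants of COMMA and SEMICOLON, while ARABIC THOUSANDS SEPARATOR has no decomposition mapping. A quick check with Python's standard `unicodedata` module, offered as a sketch:

```python
import unicodedata

# Characters proposed for Word_Break=MidNum in item 4.
PROPOSED_MID_NUM = ["\u066C", "\uFE50", "\uFE54", "\uFF0C", "\uFF1B"]

for ch in PROPOSED_MID_NUM:
    # decomposition() returns "" when the character has no decomposition mapping;
    # otherwise it returns the tagged mapping, e.g. "<small> 002C".
    mapping = unicodedata.decomposition(ch) or "(no decomposition)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```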

5. Add the alternative forms of full stop to Sentence_Break=ATerm, so that it contains the following:

002E ( . ) FULL STOP
2024 ( ․ ) ONE DOT LEADER
FE52 ( ﹒ ) SMALL FULL STOP
FF0E ( ． ) FULLWIDTH FULL STOP
 
Add an Open Issue for the added characters, as above.
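One practical effect of item 5, sketched in Python under stated simplifications: with the alternative full stops in ATerm, rule SB6 (ATerm × Numeric) keeps a decimal such as "3．14" from being treated as a sentence end. The names `ATERM`, `is_aterm`, and `sb6_no_break` are hypothetical, only SB6 is modeled, and Numeric is approximated by ASCII digits.

```python
import unicodedata

# Proposed Sentence_Break=ATerm members from item 5.
ATERM = {"\u002E", "\u2024", "\uFE52", "\uFF0E"}


def is_aterm(ch):
    return ch in ATERM


def sb6_no_break(text, i):
    """SB6 (ATerm x Numeric): True when text[i] is an ATerm followed by a digit.

    Simplification: Numeric is approximated by the ASCII digits 0-9,
    and none of the other sentence-break rules are applied.
    """
    return i + 1 < len(text) and is_aterm(text[i]) and "0" <= text[i + 1] <= "9"


# Every proposed member folds to FULL STOP under NFKC.
for ch in sorted(ATERM):
    print(f"U+{ord(ch):04X} -> {unicodedata.normalize('NFKC', ch)!r}")
```

For example, `sb6_no_break("3\uFF0E14", 1)` is true, while `sb6_no_break("End\uFF0E Next", 3)` is false, leaving that position to the remaining sentence-break rules.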

6. Add an informative note that implementations may want to tailor some or all of the following characters into Word_Break=MidNum.

0020 ( ) SPACE
00A0 (   ) NO-BREAK SPACE
2007 (   ) FIGURE SPACE
2008 (   ) PUNCTUATION SPACE
2009 (   ) THIN SPACE
202F (   ) NARROW NO-BREAK SPACE
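As a sketch of what such a tailoring would do, assuming a tailored MidNum set that adds these spaces to COMMA and SEMICOLON, the Python function below joins digit runs across a single MidNum character only when digits appear on both sides (the WB11/WB12 pattern). The function name and both sets are hypothetical, and every character other than a digit or a MidNum member is simply treated as a separator.

```python
# Hypothetical untailored and tailored MidNum subsets.
DEFAULT_MID_NUM = {",", ";"}
TAILORED_MID_NUM = DEFAULT_MID_NUM | {
    "\u0020", "\u00A0", "\u2007", "\u2008", "\u2009", "\u202F",
}


def is_digit(ch):
    return "0" <= ch <= "9"  # simplification: ASCII digits only


def numeric_words(text, mid_num):
    """Return numeric 'words', bridging a MidNum char only between two digits."""
    words, current = [], ""
    for i, ch in enumerate(text):
        if is_digit(ch):
            current += ch
        elif ch in mid_num and current and i + 1 < len(text) and is_digit(text[i + 1]):
            current += ch  # digit, MidNum, digit: the numeric word continues
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

With the default set, "1 000 000 people" yields three separate numbers; with the tailored set it yields the single word "1 000 000". Note that tailoring plain SPACE this way also joins unrelated adjacent numbers, such as the two numbers in "pages 12 14".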