L2/16-043R

Line Breaking Fixes for PR (Prefix Numeric) and PO (Postfix Numeric) characters

Roozbeh Pournader (Google)
January 27, 2015

Proposal
========

In UAX #14, change rules LB23 and LB24 from:

LB23 Do not break within ‘a9’, ‘3a’, or ‘H%’.

    ID × PO
    (AL | HL) × NU
    NU × (AL | HL)

LB24 Do not break between prefix and letters or ideographs.

    PR × ID
    PR × (AL | HL)
    PO × (AL | HL)

to:

LB23 Do not break between digits and letters.

    (AL | HL) × NU
    NU × (AL | HL)

LB23a Do not break between prefix and ideographs, or between ideographs and postfix.

    PR × ID
    ID × PO

LB24 Do not break between prefix/postfix and letters, or between letters and prefix/postfix.

    (PR | PO) × (AL | HL)
    (AL | HL) × (PR | PO)

Background
==========

UAX #14, in its definition of the PR (Prefix Numeric) class of characters, says that "[...] the line breaking algorithm, by default, does not break between PR and numbers or letters on either side." Later, in its definition of the PO (Postfix Numeric) class, it repeats: "the line breaking algorithm by default does not break between PO and numbers or letters on either side."

But this is actually not reflected in the explicit rules! Additionally, the explicit rules of LB23 and LB24 differ from what their descriptions specify. LB23 is especially odd: it talks about not breaking within ‘H%’ but has no rule that applies to that case. (The ‘H%’ inconsistency dates back to an unexplained change in Unicode 3.1.0 that changed one of the subrules in what was then LB 17 from "AL × PO" to "ID × PO".)

The inconsistency between the intent of the standard and the rules it actually specifies has led to bugs in various products, especially those depending on ICU for line breaking. For example, the formal rules of UAX #14 currently allow a line break in the middle of currency symbols such as CA$ for Canadian dollars or JP¥ for Japanese yen.
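
As a rough illustration of the difference between the two rule sets, the Python sketch below models only LB23, LB23a, and LB24 as tables of class pairs, plus LB28 ((AL | HL) × (AL | HL)) so that runs of letters stay together. Every other pair is treated as a break opportunity. This is not the full UAX #14 algorithm and not how ICU implements it: the names `pairs`, `toy_class`, and `allowed_breaks` are made up for this example, and `toy_class` is a tiny stand-in for a real LineBreak property lookup. Digits are deliberately rejected because LB25 (numeric contexts) is not modeled.

```python
# Sketch only: models LB23/LB23a/LB24 (plus LB28 for context) as pair tables.
# It is NOT a full UAX #14 implementation; all helper names are hypothetical.

AL_HL = ("AL", "HL")


def pairs(befores, afters):
    """All (before, after) class pairs covered by a rule 'befores × afters'."""
    return {(b, a) for b in befores for a in afters}


# LB28, unchanged by the proposal; included so letter runs are not split.
LB28 = pairs(AL_HL, AL_HL)

CURRENT_RULES = (
    LB28
    | pairs(["ID"], ["PO"])          # LB23: ID × PO
    | pairs(AL_HL, ["NU"])           # LB23: (AL | HL) × NU
    | pairs(["NU"], AL_HL)           # LB23: NU × (AL | HL)
    | pairs(["PR"], ["ID"])          # LB24: PR × ID
    | pairs(["PR"], AL_HL)           # LB24: PR × (AL | HL)
    | pairs(["PO"], AL_HL)           # LB24: PO × (AL | HL)
)

PROPOSED_RULES = (
    LB28
    | pairs(AL_HL, ["NU"])           # LB23: (AL | HL) × NU
    | pairs(["NU"], AL_HL)           # LB23: NU × (AL | HL)
    | pairs(["PR"], ["ID"])          # LB23a: PR × ID
    | pairs(["ID"], ["PO"])          # LB23a: ID × PO
    | pairs(["PR", "PO"], AL_HL)     # LB24: (PR | PO) × (AL | HL)
    | pairs(AL_HL, ["PR", "PO"])     # LB24: (AL | HL) × (PR | PO)
)

# Toy class assignment covering only the characters used in the examples.
TOY_SYMBOLS = {"$": "PR", "\u00a5": "PR", "%": "PO"}


def toy_class(ch):
    """Map a character to a line breaking class (assumption: letters are AL)."""
    if ch in TOY_SYMBOLS:
        return TOY_SYMBOLS[ch]
    if ch.isalpha():
        return "AL"
    raise ValueError(f"character {ch!r} is outside this toy example")


def allowed_breaks(text, rules):
    """Indexes i where none of the modeled rules forbids a break before text[i]."""
    classes = [toy_class(ch) for ch in text]
    return [
        i
        for i in range(1, len(classes))
        if (classes[i - 1], classes[i]) not in rules
    ]


if __name__ == "__main__":
    for sample in ["CA$", "JP\u00a5", "Ke$ha", "H%"]:
        print(sample, allowed_breaks(sample, CURRENT_RULES),
              allowed_breaks(sample, PROPOSED_RULES))
```

Under the current tables the sketch reports a break opportunity immediately before the `$`, `¥`, or `%` in each sample, because no modeled rule covers a letter followed by PR or PO. Under the proposed tables it reports none.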

(I wrote about this issue to the UTC in February 2015, and I was given an action, 143-A35, to propose an update to UAX #14 for a fix in Unicode 9.0.) The same problem exists for various artists who use a stylized name, such as "Travi$ Scott", "Ke$ha", "Curren$y", and "A$AP Rocky".

This should be fixed so that the standard is both consistent and correct.

For the record, here are the characters that presently have the line breaking property of PR or PO:

0024;PR # Sc DOLLAR SIGN
0025;PO # Po PERCENT SIGN
002B;PR # Sm PLUS SIGN
005C;PR # Po REVERSE SOLIDUS
00A2;PO # Sc CENT SIGN
00A3..00A5;PR # Sc [3] POUND SIGN..YEN SIGN
00B0;PO # So DEGREE SIGN
00B1;PR # Sm PLUS-MINUS SIGN
058F;PR # Sc ARMENIAN DRAM SIGN
0609..060A;PO # Po [2] ARABIC-INDIC PER MILLE SIGN..ARABIC-INDIC PER TEN THOUSAND SIGN
060B;PO # Sc AFGHANI SIGN
066A;PO # Po ARABIC PERCENT SIGN
09F2..09F3;PO # Sc [2] BENGALI RUPEE MARK..BENGALI RUPEE SIGN
09F9;PO # No BENGALI CURRENCY DENOMINATOR SIXTEEN
09FB;PR # Sc BENGALI GANDA MARK
0AF1;PR # Sc GUJARATI RUPEE SIGN
0BF9;PR # Sc TAMIL RUPEE SIGN
0D79;PO # So MALAYALAM DATE MARK
0E3F;PR # Sc THAI CURRENCY SYMBOL BAHT
17DB;PR # Sc KHMER CURRENCY SYMBOL RIEL
2030..2037;PO # Po [8] PER MILLE SIGN..REVERSED TRIPLE PRIME
20A0..20A6;PR # Sc [7] EURO-CURRENCY SIGN..NAIRA SIGN
20A7;PO # Sc PESETA SIGN
20A8..20B5;PR # Sc [14] RUPEE SIGN..CEDI SIGN
20B6;PO # Sc LIVRE TOURNOIS SIGN
20B7..20BA;PR # Sc [4] SPESMILO SIGN..TURKISH LIRA SIGN
20BB;PO # Sc NORDIC MARK SIGN
20BC..20BD;PR # Sc [2] MANAT SIGN..RUBLE SIGN
20BE;PO # Sc LARI SIGN
20BF..20CF;PR # Cn [17] ..
2103;PO # So DEGREE CELSIUS
2109;PO # So DEGREE FAHRENHEIT
2116;PR # So NUMERO SIGN
2212..2213;PR # Sm [2] MINUS SIGN..MINUS-OR-PLUS SIGN
A838;PO # Sc NORTH INDIC RUPEE MARK
FDFC;PO # Sc RIAL SIGN
FE69;PR # Sc SMALL DOLLAR SIGN
FE6A;PO # Po SMALL PERCENT SIGN
FF04;PR # Sc FULLWIDTH DOLLAR SIGN
FF05;PO # Po FULLWIDTH PERCENT SIGN
FFE0;PO # Sc FULLWIDTH CENT SIGN
FFE1;PR # Sc FULLWIDTH POUND SIGN
FFE5..FFE6;PR # Sc [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN

Further Reading and Watching
============================

* "Travi$ Scott And The Value Of A Dollar Sign Name", Forbes, February 2014:
  http://www.forbes.com/sites/natalierobehmed/2014/02/19/travi-scott-and-the-value-of-a-dollar-sign-name-2/

* "Ke$ha: The Story of the $", Funny Or Die, November 2010:
  http://www.funnyordie.com/videos/f6f8af76f6/ke-ha-the-story-of-the