L2/12-283
Source: Mark Davis
Subject: Handling fake Gershayim and Geresh in Hebrew words (UAX #29)
Date: 2012/07/29
	Proposed Change
	
	
	Create a PRI for the following proposed 
	change to UAX #29 in 6.2.1.
	
	
	Accommodate the use of ASCII " and ' inside Hebrew words in the 
	default word break rules. The changes would consist of the following:
	
	
	1. Create new Word_Break property values Hebrew_Letter (HLetter), 
	Single_Quote (SQuote), and Double_Quote (DQuote).
	
	
	2. Add rules:
	
		- HLetter × SQuote
		- HLetter × DQuote HLetter
		- HLetter (SQuote | DQuote) × HLetter
 
	
	
	3. Change every other rule as follows:
	
		- ALetter to be (ALetter | HLetter)
		- Mid_Num_Let to be (Mid_Num_Let | SQuote)
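
The proposed rules can be illustrated with a minimal Python sketch. It is not the UAX #29 algorithm: it models only the three added rules, plus a simplified stand-in for the existing letter rules with ALetter widened to (ALetter | HLetter) and SQuote treated like Mid_Num_Let. The function names (wb_class, no_break, split_words) are hypothetical, and the Hebrew_Letter class is approximated by the range U+05D0..U+05EA, which is an assumption rather than the actual property definition.

```python
def wb_class(ch):
    """Very rough Word_Break class for one character (illustrative only)."""
    if "\u05D0" <= ch <= "\u05EA":   # assumption: alef..tav stands in for Hebrew_Letter
        return "HLetter"
    if ch == "'":
        return "SQuote"
    if ch == '"':
        return "DQuote"
    if ch.isalpha():
        return "ALetter"
    return "Other"


LETTERS = ("ALetter", "HLetter")


def no_break(prev2, prev, cur, nxt):
    """Return True if there is no break between prev and cur."""
    # Existing letter rule with ALetter replaced by (ALetter | HLetter).
    if prev in LETTERS and cur in LETTERS:
        return True
    # Proposed: HLetter × SQuote
    if prev == "HLetter" and cur == "SQuote":
        return True
    # Proposed: HLetter × DQuote HLetter
    if prev == "HLetter" and cur == "DQuote" and nxt == "HLetter":
        return True
    # Proposed: HLetter (SQuote | DQuote) × HLetter
    if prev2 == "HLetter" and prev in ("SQuote", "DQuote") and cur == "HLetter":
        return True
    # Existing medial-punctuation rules, with SQuote treated like Mid_Num_Let.
    if prev in LETTERS and cur == "SQuote" and nxt in LETTERS:
        return True
    if prev2 in LETTERS and prev == "SQuote" and cur in LETTERS:
        return True
    return False


def split_words(text):
    """Split text at every position where no rule above forbids a break."""
    classes = [wb_class(c) for c in text]
    pieces, start = [], 0
    for i in range(1, len(text)):
        prev2 = classes[i - 2] if i >= 2 else None
        nxt = classes[i + 1] if i + 1 < len(text) else None
        if not no_break(prev2, classes[i - 1], classes[i], nxt):
            pieces.append(text[start:i])
            start = i
    if text:
        pieces.append(text[start:])
    return pieces
```

With this sketch, a Hebrew abbreviation written with an ASCII double quote between Hebrew letters stays in one piece, while the same quote between Latin letters still produces breaks on both sides.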
 
		
	 
 
	Background
	
	
	When writing Hebrew, it is common practice to use ASCII " and ' instead 
	of the correct characters, U+05F4 HEBREW PUNCTUATION GERSHAYIM and 
	U+05F3 HEBREW PUNCTUATION GERESH. While the ASCII characters behave 
	correctly in the default Unicode line break, they don't behave correctly 
	in the default Unicode word break. The problem arises when Hebrew text 
	occurs in the midst of text in another language, so that the other 
	language's word break is being used.
	
	
	
	
	There are pros and cons to this change. It is a very language-specific 
	change, and we certainly don't want to push every language-specific 
	change down to root. On the other hand, apart from some minor additional 
	complexity, it shouldn't hurt any other locale, because the script makes 
	the context unambiguous. So we'd like a PRI item to gather feedback on 
	whether or not the change is warranted.
	
	
	
		The problem arises in these two 
		cases:
	
		
		
		
			The following case, by contrast, already works fine and needs no change.
		
		
			
				The Geresh-equivalent (') can occur medially and finally, while 
				the Gershayim-equivalent (") can occur only medially.
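
The positional constraint above can be written as a regular expression. This is a minimal sketch, assuming Hebrew letters are approximated by U+05D0..U+05EA; the names HEBREW_WORD and is_hebrew_word are hypothetical and are not part of the proposal.

```python
import re

# Assumption: the range alef..tav stands in for the Hebrew_Letter class.
HEB = "[\u05D0-\u05EA]"

# Letters, then any number of (quote + letters) groups, so that ' or "
# may appear medially, and finally an optional trailing ' only.
HEBREW_WORD = re.compile(HEB + "+(?:['\"]" + HEB + "+)*'?")


def is_hebrew_word(s):
    """True if s is a single Hebrew word under the constraint above."""
    return HEBREW_WORD.fullmatch(s) is not None
```

Under this pattern a trailing ' is accepted, but a trailing " is not, which mirrors the asymmetry between the Geresh-equivalent and the Gershayim-equivalent.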