Re: UAX 29 questions

From: Karl Williamson
Date: Thu, 29 Jan 2015 22:25:14 -0700

On 01/29/2015 08:19 PM, Philippe Verdy wrote:
> 2015-01-29 19:52 GMT+01:00 Karl Williamson:
> Rule WB4 is
> "Ignore Format and Extend characters, except when they appear at the
> beginning of a region of text.".
> Not clearly stated, but it appears to me that the ZWJ must be
> considered here to be the beginning of a region of text, as we are
> looking at the boundary between it and the "A". No rule
> specifically mentions ALetter followed by an Extend, so by the
> default rule, WB14
> "Otherwise, break everywhere (including around ideographs)"
> All the text is targeted at finding candidate positions for breaks. It
> is not very clear that "ignore" is definitive and means that there
> cannot be any further breaks before the Format and Extend characters,
> except at beginning of text. So all the rest of rules is ignored, there
> was a match and you stop there; no break before;
> Any × (Format | Extend)
> This is confirmed in other rules that state the word "otherwise",
> including the last one (WB14) you quote which is explicitly not applicable.

I don't understand you here. I understand all the words, but I don't
see what you're trying to say. My claim is that there should be a rule
such as the one you give,

  Any × (Format | Extend)

but there isn't. I think you are maybe trying to say that the word
"ignore" in this UAX is tantamount to such a rule. I am a native
English speaker, and would never have drawn that inference from the
text. There are a lot of passages in the Standard that sound like
gibberish to me. I know the words' meanings, but the combination doesn't
make any sense. I don't recall ever having this issue in other
standards I've looked at.
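
To make the two readings concrete, here is a minimal sketch in Python. It
is not the UAX 29 algorithm: wb_class() is a rough, hypothetical stand-in
for the Word_Break property, and only WB4, WB5 and WB14 are modelled. The
flag ignore_means_no_break is likewise an assumption of this sketch; it
switches between reading "ignore" as the rule Any × (Format | Extend) and
reading the text literally, where no rule matches and WB14 applies.

```python
import unicodedata


def wb_class(ch):
    """Rough stand-in for Word_Break (an assumption, not the real property data)."""
    if ch in ("\u200c", "\u200d"):        # ZWNJ, ZWJ: treated as Extend here
        return "Extend"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me", "Mc"):         # combining marks
        return "Extend"
    if cat == "Cf":                       # other format controls
        return "Format"
    if ch.isalpha():
        return "ALetter"
    return "Other"


def break_positions(text, ignore_means_no_break):
    """Return every index i where a break is allowed between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        right = wb_class(text[i])
        if right in ("Format", "Extend"):
            if ignore_means_no_break:
                continue                  # reading WB4 as: Any × (Format | Extend)
            breaks.append(i)              # literal reading: nothing matches, so WB14
            continue
        # Skip back over ignored characters to find the effective left neighbour.
        j = i - 1
        while j > 0 and wb_class(text[j]) in ("Format", "Extend"):
            j -= 1
        if wb_class(text[j]) == "ALetter" and right == "ALetter":
            continue                      # WB5: ALetter × ALetter
        breaks.append(i)                  # WB14: otherwise, break everywhere
    return breaks


sample = "A\u200dB"                       # "A", ZWJ, "B"
print(break_positions(sample, True))      # []
print(break_positions(sample, False))     # [1]
```

With the implied rule there is no break anywhere in "A", ZWJ, "B". With
the literal reading a break appears between the "A" and the ZWJ, which is
the case I am asking about.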
Unicode mailing list
Received on Thu Jan 29 2015 - 23:26:36 CST
