From: Asmus Freytag (firstname.lastname@example.org)
Date: Wed Jul 25 2007 - 15:58:41 CDT
With all changes to UAX#14, it is important to make sure that existing
implementations can remain conformant as far as possible, and that
constraints on character behavior are retained for formatting characters
(so that people cannot redesign a ZWSP or WJ by treating it in novel and
incompatible ways and still claim conformance), while making sure that
there are no unreasonable (and artificial) restrictions on higher-level
protocols. I believe that, in large measure, Andy has been able to do that in
the latest update.
As for the putative issue you raise here, Ken has already pointed out
why there may not in fact be a problem where you see one. Your further
statements show that you haven't fully understood some of the core
concepts of the way the default line breaking algorithm is intended to work.
On 7/25/2007 5:17 AM, Philippe Verdy wrote:
> As an alternative to my proposal, parentheses or quotation marks could also
> be described by making them inherit the line breaking opportunity property
> from the character they immediately surround, while keeping the prohibition
> of line breaking between the parenthesis/quotation mark and the inner character
> it touches.
For quotation marks, it's not possible to determine what the inner
character is on the *character* level. See the fascinating discussion of
this topic in the relevant chapter of the standard.
> This would correctly handle the case of parentheses used to surround
> ideographs, because parentheses should not be detached from the inner
> ideograph, although they should still remain breakable from the outer
> character, as if these characters were absent from the text (so the line
> break rule would treat parentheses and quotation marks as if they were
> diacritics and part of a larger unbreakable grapheme cluster with the inner
> character used as the effective base, and line breaking would be analyzed by
> first ignoring them but just checking the breaking opportunities between the
> inner and outer character.)
The default way to handle parens in ideographic context needs to match
the common or legacy practice. That, at least, is the design point: to
allow the default to reflect (reasonable) existing or legacy practice
for a given script out of the box. A complicated re-analysis might be a
basis for your own customization for a very sophisticated layout engine,
but is misplaced here.
Also, the class CM inherits from the *preceding* character. Your model
would result in inheritance in the other direction, which would
invalidate all existing implementations (not even those that import the
UCD tables could update to such a scheme w/o changes in architecture).
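
To illustrate the direction of inheritance, here is a minimal sketch in
Python. It is not taken from UAX#14 or from any existing implementation;
the names resolve_cm and NON_BASES are made up for illustration. It assumes
the line break classes have already been looked up for each character, and
that a CM with no usable base (at the start of the text, or after BK, CR,
LF, NL, SP or ZW) falls back to AL, which is one reading of the default rules.

```python
# Hypothetical sketch: resolve class CM by inheriting from the PRECEDING character.
# Assumption: classes after which a CM does not attach, so it is treated as AL.
NON_BASES = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def resolve_cm(classes):
    """Return a new list in which every CM carries the class of its base.

    `classes` is a list of line break class names, one per character.
    The pass runs left to right and only ever looks at the previous
    (already resolved) entry, never at what follows.
    """
    resolved = []
    for cls in classes:
        if cls == "CM":
            if resolved and resolved[-1] not in NON_BASES:
                # Inherit from the preceding character (or from the base
                # that a preceding CM has already inherited).
                cls = resolved[-1]
            else:
                # No usable base: fall back to AL (assumed default).
                cls = "AL"
        resolved.append(cls)
    return resolved


if __name__ == "__main__":
    # Ideograph followed by two combining marks, then an opening paren.
    print(resolve_cm(["ID", "CM", "CM", "OP"]))
```

Because the pass needs only the previous resolved class, it can run in a
single left-to-right scan. Inheriting from the *following* character would
instead require lookahead past an arbitrary run of marks, which is the kind
of architectural change referred to above.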