From: Peter Constable (petercon@microsoft.com)
Date: Thu Jun 09 2005 - 12:35:45 CDT
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
> On Behalf Of Philippe Verdy
> Unicode sees markup in a HTML file as if it was splitting the rich
> document into many distinct plain-text documents. What these extra
> markup will do is also not specified.
>
> So if you insert markup in the middle of a combining sequence, it is
> no longer a single combining sequence for Unicode. Instead it will be
> seen by Unicode as a document ending with a correct combining
> sequence, and another document starting by a defective combining
> sequence.
AFAIK, this personal opinion of Philippe's is not reflected anywhere in
the Unicode Standard. The most likely place for it to be addressed would
be UTR20, and it is silent on this matter.

*My* opinion, supported by the silence of the Unicode Standard on the
topic, is that it is up to the higher-level protocol -- the HTML spec --
to specify what impact various markup elements have on the text
processes that operate over the character content of a document. For
instance, I would expect the sequences in <TD>abc</TD><TD>def</TD> to be
treated as distinct document elements, implying no cursive connection
between them (among other things), but I would expect the sequences in
<span>abc</span><span>def</span> to be considered a single text element
for rendering purposes (barring further stylesheet effects -- a
stylesheet might, of course, transform spans into distinct non-inline
structural elements).
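
To make the distinction concrete, here is a minimal sketch in Python. It
is an illustration only, under stated assumptions: the helper names
(text_runs, starts_defective) are hypothetical, the regular expressions
handle only bare span and td tags, and nothing here is behavior defined
by the Unicode Standard or by the HTML spec. It treats span as
transparent to text processing and td as a boundary between text runs,
then checks whether a run begins with a combining mark that has no base
character.

```python
import re
import unicodedata

# Assumption for illustration: <span> is inline and transparent to text
# processes, while <td> separates distinct text elements.
INLINE_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
BLOCK_TAG = re.compile(r"</?td\b[^>]*>", re.IGNORECASE)


def text_runs(markup):
    """Return the text runs that remain once the tags are interpreted."""
    # Inline tags are dropped, so their contents join into one run.
    without_inline = INLINE_TAG.sub("", markup)
    # Cell tags act as run boundaries; empty pieces are discarded.
    return [run for run in BLOCK_TAG.split(without_inline) if run]


def starts_defective(run):
    """True if the run begins with a combining mark (general category M*)."""
    return unicodedata.category(run[0]).startswith("M")


if __name__ == "__main__":
    acute = "\u0301"  # COMBINING ACUTE ACCENT
    for sample in (
        "<span>abc</span><span>def</span>",
        "<TD>abc</TD><TD>def</TD>",
        "<span>e</span><span>" + acute + "</span>",
        "<td>e</td><td>" + acute + "</td>",
    ):
        runs = text_runs(sample)
        print(len(runs), [starts_defective(r) for r in runs])
```

Under these assumptions the two spans yield one run, so a base letter in
one span and a combining mark in the next still form a single combining
sequence, while the same characters placed in two table cells yield two
runs, the second of which starts with a defective combining sequence.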
Peter Constable