From: Philippe Verdy (email@example.com)
Date: Thu Jun 09 2005 - 11:08:13 CDT
From: "Dominikus Scherkl" <firstname.lastname@example.org>
>> Does the Unicode standard only deal with plain text or
>> does it also deal with text in markup languages like SGML/HTML?
> Only plain text
>> I wonder whether Arabic letters should join when they are
>> separated by markup.
> For HTML, markup shouldn't separate the letters.
> But it's somewhat complicated (e.g. for combining diacritics),
> so not all programs support that "feature".
Unicode sees markup in an HTML file as if it were splitting the rich document
into many distinct plain-text documents. What this extra markup does is
also not specified by Unicode.
So if you insert markup in the middle of a combining sequence, it is no
longer a single combining sequence for Unicode. Instead, Unicode sees one
document ending with a correct combining sequence, and another document
starting with a defective combining sequence.
The way a browser renders those defective combining sequences is not
specified in Unicode, so browsers are free to do whatever they choose and
are able to support.
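To make the point about split combining sequences concrete, here is a rough sketch in Python. It assumes that each text run between tags is treated as its own plain-text document, and it approximates "defective" as "the run begins with a character of general category M (a combining mark)". The names TextRunCollector, text_runs and starts_defective are made up for this illustration, and the check is a simplification of the definition in the standard.

```python
import unicodedata
from html.parser import HTMLParser


class TextRunCollector(HTMLParser):
    """Collect the text runs found between tags, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.runs = []

    def handle_data(self, data):
        # Each call receives the text found between two pieces of markup.
        self.runs.append(data)


def text_runs(html):
    """Return the list of text runs that the markup splits the text into."""
    collector = TextRunCollector()
    collector.feed(html)
    collector.close()
    return collector.runs


def starts_defective(run):
    """Simplified check (an assumption of this sketch): the run is
    non-empty and begins with a combining mark (category Mn, Mc or Me)."""
    return bool(run) and unicodedata.category(run[0]).startswith("M")


# "e", then U+0301 COMBINING ACUTE ACCENT wrapped in <b>...</b>, then "t".
sample = "e<b>\u0301</b>t"
for run in text_runs(sample):
    print(repr(run), starts_defective(run))
```

In this sample the accent ends up alone at the start of the second run, with no base character in front of it, which is the situation described above. What a renderer then does with that run is up to the renderer.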
The same remark applies to ligatures, contextual forms, joining types,
mirroring or directionality, or any other contextual character properties
(there's nothing in Unicode that specifies how a plain-text document can
"inherit" some rendering or layout or semantic properties from another
distinct plain-text document, even if both plain-text documents are embedded
for example within the same HTML file).
This archive was generated by hypermail 2.1.5 : Thu Jun 09 2005 - 11:09:17 CDT