Re: Arabic letters separated by markup

From: Philippe Verdy
Date: Thu Jun 09 2005 - 11:08:13 CDT

    From: "Dominikus Scherkl"
    >> Does the Unicode standard only deal with plain text or
    >> does it also deal with text in markup languages like SGML/HTML?
    > Only plain text
    >> I wonder whether Arabic letters should join when they are
    >> separated by markup.
    > For HTML, markup shouldn't separate the letters.
    > But it's somewhat complicated (e.g. for combining diacritics),
    > so not all programs support that "feature".

    Unicode sees markup in an HTML file as if it were splitting the rich
    document into many distinct plain-text documents. What this extra markup
    does is also not specified.

    So if you insert markup in the middle of a combining sequence, it is no
    longer a single combining sequence for Unicode. Instead it will be seen by
    Unicode as one document ending with a correct combining sequence, and
    another document starting with a defective combining sequence.
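
    To illustrate the point above, here is a minimal Python sketch. It is
    only an illustration under stated assumptions: the regular expression is
    a naive stand-in for a real HTML parser, the helper names text_runs and
    starts_defective are hypothetical, and "defective" is approximated as
    "the run begins with a character whose general category is a mark (M*)".

```python
import re
import unicodedata

# Naive tag matcher, for illustration only (not a real HTML parser).
TAG = re.compile(r"<[^>]*>")

def text_runs(html):
    """Split the input at markup and return the non-empty plain-text runs."""
    return [run for run in TAG.split(html) if run]

def starts_defective(run):
    """Approximation: True if the run begins with a combining mark (Mn, Mc, Me)."""
    return unicodedata.category(run[0]).startswith("M")

# Latin "e" + COMBINING ACUTE ACCENT, with markup between base and mark.
latin = "e<b>\u0301</b>"
# ARABIC LETTER BEH + ARABIC FATHA, with markup between base and mark.
arabic = "\u0628<span>\u064E</span>"

for sample in (latin, arabic):
    for run in text_runs(sample):
        print(ascii(run), "defective start:", starts_defective(run))
```

    In both samples the first run is an ordinary base character and the
    second run consists of a lone combining mark, which is the situation
    described above once each run is treated as its own plain-text document.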

    The way a browser renders those defective combining sequences is not
    specified in Unicode, so browsers are free to do whatever they want and
    are able to support.

    The same remark applies to ligatures, contextual forms, joining types,
    mirroring or directionality, or any other contextual character properties
    (there's nothing in Unicode that specifies how a plain-text document can
    "inherit" some rendering, layout, or semantic properties from another
    distinct plain-text document, even if both plain-text documents are
    embedded, for example, within the same HTML file).
