RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Dec 07 2003 - 20:05:57 EST

Next message: jcowan@reutershealth.com: "Re: Transcoding Tamil in the presence of markup"

Previous message: Michael Everson: "Re: Glottal stops (bis) (was RE: Missing African Latin letters (bis))"
In reply to: Peter Kirk: "Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)"
Next in thread: Doug Ewell: "Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)"

Peter Kirk wrote:
> On 07/12/2003 15:40, Philippe Verdy wrote:
> > Peter Kirk wrote:
> > > Of course there is an even simpler way to provide the glue I
> > > was talking about. W3C simply needs to relax the rule forbidding
> > > combining marks at the start of a string (and interpret the one
> > > precomposed character with ">" as base as if it were decomposed,
> > > as I suggested before), and, remembering that use of NFC is a
> > > strong recommendation rather than a requirement, not insist on
> > > NFC in such cases. Then nothing needs to be added to Unicode.
> >
> > There's little chance that this will be relaxed by the W3C, because
> > now HTML is XML (since XHTML is the current recommended standard,
> > and HTML 4.01 is just kept as is, and all other extensions are being
> > developed since XHTML 1.1 as modules with DTDs or XML schemas), and
> > because XML text elements are independent. What you propose would
> > break the XML containment model (could it be implemented however in
> > XSLT transforms from XHTML? I doubt it, because the output of XSLT is
> > also XML, even if it does not always produce XML syntax, but only
> > a DOM-parsable tree or InfoSet...)
>
> Well, this is W3C's problem. They seem to have backed themselves into a
> corner which they need to get out of but have no easy way of doing so.
> Unicode is of course very familiar with this kind of situation e.g. with
> character name errors, combining class errors, 11000+ redundant Korean
> characters without decompositions, etc etc. So no doubt it can extend
> its sympathy; and possibly even offer to help by encoding the kind of
> character I was suggesting early (perhaps in exchange for some W3C
> readiness to accept correction of errors in the normalisation data?).
> But really this is not a Unicode issue.

I don't agree with you there: going to XML was a good decision for the
evolution, stabilisation and interoperability of HTML (extensions are now
in modules, described by DTDs or schemas, and this offers a good framework
for interoperability of documents, even if they don't implement the same
set of optional modules).

If you want something better, it will not come from modifying XML (HTML
will stick to XML now), but from the way the DOM tree or InfoSet generated
from a parsed XHTML document is rendered. With CSS and XSLT, you have
the tools to define precisely, in a compilable language, how this data
tree can be transformed to prepare the rendering of documents.

Nothing forbids the standard XHTML modules from defining standard
transformations in relation to style, as an XSLT application. This
applies to the transformation of plain text contained in the XHTML
document into another XML document containing all the associated glyphic,
layout and style information. Some of this information may be used to
control the behavior of font renderers, enabling or disabling features
based on the augmented data, which now contains more than just plain text.
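
To make the idea of "augmented data" concrete, here is a minimal sketch in
Python rather than XSLT. It turns a plain-text run into a small XML tree in
which each base character and the marks that follow it are grouped together.
The element names (run, cluster, base, mark) and the function name are
hypothetical, invented for this illustration, and real cluster segmentation
is more involved than a check on the general category.

```python
import unicodedata
import xml.etree.ElementTree as ET

def to_clusters(text):
    """Group each base character with the marks that follow it.

    The element names used here are hypothetical, not part of any
    XHTML module or other standard.
    """
    root = ET.Element("run")
    cluster = None
    for ch in text:
        # General categories Mn, Mc and Me are the Unicode "mark" categories.
        is_mark = unicodedata.category(ch).startswith("M")
        # Start a new cluster on every non-mark, or when a mark opens the text.
        if cluster is None or not is_mark:
            cluster = ET.SubElement(root, "cluster")
        ET.SubElement(cluster, "mark" if is_mark else "base").text = ch
    return root

# "e" + COMBINING ACUTE ACCENT, then a bare "a": two clusters.
tree = to_clusters("e\u0301a")
print(ET.tostring(tree, encoding="unicode"))
```

A later styling step could then attach colour or positioning information to
a mark element without separating it from its base in the data.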

So this stylesheet processor will be able to position diacritics clearly
above letters, or to create Korean syllabic clusters, or even Han
ideographic clusters, or to alter the relative positions of the diacritic
and its base letter to take into account differences of style (for
example, if the stylesheet instructs the HTML processor to render dots
above "i" with a custom start bitmap or SVG graphic, or in bold style
from another font...)

The initial problem of Tamil transcoding with markup is not a problem
for Unicode or even for HTML: the author has created in the document
separate runs of text without specifying clearly how these separate runs
should be rendered in a coherent layout. For Unicode or for HTML, there's
a default layout, which is the HTML "box model", and attempts to break it
require relative positioning (specified in CSS), and possibly a
transformation of the initial text into other text or markup (this is a
job for XSLT, and could be specified in a further revision of CSS, to
specify such complex rendering outside the default "box model").
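
To illustrate what "separate runs of text" means here, the following Python
sketch parses a small XHTML-like fragment and reports elements whose text
begins with a combining mark, that is, runs that the markup has cut off from
their base letter. The sample fragment, the class name "red" and the function
names are assumptions made up for the example; only element text is checked
(not tail text), to keep it short.

```python
import unicodedata
import xml.etree.ElementTree as ET

def starts_with_mark(text):
    """True if the first character has general category Mn, Mc or Me."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")

def orphaned_runs(xml_source):
    """Return (tag, text) pairs for elements whose text begins with a mark."""
    root = ET.fromstring(xml_source)
    return [(el.tag, el.text) for el in root.iter() if starts_with_mark(el.text)]

# TAMIL LETTER KA (U+0B95) in the paragraph, TAMIL VOWEL SIGN O (U+0BCA)
# isolated in a span so that it can be styled separately.
SAMPLE = '<p>\u0b95<span class="red">\u0bca</span></p>'
print(orphaned_runs(SAMPLE))
```

The span is reported because its only character is a vowel sign whose base
letter lives in a different text run, which is exactly the situation the
default layout has no rule for.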



This archive was generated by hypermail 2.1.5 : Sun Dec 07 2003 - 20:52:10 EST