Re: markup on combining characters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Sep 08 2004 - 02:49:16 CDT

Next message: Peter Kirk: "Re: markup on combining characters"

Previous message: Jony Rosenne: "RE: markup on combining characters"
In reply to: Jony Rosenne: "RE: markup on combining characters"
Next in thread: Asmus Freytag: "Re: markup on combining characters"
Reply: Asmus Freytag: "Re: markup on combining characters"

From: "Jony Rosenne" <rosennej@qsm.co.il>
> Peter Kirk
>> You mean, you would represent a black e with a red acute accent as
>> something like "e", ZWJ, "<red>", IBC, acute, "</red>"? That looks like
>> a nightmare for all kinds of processing and a nightmare for rendering.
>
> No, it is more like <forecolor:black, combiningcolor:red> "e" "acute"
> And there is no Unicode decision against it.

And there is still no decision on whether this invisible base character will be added.
It's just a public review for now, to address the first issue: rendering
isolated non-spacing combining marks that currently have no spacing
variant (I think it's a good idea, as it would avoid having to add most of the
missing ones, notably for the non-generic L/G/C combining marks).

Note that your suggestion of:
   <forecolor:black, combiningcolor:red> "e" "acute"
should also work with any normalized form of the same text, i.e. with:
   <forecolor:black, combiningcolor:red> "e with acute"
where the combining mark is composed. The issue here is that this becomes
tricky for renderers, which will need to re-decompose normalized strings
before applying the style.
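The re-decomposition step a renderer would need before applying a separate color to combining marks can be sketched with Python's standard `unicodedata` module. This is a minimal sketch, not any renderer's actual algorithm: the function name `split_base_and_marks` is hypothetical, and treating a nonzero canonical combining class as "is a combining mark" is a simplification (some non-spacing marks have class 0).

```python
import unicodedata


def split_base_and_marks(text: str) -> list[tuple[str, str]]:
    """Re-decompose text to NFD, then pair each base character with the
    combining marks that follow it, so that a style such as a separate
    color for combining marks could be applied to the marks alone."""
    pairs: list[tuple[str, str]] = []
    for ch in unicodedata.normalize("NFD", text):
        # Simplification: a nonzero canonical combining class is taken to
        # mean "combining mark" (some non-spacing marks have class 0).
        if unicodedata.combining(ch) and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


# Decomposed and precomposed "e with acute" give the same base/mark split.
print(ascii(split_base_and_marks("e\u0301")))  # [('e', '\u0301')]
print(ascii(split_base_and_marks("\u00e9")))   # [('e', '\u0301')]
```

Without the NFD step, the precomposed U+00E9 would be a single unit and the style for the accent could not be applied separately.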
Basically I prefer Peter's solution:
   "e", ZWJ?, "<red>", IBC, acute, "</red>"
which is more independent of the normalization form. The question then is
whether the text within the <red>...</red> markup should combine visually
when rendered.
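Why this form is more independent of normalization can be checked directly: once the markup is stripped, the acute is attached to IBC rather than to "e", so no normalization form composes or reorders anything. This sketch assumes a stand-in for IBC, which has no code point; the Private Use character U+E000 is used purely for illustration, and `is_normalization_stable` is a hypothetical helper name.

```python
import unicodedata

ZWJ = "\u200d"    # ZERO WIDTH JOINER
# Hypothetical stand-in for the proposed invisible base character (IBC):
# it has no assigned code point, so a Private Use code point is used here.
IBC = "\ue000"
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_normalization_stable(text: str) -> bool:
    """True if text is unchanged by all four Unicode normalization forms."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


# Plain text left after removing the markup from "e", ZWJ, <red>, IBC, acute, </red>.
peter_form = "e" + ZWJ + IBC + ACUTE
print(is_normalization_stable(peter_form))   # True
# "e" + acute is composed by NFC, so a style keyed to the mark must survive that.
print(is_normalization_stable("e" + ACUTE))  # False
```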

For now I see the proposed IBC (which has no name yet) only as a way to
transform non-spacing combining marks into spacing non-combining variants,
when those variants don't exist separately in Unicode (so it would not be
recommended for the non-spacing acute accent, which already has a spacing
version that does not require a leading IBC).
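One heuristic for checking whether a combining mark already has an encoded spacing version is to look for characters whose compatibility decomposition is SPACE followed by that mark; U+00B4 ACUTE ACCENT, for instance, decomposes as `<compat> 0020 0301`. This is a sketch under that assumption only: some spacing clones (such as modifier letters) carry no decomposition at all and would be missed. The helper name is hypothetical, and scanning every code point is slow but simple.

```python
import sys
import unicodedata


def spacing_variants(mark: str) -> list[str]:
    """Return encoded characters whose compatibility decomposition is
    SPACE + mark, i.e. existing spacing clones of a combining mark."""
    target = [0x0020, ord(mark)]
    found = []
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp.startswith("<"):
            continue  # only tagged (compatibility) mappings are of interest
        fields = decomp.split()[1:]  # drop the tag, e.g. "<compat>"
        if [int(f, 16) for f in fields] == target:
            found.append(chr(cp))
    return found


print("\u00b4" in spacing_variants("\u0301"))  # True: U+00B4 ACUTE ACCENT
```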
Technically, if an IBC character is added, a renderer will not necessarily
render <IBC, non-spacing combining acute> the same way as <spacing
non-combining acute accent>, even though it preferably should.
In the previous sentence, the "should" means that the existing spacing
non-combining marks remain the standard legacy way to encode them, and
they normally don't combine when rendered after a base letter, even if
there's markup around them (unless this markup explicitly says that they
should combine):

If I take the above example,
    "e", ZWJ?, "<red>", IBC, acute, "</red>"
the same rich text should also be renderable without the markup, as
plain text, as if it were:
    "e", ZWJ?, IBC, acute
i.e. (with the "should" above) as if it were also:
    "e", ZWJ?, spacing acute
I have placed the "?" symbol after ZWJ to indicate that something
would be needed to allow this last text to override the spacing,
non-combining behavior of the spacing acute character. Without it, the text:
    "e", spacing acute
or equivalently (with the should above):
    "e", IBC, combining acute
would not be allowed to render a combined e with an acute: two separate
glyphs would be rendered, and two separate character entities would be
interpreted (as they are today in legacy plain text).
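The "two separate character entities" point can be illustrated with a deliberately simplified grapheme-cluster rule: a character joins the previous cluster only if it is a combining mark or ZWJ. This only approximates UAX #29 (which has many more rules), and the helper name and the U+E000 stand-in for IBC are assumptions. It also shows that ZWJ by itself does not pull a following IBC into the cluster of "e".

```python
import unicodedata

ZWJ = "\u200d"
IBC = "\ue000"  # hypothetical stand-in for the proposed invisible base character


def simple_clusters(text: str) -> list[str]:
    """Very simplified approximation of UAX #29 grapheme clusters: a character
    extends the previous cluster only if it is a combining mark (general
    category Mn, Mc or Me) or ZWJ; everything else starts a new cluster."""
    clusters: list[str] = []
    for ch in text:
        extends = ch == ZWJ or unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if extends and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(len(simple_clusters("e\u0301")))                   # 1: combining acute
print(len(simple_clusters("e\u00b4")))                   # 2: spacing acute (Sk)
print(len(simple_clusters("e" + ZWJ + IBC + "\u0301")))  # 2: IBC starts a cluster
```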

So the question of how to add markup on combining marks remains open: the
proposed IBC alone cannot solve such problems, unless there's an agreement
that ZWJ immediately followed by IBC should be rendered as if neither were
present. But in that case, a spacing acute becomes semantically and
graphically distinct from <IBC, combining acute>, which will happen in any
case with normalization forms because of the Unicode stability policy:
existing spacing marks have no canonical decomposition, so NFD leaves them
unchanged, and their compatibility decompositions (applied by NFKD) use
SPACE as the base and cannot be changed to use IBC.
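Both halves of this argument can be sketched in a few lines: a hypothetical rendering-time rule that drops ZWJ + IBC before composing, and a check of what each normalization form does to the existing spacing acute U+00B4. The rule, the helper names and the U+E000 stand-in for IBC are assumptions for illustration, not anything decided by Unicode.

```python
import unicodedata

ZWJ = "\u200d"
IBC = "\ue000"            # hypothetical stand-in for the proposed IBC
ACUTE = "\u0301"          # COMBINING ACUTE ACCENT
SPACING_ACUTE = "\u00b4"  # ACUTE ACCENT (existing spacing mark)


def apply_zwj_ibc_rule(text: str) -> str:
    """Hypothetical rendering-time rule: treat ZWJ immediately followed by IBC
    as if neither were present, then compose the result with NFC."""
    return unicodedata.normalize("NFC", text.replace(ZWJ + IBC, ""))


def all_forms(text: str) -> dict[str, str]:
    """Map each normalization form name to the normalized text."""
    return {f: unicodedata.normalize(f, text) for f in ("NFC", "NFD", "NFKC", "NFKD")}


# Under the hypothetical rule, "e", ZWJ, IBC, acute behaves like e-acute.
print(apply_zwj_ibc_rule("e" + ZWJ + IBC + ACUTE) == "\u00e9")  # True

# U+00B4 is left alone by NFC/NFD and becomes SPACE + U+0301 under
# NFKC/NFKD; no form ever maps it to IBC + acute.
print(ascii(all_forms(SPACING_ACUTE)))
```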

I also note that IBC is intended to replace the need to use a standard SPACE
as the base character for building a spacing variant of a combining mark when
no standard spacing variant is encoded in Unicode. That is a legacy hack,
which causes various problems because of whitespace normalization in many
plain-text formats and applications (or in XML and HTML), and because of the
special word-breaking behavior of spaces. I don't see IBC as a way to
deprecate the existing block of spacing marks.
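The whitespace problem with the SPACE-as-base hack can be sketched by simulating a pipeline that trims a fragment before concatenating it (here `str.strip` stands in for whatever whitespace normalization a format applies). The helper name and the U+E000 stand-in for IBC are assumptions; the point is only that a non-whitespace invisible base survives trimming where SPACE does not.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
IBC = "\ue000"    # hypothetical stand-in for the proposed invisible base character


def trim_then_append(prefix: str, fragment: str) -> str:
    """Simulate a whitespace-normalizing pipeline: trim the fragment (as many
    formats do with leading/trailing spaces), append it to the preceding
    text, then normalize the result to NFC."""
    return unicodedata.normalize("NFC", prefix + fragment.strip())


legacy = " " + ACUTE    # legacy hack: SPACE as the base of an isolated acute
proposed = IBC + ACUTE  # proposed: an invisible base that is not whitespace

# The legacy form loses its SPACE base and the acute fuses with the "e".
print(trim_then_append("e", legacy) == "\u00e9")          # True
# The IBC base survives trimming, so the accent stays isolated.
print(trim_then_append("e", proposed) == "e" + proposed)  # True
```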


This archive was generated by hypermail 2.1.5 : Wed Sep 08 2004 - 02:51:32 CDT