Re: Why people still want to encode precomposed letters

From: Karl Pentzlin (karl-pentzlin@acssoft.de)
Date: Sun Nov 23 2008 - 05:00:23 CST

Next message: philip chastney: "RE: Why people still want to encode precomposed letters"

Previous message: Hans Aberg: "Re: Why people still want to encode precomposed letters"
In reply to: Doug Ewell: "Re: Why people still want to encode precomposed letters"
Next in thread: vunzndi@vfemail.net: "Re: Why people still want to encode precomposed letters"
Reply: vunzndi@vfemail.net: "Re: Why people still want to encode precomposed letters"
Reply: Doug Ewell: "Re: Why people still want to encode precomposed letters"
Reply: John Hudson: "Re: Why people still want to encode precomposed letters"

On Sunday, 23 November 2008 at 05:45, Doug Ewell wrote:

>>> (Karl Pentzlin):
>>> Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil the
>>> user's needs, as long as leading operating systems behave like this
>>> more than 10 years after Unicode has decided no longer to accept
>>> precomposed characters.
>>>
>>> Microsoft et al., PLEASE do your homework! Please do it RIGHT NOW!
>
DE> I think Karl may have expected that fonts could be developed in such a
DE> way that combining diacritical marks would be spaced properly above the
DE> base character, ...

That is exactly true, if "properly" simply means "in a way that respects the
formal combining classes and gives a result which the user can recognize".

DE> more or less by magic.

Yes, if "magic" is colloquial for "done by a complex and well-designed
algorithm which possibly is not obvious to everybody at first glance" -
something which computer scientists (like me) sometimes do.

DE> I used to think that would be
DE> possible when I knew nothing about font design, ...

Maybe, but for myself I claim to know at least some of the basics of
font design. I appreciate it as a fine art in which not everybody is gifted
enough to create a Gentium or Andron, but the technical basics are
comprehensible.

DE> I still think it would be reasonable to expect combining marks like
DE> macrons and circumflexes to be always centered over the base character,
DE> not off to the right, even if the vertical spacing is wrong.

At least this. This can be accomplished by an algorithm; a very crude
but working starting point is this: Enclose the base character's glyph in a
rectangle. Determine its center (geometrically; possible refinement:
barycentrically). Get the diacritic glyph from the font itself, or (if it is
not available there) from a system default font, and enclose it in a
rectangle. Determine its center (geometrically). Translate the combining
class of the diacritic into a pair of positioning angle and distance, using
a fixed table made once. Place the diacritic rectangle outside of the base
character rectangle, at the positioning angle relative to the center points,
and shift it inwards until the distance is reached. If another diacritic is
to be added, enclose the combination generated so far in a rectangle,
retaining the center point of the original base character, call this the
base character rectangle, and repeat. After finishing, take the final
enclosing rectangle into consideration for line positioning.
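
A minimal Python sketch of the crude starting point described above, working
on bounding boxes only. The names Rect, PLACEMENT, place_mark and compose,
the angle and gap values in the table, and the sample glyph boxes are all
assumptions made for illustration; they come neither from this message nor
from any real rendering system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in font units; y grows upward."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def union(self, other):
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))


# Hypothetical "fixed table made once":
# combining class -> (positioning angle in degrees, distance in font units).
PLACEMENT = {230: (90.0, 50.0), 220: (270.0, 50.0)}


def place_mark(cluster, center, mark, ccc):
    """Position `mark` outside `cluster`, along the table angle from `center`."""
    angle, gap = PLACEMENT[ccc]
    dx, dy = math.cos(math.radians(angle)), math.sin(math.radians(angle))
    mw, mh = mark.x1 - mark.x0, mark.y1 - mark.y0
    # For each axis the ray moves along, find the travel distance at which the
    # gap between the two rectangles equals `gap`. The smaller value is where
    # an inward shift from far outside would stop.
    stops = []
    if abs(dx) > 1e-9:
        extent = cluster.x1 - center[0] if dx > 0 else center[0] - cluster.x0
        stops.append((extent + mw / 2 + gap) / abs(dx))
    if abs(dy) > 1e-9:
        extent = cluster.y1 - center[1] if dy > 0 else center[1] - cluster.y0
        stops.append((extent + mh / 2 + gap) / abs(dy))
    t = min(stops)
    mx, my = center[0] + t * dx, center[1] + t * dy
    return Rect(mx - mw / 2, my - mh / 2, mx + mw / 2, my + mh / 2)


def compose(base, marks):
    """Stack marks on a base glyph; returns (placed mark boxes, final box)."""
    center = base.center      # retained for every further diacritic
    cluster = base
    placed = []
    for mark, ccc in marks:
        rect = place_mark(cluster, center, mark, ccc)
        placed.append(rect)
        cluster = cluster.union(rect)   # becomes the new base rectangle
    return placed, cluster


# Illustrative boxes: an "o"-like base and a macron stored off to the left.
base = Rect(0, 0, 500, 450)
macron = Rect(-400, 520, -100, 580)
placed, cluster = compose(base, [(macron, 230), (macron, 230)])
```

With these sample values both macrons end up horizontally centered over the
base, the second one stacked above the first, and `cluster` is the final
enclosing rectangle to use for line positioning. A real implementation would
work on outlines rather than boxes.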

A "real working" algorithm like this may need some 100 pages to write down,
but that is what the skilled developers at Microsoft et al. are paid for.

--
On Sunday, 23 November 2008 at 04:29, Peter Constable wrote:

PC> How would you suggest anybody do the homework needed to discover
PC> that arbitrary & not-well-documented language X uses combining
PC> character sequence <Y, Z>?

The latter is *explicitly* not a precondition for your homework. Your task
is: "For European Alphabetic Scripts, implement a solution for any
combinations of base characters and combining characters, especially for
arbitrary combinations which are *not* explicitly considered in the
available rendering system".

It should be noted that, when it was decided in 1996 to no longer encode
precomposed characters for European Alphabetic Scripts, this did not
affect all diacritics.
In fact, it affected those diacritics which can successfully be
handled by an algorithm as outlined above.
For all diacritics which need special font-specific treatment,
precomposed characters have still been encoded after 1996, and have to be
encoded if new ones are encountered.
Such diacritics are e.g.:
- slash overlays (horizontal and diagonal),
- other overlays (e.g. middle tilde, double bar),
- palatal hooks and retroflex hooks,
- descenders.
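
The canonical combining classes in the Unicode Character Database show a
comparable split, which can be inspected with Python's standard unicodedata
module. This is only a sketch: the sample code points and the helper name
ccc_table are chosen here for illustration and are not taken from the
message.

```python
import unicodedata

# Sample combining marks (illustrative selection).
SAMPLES = {
    "macron": "\u0304",                 # COMBINING MACRON
    "long solidus overlay": "\u0338",   # COMBINING LONG SOLIDUS OVERLAY
    "short stroke overlay": "\u0335",   # COMBINING SHORT STROKE OVERLAY
    "tilde overlay": "\u0334",          # COMBINING TILDE OVERLAY
    "palatalized hook": "\u0321",       # COMBINING PALATALIZED HOOK BELOW
    "retroflex hook": "\u0322",         # COMBINING RETROFLEX HOOK BELOW
}


def ccc_table(samples):
    """Map each sample name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in samples.items()}


classes = ccc_table(SAMPLES)

# A letter with a macron decomposes canonically into base + mark,
# while a letter with a slash overlay has no decomposition at all.
a_macron = unicodedata.decomposition("\u0101")   # LATIN SMALL LETTER A WITH MACRON
o_stroke = unicodedata.decomposition("\u00f8")   # LATIN SMALL LETTER O WITH STROKE
```

Marks placed above the base carry class 230, overlays class 1, and the
attached hooks class 202; the stroked letter is not canonically equivalent to
a base letter plus an overlay mark.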

While no official information seems to be available, it appears that
this decision was made with care, explicitly distinguishing between
diacritics which can be positioned automatically within reasonable
constraints and those which cannot.
This seems to be an (implicit, as of now) part of the encoding model for
the European Alphabetic Scripts.
(If this assumption is correct, I propose to state this explicitly
in the next printed version of the Unicode Standard.)

It differs from the Arabic model (where characters which some consider
to be precomposed are encoded as single units), and it differs
from the models used for South Asian scripts (where combining marks are
encoded separately even if they affect the shape of the base character's
glyph considerably).

PC> Usage of combining marks with Cyrillic is nowhere near as
PC> widespread as it is with Latin. I think Vista does pretty well
PC> supporting arbitrary combining sequences for Latin in several
PC> fonts, as well as certain known-to-be-used sequences for Cyrillic.

At least, significant progress is visible in Vista regarding
Latin combinations. As doing this for Cyrillic as well does not require
any really new mechanism, may I expect the same level of support for
Cyrillic in the next SP for Windows Vista?

- Karl Pentzlin


This archive was generated by hypermail 2.1.5 : Sun Nov 23 2008 - 05:03:41 CST