From: vunzndi@vfemail.net
Date: Sun Nov 23 2008 - 08:18:24 CST
I have to say I find myself agreeing with what Karl says here: improved
support for combining marks with Cyrillic letters is well within the
resources of a major vendor, would benefit a large number of people,
and is therefore a reasonable expectation.
John Knightley
Quoting "Karl Pentzlin" <karl-pentzlin@acssoft.de>:
> On Sunday, 23 November 2008 at 05:45, Doug Ewell wrote:
>
>>>> (Karl Pentzlin):
>>>> Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil
>>>> users' needs, as long as leading operating systems behave like this
>>>> more than 10 years after Unicode decided to stop accepting
>>>> precomposed characters.
>>>>
>>>> Microsoft et al., PLEASE do your homework! Please do it RIGHT NOW!
>>
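The point about U+04E9 U+0304 can be checked with Python's standard `unicodedata` module: NFC normalization folds a base letter plus combining mark into one code point only when a precomposed character exists. A minimal sketch (the helper name `nfc_code_points` is just an illustrative choice):

```python
import unicodedata

def nfc_code_points(s: str) -> list[str]:
    """Return the code points of the NFC form of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]

# Cyrillic i + combining macron: a precomposed form exists (U+04E3).
print(nfc_code_points("\u0438\u0304"))  # ['U+04E3']

# Cyrillic barred o + combining macron: no precomposed form exists,
# so the sequence stays as two code points after normalization.
print(nfc_code_points("\u04E9\u0304"))  # ['U+04E9', 'U+0304']
```

Since the second sequence never becomes a single character, it displays correctly only if the rendering system positions U+0304 over the base glyph itself.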
> DE> I think Karl may have expected that fonts could be developed in such a
> DE> way that combining diacritical marks would be spaced properly above the
> DE> base character, ...
>
> That is exactly true, if "properly" simply means "in a way that respects
> the formal combining classes and produces a result the user can
> recognize".
>
> DE> more or less by magic.
>
> Yes, if "magic" is colloquial for "done by a complex and well-designed
> algorithm that may not be obvious to everybody at first glance", which is
> something computer scientists (like me) do sometimes.
>
> DE> I used to think that would be
> DE> possible when I knew nothing about font design, ...
>
> Maybe, but I claim to know at least some of the basics of font design.
> I regard it as a fine art in which not everybody is gifted enough to
> create a Gentium or an Andron, but the technical basics are comprehensible.
>
> DE> I still think it would be reasonable to expect combining marks like
> DE> macrons and circumflexes to be always centered over the base character,
> DE> not off to the right, even if the vertical spacing is wrong.
>
> At least that, yes. It can be accomplished by an algorithm; a very crude
> but working starting point is this: Enclose the base character's glyph in
> a rectangle. Determine its center (geometrically; a possible refinement is
> to use the barycenter). Get the diacritic glyph from the font itself, or
> (if not available) from a system default font, and enclose it in a
> rectangle. Determine its center (geometrically). Translate the combining
> class of the diacritic into a pair of positioning angle and distance,
> using a fixed table made once. Place the diacritic rectangle outside the
> base character, in the direction of the positioning angle relative to the
> center points, and shift it inwards until it is at the required distance.
> If another diacritic is to be added, enclose the combination generated so
> far in a rectangle that retains the center point of the original base
> character, call this the base character rectangle, and repeat. When
> finished, take the final enclosing rectangle into account for line
> positioning.
>
> A "real working" algorithm along these lines may take some 100 pages to
> specify, but that is what the skilled developers at Microsoft et al. are
> paid for.
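As a rough illustration of the crude starting point described above, here is a minimal Python sketch. It is not Karl's or any vendor's implementation. The `Rect` type, the `PLACEMENT` table and its gap values, and the restriction to eight compass directions instead of arbitrary angles are all assumptions made for the example. Combining classes come from the standard `unicodedata.combining` function.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in font units, y axis pointing up."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))

# Hypothetical fixed table: canonical combining class -> (dx, dy, gap).
# dx, dy in {-1, 0, 1} pick one of eight compass directions, a coarse
# stand-in for an arbitrary positioning angle; gap is in font units.
# Classes missing here (e.g. 1 = overlay) raise KeyError, since overlays
# need font-specific treatment anyway.
PLACEMENT = {
    230: (0, 1, 50),   # Above, e.g. U+0304 COMBINING MACRON
    232: (1, 1, 50),   # Above_Right
    220: (0, -1, 50),  # Below, e.g. U+0323 COMBINING DOT BELOW
    202: (0, -1, 0),   # Attached_Below, e.g. U+0327 COMBINING CEDILLA
}

def place_mark(base: Rect, anchor: tuple[float, float], mark: Rect,
               ccc: int) -> Rect:
    """Move the mark's box outside `base` in its class's direction,
    `gap` units away; centered axes use the original base center."""
    dx, dy, gap = PLACEMENT[ccc]
    ax, ay = anchor
    if dx == 0:
        x0 = ax - mark.width / 2          # center horizontally
    elif dx > 0:
        x0 = base.x1 + gap                # clear the right edge
    else:
        x0 = base.x0 - gap - mark.width   # clear the left edge
    if dy == 0:
        y0 = ay - mark.height / 2         # center vertically
    elif dy > 0:
        y0 = base.y1 + gap                # clear the top edge
    else:
        y0 = base.y0 - gap - mark.height  # clear the bottom edge
    return Rect(x0, y0, x0 + mark.width, y0 + mark.height)

def layout(base: Rect, marks: list[tuple[str, Rect]]) -> tuple[list[Rect], Rect]:
    """Stack marks in order; return their boxes and the final enclosing box."""
    anchor = base.center   # center point of the original base character
    combined = base
    placed = []
    for ch, mark_box in marks:
        box = place_mark(combined, anchor, mark_box, unicodedata.combining(ch))
        placed.append(box)
        combined = combined.union(box)   # the new "base character rectangle"
    return placed, combined
```

For a 500 by 500 base box (say, for U+04E9), `layout(Rect(0, 0, 500, 500), [("\u0304", Rect(0, 0, 300, 40))])` centers the macron above the base, 50 units clear of it. The returned enclosing box is what would feed into line positioning.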
>
> --
> On Sunday, 23 November 2008 at 04:29, Peter Constable wrote:
>
> PC> How would you suggest anybody do the homework needed to discover
> PC> that arbitrary & not-well-documented language X uses combining
> PC> character sequence <Y, Z>?
>
> Discovering that is *explicitly* not a precondition for your homework.
> Your task is: "For European Alphabetic Scripts, implement a solution for
> any combination of base characters and combining characters, especially
> for arbitrary combinations which are *not* explicitly considered in the
> available rendering system".
>
> It should be noted that when it was decided in 1996 to stop encoding
> precomposed characters for the European Alphabetic Scripts, this did not
> affect all diacritics.
>
> In fact, it affected only those diacritics that can be handled
> successfully by an algorithm like the one outlined above.
>
> For all diacritics that need special, font-specific treatment,
> precomposed characters have still been encoded since 1996, and new ones
> have to be encoded as they are encountered.
> Examples of such diacritics are:
> - slash overlays (horizontal and diagonal),
> - other overlays (e.g. middle tilde, double bar),
> - palatal hooks and retroflex hooks,
> - descenders.
>
> While no official information seems to be available, this decision
> appears to have been made with care, explicitly distinguishing diacritics
> that can be positioned automatically within reasonable constraints from
> those that cannot.
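The distinction shows up in the Unicode Character Database. Letters with an acute or a macron have canonical decompositions into a base letter plus a combining mark, whereas letters with overlays, hooks, and descenders like those listed above have none. A short check with Python's `unicodedata.decomposition` (the `decomposes` helper and the sample set are illustrative choices, not an official classification):

```python
import unicodedata

def decomposes(ch: str) -> bool:
    """True if ch has a canonical decomposition (base + combining mark)."""
    d = unicodedata.decomposition(ch)
    # Compatibility mappings start with a <tag>; only canonical ones count.
    return bool(d) and not d.startswith("<")

samples = {
    "\u00E9": "e with acute",
    "\u0101": "a with macron",
    "\u00F8": "o with stroke (slash overlay)",
    "\u026B": "l with middle tilde (overlay)",
    "\u0256": "d with tail (retroflex hook)",
    "\u049B": "Cyrillic ka with descender",
}
for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {unicodedata.decomposition(ch) or 'none'}")
```

Only the first two print a decomposition (`0065 0301` and `0061 0304`); the overlay, hook, and descender letters print `none`.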
>
> This seems to be an (as yet implicit) part of the encoding model for
> the European Alphabetic Scripts.
> (If this assumption is correct, I propose stating it explicitly
> in the next printed version of the Unicode Standard.)
> It differs from the Arabic model (where characters that some consider
> precomposed are encoded as single units), and it differs from the models
> used for South Asian scripts (where combining marks are encoded separately
> even when they considerably affect the shape of the base character's
> glyph).
>
> PC> Usage of combining marks with Cyrillic is nowhere near as
> PC> widespread as it is with Latin. I think Vista does pretty well
> PC> supporting arbitrary combining sequences for Latin in several
> PC> fonts, as well as certain known-to-be-used sequences for Cyrillic.
>
> At least, significant progress is visible in Vista regarding Latin
> combinations. As doing the same for Cyrillic does not require any really
> new mechanism, may I expect the same level of support for Cyrillic in the
> next service pack (SP) for Windows Vista?
>
> - Karl Pentzlin
>
This archive was generated by hypermail 2.1.5 : Sun Nov 23 2008 - 08:21:19 CST