Re: Why people still want to encode precomposed letters

From: vunzndi@vfemail.net
Date: Sun Nov 23 2008 - 08:18:24 CST

    I have to say I find myself agreeing with what Karl says here: improved
    support for combining marks with Cyrillic letters is well within the
    resources of a major vendor and would benefit a large number of people,
    so it is a reasonable expectation.

    John Knightley

    Quoting "Karl Pentzlin" <karl-pentzlin@acssoft.de>:

    > On Sunday, 23 November 2008 at 05:45, Doug Ewell wrote:
    >
    >>>> (Karl Pentzlin):
    >>>> Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil the
    >>>> user's needs, as long as leading operating systems behave like this
    >>>> more than 10 years after Unicode has decided no longer to accept
    >>>> precomposed characters.
    >>>>
    >>>> Microsoft et al., PLEASE do your homework! Please do it RIGHT NOW!
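The point about U+04E9 U+0304 can be checked with a small sketch using Python's standard `unicodedata` module. The helper name `composes_to_single` is purely illustrative. It asks whether NFC normalization can fold a base-plus-mark sequence into one precomposed code point; where it cannot, display depends entirely on how the renderer positions the combining mark.

```python
import unicodedata


def composes_to_single(sequence: str) -> bool:
    """True if NFC folds the whole sequence into one precomposed code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


# CYRILLIC SMALL LETTER BARRED O + COMBINING MACRON: no precomposed form exists,
# so the sequence stays two code points and relies on the rendering system.
barred_o_macron = "\u04e9\u0304"

# CYRILLIC SMALL LETTER I + COMBINING MACRON: composes to U+04E3.
i_macron = "\u0438\u0304"

print(composes_to_single(barred_o_macron))
print(composes_to_single(i_macron))
```

Normalization only says whether a precomposed code point exists; it says nothing about how well a given font or operating system draws the uncomposed sequence.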
    >>
    > DE> I think Karl may have expected that fonts could be developed in such a
    > DE> way that combining diacritical marks would be spaced properly above the
    > DE> base character, ...
    >
    > That is exactly true, if "properly" simply means "in a way that respects
    > the formal combining classes and gives a result which the user can
    > recognize".
    >
    > DE> more or less by magic.
    >
    > Yes, if "magic" is colloquial for "done by a complex and well-designed
    > algorithm which possibly is not obvious for everybody at first glance" -
    > something which computer scientists (like me) do sometimes.
    >
    > DE> I used to think that would be
    > DE> possible when I knew nothing about font design, ...
    >
    > Maybe, but for myself I claim to know at least some of the basics about
    > font design. I appreciate it as a fine art where not everybody is gifted
    > to create a Gentium or Andron, but the technical basics are comprehensible.
    >
    > DE> I still think it would be reasonable to expect combining marks like
    > DE> macrons and circumflexes to be always centered over the base character,
    > DE> not off to the right, even if the vertical spacing is wrong.
    >
    > At least, this. This can be accomplished by an algorithm; a very crude
    > but working starting point is this: Enclose the base character's glyph in a
    > rectangle. Determine its center (geometrically; possible refinement:
    > barycentrically). Get the diacritic glyph from the font itself, or (if
    > that is not possible) from a system default font, and enclose it in a
    > rectangle. Determine its center (geometrically). Translate the combining
    > class of the diacritic into a pair of positioning angle and distance, using
    > a fixed table made once. Place the diacritic rectangle outside the base
    > character rectangle, at the positioning angle relative to the center
    > points, and shift it inwards until the distance is reached. If another
    > diacritic is to be added, enclose the combination generated so far in a
    > rectangle, retaining the center point of the original base character, call
    > this the base character rectangle, and repeat. After finishing, take the
    > final enclosing rectangle into consideration for line positioning.
    >
    > A "real working" algorithm like this may need some 100 pages to write down,
    > but that is what the skilled developers at Microsoft et al. are paid for.
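The crude starting point described above can be sketched in a few dozen lines. This is a toy illustration, not anyone's shipping implementation: the `Rect` type, the `PLACEMENT` table and its angle and gap values, and the function names are all assumptions made for the example. Only combining classes 230 (above) and 220 (below) are filled in, and glyph outlines are reduced to bounding boxes. The "place outside, then shift inwards" step is computed directly as the smallest travel along the positioning angle that leaves the required gap.

```python
import math
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in font units; y grows upward."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def union(self, other):
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))


# Hypothetical "fixed table made once": combining class -> (angle in degrees, gap).
# The values are invented for illustration only.
PLACEMENT = {
    230: (90.0, 50.0),   # marks above
    220: (270.0, 50.0),  # marks below
}


def place_mark(box, anchor, mark, ccc):
    """Move `mark` along the table's angle from `anchor` until it clears `box` by the gap."""
    angle, gap = PLACEMENT[ccc]
    # Round so that e.g. cos(90 degrees) becomes exactly zero.
    dx = round(math.cos(math.radians(angle)), 9)
    dy = round(math.sin(math.radians(angle)), 9)
    half_w = (mark.x1 - mark.x0) / 2
    half_h = (mark.y1 - mark.y0) / 2
    ax, ay = anchor
    # Travel needed to clear the box on each axis; the smallest one is the
    # resting position reached by "shifting inwards" from far outside.
    candidates = []
    if dx > 0:
        candidates.append((box.x1 + gap + half_w - ax) / dx)
    elif dx < 0:
        candidates.append((box.x0 - gap - half_w - ax) / dx)
    if dy > 0:
        candidates.append((box.y1 + gap + half_h - ay) / dy)
    elif dy < 0:
        candidates.append((box.y0 - gap - half_h - ay) / dy)
    t = max(0.0, min(candidates))
    cx, cy = ax + t * dx, ay + t * dy
    return Rect(cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def compose(base, marks):
    """Place each (glyph_rect, combining_char) in turn; return placed rects and final bounds."""
    anchor = base.center()  # the original base center is retained for every mark
    box = base
    placed = []
    for glyph, char in marks:
        rect = place_mark(box, anchor, glyph, unicodedata.combining(char))
        placed.append(rect)
        box = box.union(rect)  # the combination becomes the new base rectangle
    return placed, box


# A 500x500 base glyph with a macron (U+0304) and then an acute (U+0301) stacked above.
placed, bounds = compose(
    Rect(0, 0, 500, 500),
    [(Rect(0, 0, 300, 40), "\u0304"), (Rect(0, 0, 120, 100), "\u0301")],
)
print(placed)
print(bounds)
```

Both marks end up horizontally centered over the base, the second one above the first, and `bounds` is the final enclosing rectangle that line layout would use. A combining class missing from the table raises `KeyError` here; a real system would need the full table plus the many refinements alluded to above.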
    >
    > --
    > On Sunday, 23 November 2008 at 04:29, Peter Constable wrote:
    >
    > PC> How would you suggest anybody do the homework needed to discover
    > PC> that arbitrary & not-well-documented language X uses combining
    > PC> character sequence <Y, Z>?
    >
    > The latter is *explicitly* not a precondition for your homework. Your task
    > is: "For European Alphabetic Scripts, implement a solution for any
    > combinations of base characters and combining characters, especially for
    > arbitrary combinations which are *not* explicitly considered in the
    > available rendering system".
    >
    > It should be noted that, when it was decided in 1996 to no longer
    > encode precomposed characters of European Alphabetic Scripts,
    > this did not affect all diacritics.
    >
    > In fact, it has affected those diacritics which can successfully be
    > handled by an algorithm as outlined above.
    >
    > For all diacritics which need special font-specific treatment,
    > precomposed characters have still been encoded since 1996, and will
    > have to be encoded if new ones are encountered.
    > Examples of such diacritics are:
    > - slash overlays (horizontal and diagonal),
    > - other overlays (e.g. middle tilde, double bar),
    > - palatal hooks and retroflex hooks,
    > - descenders.
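The distinction drawn in this list shows up in the Unicode character data and can be inspected with Python's standard `unicodedata` module. In the sketch below the helper name `is_atomic` is illustrative: letters with strokes, retroflex hooks, or descenders are encoded as atomic characters with no canonical decomposition, while letters with an ordinary mark above decompose into base plus combining mark.

```python
import unicodedata


def is_atomic(char: str) -> bool:
    """True if the character has no decomposition mapping in the Unicode data."""
    return unicodedata.decomposition(char) == ""


samples = {
    "\u00f8": "LATIN SMALL LETTER O WITH STROKE (diagonal slash overlay)",
    "\u0142": "LATIN SMALL LETTER L WITH STROKE (overlay)",
    "\u0256": "LATIN SMALL LETTER D WITH TAIL (retroflex hook)",
    "\u049b": "CYRILLIC SMALL LETTER KA WITH DESCENDER",
    "\u0101": "LATIN SMALL LETTER A WITH MACRON (ordinary mark above)",
    "\u04e3": "CYRILLIC SMALL LETTER I WITH MACRON (ordinary mark above)",
}

for char, description in samples.items():
    print(f"U+{ord(char):04X} atomic={is_atomic(char)}  {description}")
```

The first four report as atomic; the last two decompose into a base letter followed by U+0304.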
    >
    > While no official information seems to be available, it appears
    > that this decision was made with care, explicitly distinguishing
    > diacritics which can be positioned automatically within reasonable
    > constraints from those which cannot.
    >
    > This seems to be an (implicit, as of now) part of the encoding model for
    > the European Alphabetic Scripts.
    > (If this assumption is correct, I propose stating this explicitly
    > in the next printed version of the Unicode Standard.)
    > It differs from the Arabic model (where characters which are considered
    > as precomposed by some are encoded as single units), and it differs
    > from models used for South Asian scripts (where combining marks are
    > encoded separately even if they affect the shape of the base character's
    > glyph considerably).
    >
    > PC> Usage of combining marks with Cyrillic is nowhere near as
    > PC> widespread as it is with Latin. I think Vista does pretty well
    > PC> supporting arbitrary combining sequences for Latin in several
    > PC> fonts, as well as certain known-to-be-used sequences for Cyrillic.
    >
    > At least, there is significant progress visible in Vista regarding
    > Latin combinations. Since doing the same for Cyrillic does not require
    > any really new mechanism, may I expect the same level of support for
    > Cyrillic in the next SP for Windows Vista?
    >
    > - Karl Pentzlin
    >
    >
    >


