Re: Why people still want to encode precomposed letters

From: Asmus Freytag (
Date: Sun Nov 23 2008 - 13:38:30 CST


    On 11/23/2008 6:39 AM, wrote:
    > Quoting "Don Osborn" <>:
    >> A couple of quick questions. First, about how long would the list of
    >> combinations be?
    > Very long, though a worthwhile list: over a thousand, so let's say
    > thousands of combinations.
    In the early development period for ISO/IEC 10646, Germany presented a list of
    2,000 combinations, then thought to be sufficient to handle the needs of
    scholars for Indo-European languages.

    A raw list is not sufficient. For it to be useful to vendors targeting
    specific markets, you need to annotate the list with usage frequency
    and application domain. You would also need to provide a generic
    description of the desired layout and of the observed or allowed
    variability. For that you need a generic set of layout rules for
    diacritics, so that you have a "language" for describing the layout
    independently of any particular font or font style. Finally, such a
    list must be reviewable. Ideally, it would contain links to samples of
    the observed combinations, so that if someone claims that any of the
    collected information needs correcting, it is possible to go back and
    ascertain why the existing entry is as described. Without that, the
    collection could "drift". In other words, you need to be able to decide
    whether what you have in your list is wrong, or just a special case or
    even a regular variant of what the person claiming the correction has
    found.
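    To make the requirements above concrete, here is a minimal sketch of
    what one annotated entry might look like. Everything in it (the
    CombinationEntry class, its field names, the frequency labels) is a
    hypothetical illustration, not an existing format; the only real API
    used is the standard library's unicodedata.normalize. It also shows one
    mechanical check a maintainer might run: whether NFC composes the
    sequence into a single precomposed code point, which Unicode provides
    only for a fixed set of combinations.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass, field


@dataclass
class CombinationEntry:
    """Hypothetical record for one base-letter + diacritics combination."""
    base: str                    # base letter, e.g. "a"
    marks: tuple[str, ...]       # combining marks, in the order they are encoded
    frequency: str               # hypothetical coarse usage label, e.g. "common", "rare"
    domains: list[str]           # application domains where the combination is attested
    layout_notes: str            # font-independent description of mark placement
    evidence: list[str] = field(default_factory=list)  # links/citations to attested samples

    def sequence(self) -> str:
        """The combination as a base character followed by combining marks."""
        return self.base + "".join(self.marks)

    def has_precomposed_form(self) -> bool:
        """True if NFC collapses the whole sequence into one code point."""
        return len(unicodedata.normalize("NFC", self.sequence())) == 1

    def is_reviewable(self) -> bool:
        """An entry without attached evidence cannot be re-checked later."""
        return bool(self.evidence)


if __name__ == "__main__":
    # a + COMBINING MACRON + COMBINING ACUTE ACCENT: NFC composes a + macron
    # into U+0101, but no precomposed a-macron-acute exists, so the acute stays.
    entry = CombinationEntry(
        base="a",
        marks=("\u0304", "\u0301"),
        frequency="rare",
        domains=["Indo-European philology"],
        layout_notes="acute stacked above macron",
    )
    print(entry.has_precomposed_form(), entry.is_reviewable())
```

    The evidence field is the part that keeps the collection from
    "drifting": an entry that fails is_reviewable() is exactly the kind
    that cannot later be distinguished from an error or a variant.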

    Incidentally, that is a problem not fully addressed by Unicode. Some of
    the more obscure characters have been, and will continue to be, in
    danger of being incorrectly duplicated by new additions or incorrectly
    conflated with usages that deserve separate characters, all because in
    too many cases the only thing available for deciding such issues is the
    listing in the standard.

    All such lists, whether of characters or combinations, start with the
    well-known and progress from there. Well-known entities don't suffer
    from the problem I just described. Once you proceed to lesser-known
    entities, or even to rather obscure ones, the need to be able to
    backtrack to the original evidence becomes stronger.

    Especially for self-organizing, distributed online collaborations, the
    evidence would have to become part of each entity's record, not part of
    some technical committee archive.

    >> Second, if the number is significant, might it make sense to approach
    >> this as a "Web 2.0" task, using perhaps a wiki? Under such an approach,
    >> very short articles could be designed to give minimal documentation and
    >> references, as well as relevant technical information. There would of
    >> course be some details to resolve about who can contribute, how
    >> contributions are vetted, etc., but the biggest issue would probably be
    >> the resources to set up and maintain such a resource.
    > IMHO it would be hard for such a setup to carry much weight. There is
    > also the question of how one deals with combinations that occur in
    > several languages where the required rendering differs.
    The hard problem would be to design it to carry not just the list, but
    the necessary meta-information. Should such a design succeed, there
    would be no question that it would carry weight.


    This archive was generated by hypermail 2.1.5 : Sun Nov 23 2008 - 13:40:17 CST