Re: The Geejay

From: Asmus Freytag
Date: Fri Jan 04 2008 - 12:16:30 CST

    On 1/4/2008 4:09 AM, David Starner wrote:
    > On Jan 4, 2008 5:28 AM, Jeroen Ruigrok van der Werven wrote:
    >> And what if you want to make available such historical documents using an
    >> electronic medium? The only option you have would be a scanned image (with all
    >> its pros and cons) or an incomplete text due to certain glyphs being replaced
    >> with non-equivalent ones.
    > Unicode doesn't deal in glyphs; it deals in characters.
    Correct, but irrelevant. The question whether this is a variant glyph of
    something already available or a new character is the one that still
    needs to be decided.
    > As someone
    > making such documents available, in cases like these, I have no
    > problem replacing the character with an equivalent one, of which there
    > are several choices.
    In principle there are three choices (but not all of them may be
    available), and not all of them are equally desirable.

    1) If the character is unquestionably already encoded, pick a different,
    but available glyph variant for it
    2) Pick a different character, but one that is used for the same purpose
    elsewhere (e.g. an IPA or UPA character).
    3) Pick something that is unrelated but looks close.

    Case three is an example of the odious "arm's length" unification that
    gave us all the ambiguous ASCII characters.
    > If you want to see the original typography, I've
    > got the scans. If it's a set of characters, that's more complex, and
    > if there's a bunch of documents printed using the characters, then it
    > would be useful to have the original characters encoded, but for just
    > one character in just one document, no. If you're picky about the
    > original character, use a private use code point.
    > You're asking to have a code point published, and fonts created that
    > covers your character, for an extremely limited use. That's expensive
    > and way out of proportion to the value of the character.
    The full set of choices also includes:

    4) use a private use code point with special font (fine solution for
    self-contained forms of publication)
    5) use an arbitrary code point with special font (not quite the done
    thing, but unfortunately very common)
    6) do nothing (not using a character code)
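
    To illustrate choice 4, here is a minimal Python sketch of how a
    private use code point can be recognized in text. The choice of U+E000
    for the unencoded letter is purely hypothetical; any such assignment
    is a private agreement between the publisher and the accompanying font.

```python
import unicodedata

# Hypothetical assignment for illustration only: the unencoded letter is
# mapped to U+E000, the first private use code point in the BMP.
PUA_LETTER = "\ue000"


def is_private_use(ch):
    """Return True if ch has general category 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


def private_use_chars(text):
    """List the private use code points in text, in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text if is_private_use(c)]


if __name__ == "__main__":
    sample = "a" + PUA_LETTER + "b"
    print(private_use_chars(sample))
```

    A check like this shows why choice 4 suits self-contained publication:
    the text itself signals that a special font is required, whereas the
    arbitrary code point of choice 5 is indistinguishable from ordinary
    text.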

    In the entire field of character coding there is this issue of having a
    few very frequent characters and a long, very long tail of rarely used
    ones; thousands in Unicode are probably limited to a single dictionary
    (mostly Han characters). So the issue of diminishing returns, while
    real, is hardly unique to this case. That most people on this list are
    not familiar with the details of the Han character encoding doesn't
    make that any less of a precedent.

    None of these general remarks should be taken as a call to encode just
    this particular character. I have no personal stake in that question
    (nor do I see any actual proposals being submitted). I do think,
    however, that in deciding that question it needs to be taken into
    account that Unicode apparently has taken onboard the idea of covering
    not just a single mainstream phonetic notation, but several, including
    some additions to cover the practice in popular (not specialist)
    dictionaries. See U+1D7A for example.
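
    For readers who want to look up the character cited above, a short
    Python sketch follows. It assumes only the standard unicodedata
    module; the helper name describe is made up for this example.

```python
import unicodedata


def describe(cp):
    """Return 'U+XXXX NAME (category Xx)' for the code point cp."""
    ch = chr(cp)
    # Fall back to a placeholder if the local database has no name.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{cp:04X} {name} (category {unicodedata.category(ch)})"


if __name__ == "__main__":
    print(describe(0x1D7A))
```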

    Once an overall decision has been made, e.g. to cover mathematical or
    phonetic notations, there is little benefit in reopening that question
    with every minor character proposal under one of these repertoires. It
    does, however, use up lots of committee time that could be spent on
    better things.

