Re: Proofreading fonts

From: Asmus Freytag (
Date: Mon Jul 11 2005 - 18:21:54 CDT


    At 03:26 PM 7/11/2005, Peter Kirk wrote:
    >On 11/07/2005 18:57, Asmus Freytag wrote:
    >>>Not the most pressing issue in the world, I admit, and maybe not such a
    >>>problem for latinate scripts. This came up in the context of
    >>>proofreading an encoding of the Quran. Seems like it might be an issue
    >>>for any script with complex rendering logic.
    >>I've been waiting for you to come up with a hard case. Here's what one
    >>would look like: two spellings that produce the same visual appearance,
    >>where one is correct in some contexts and the other in others, and only a
    >>human reader who understands the context can tell which is correct.
    >I'm not sure about an Arabic script case, but here is one in Latin script
    >and English language, where the visual appearance in many fonts is only
    >very subtly different, a subtlety which may be entirely lost on a computer
    >screen with limited resolution:
    >The Scottish name "Iain", a fairly common variant of "Ian", spelled with a
    >capital I at the start;
    >and the English word "lain", past participle of "lie", spelled with a
    >lowercase L at the start.
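The Iain/lain pair differs only in its first code point: U+0049 LATIN CAPITAL LETTER I versus U+006C LATIN SMALL LETTER L. Unicode Technical Standard #39 approaches such look-alikes with a "skeleton" mapping built from its confusables data. The sketch below only imitates that idea with a hypothetical two-entry table; `TOY_CONFUSABLES`, `toy_skeleton`, and `visually_confusable` are illustrative names, not any library's API, and the table is not the real data.

```python
# Hypothetical toy table: NOT the real UTS #39 confusables data.
# It folds capital I (U+0049) and digit one (U+0031) onto lowercase l (U+006C),
# glyphs that look nearly identical in many sans-serif fonts.
TOY_CONFUSABLES = {"\u0049": "\u006c", "\u0031": "\u006c"}


def toy_skeleton(text: str) -> str:
    """Replace each character by its toy prototype; other characters are kept."""
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in text)


def visually_confusable(a: str, b: str) -> bool:
    """True if two different strings collapse to the same toy skeleton."""
    return a != b and toy_skeleton(a) == toy_skeleton(b)


print(hex(ord("Iain"[0])), hex(ord("lain"[0])))  # 0x49 0x6c
print(visually_confusable("Iain", "lain"))       # True
print(visually_confusable("Iain", "Ian"))        # False
```

A match only says the two strings may be mistaken for each other on screen; deciding which spelling is correct in a given sentence still needs a human reader.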

    Note that this example does not involve combining marks, which were Gregg's
    starting point. However, it is a case of a genuine *spelling* difference,
    and therefore ultimately requires human proofing.
    Suitable fonts, ones that clearly distinguish capital I from lowercase l,
    already exist. If I were proofing a novel about a guy named Iain, I might
    think of searching for 'lain', as it's a word that might well not otherwise
    appear in the text.
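The search strategy described above can be sketched as a small helper that lists every line containing the whole word 'lain' (lowercase L), so a proofreader can check each hit by eye. This is a minimal sketch under the assumption that the text is plain Unicode with one sentence or paragraph per line; `flag_for_review` and the sample text are hypothetical.

```python
import re


def flag_for_review(text: str, suspect: str = "lain") -> list[tuple[int, str]]:
    """Return (line number, line) pairs containing `suspect` as a whole word.

    Matching is case-sensitive, so 'Iain' (capital I) is not flagged.
    Every hit still needs a human to decide whether 'lain' or 'Iain' was meant.
    """
    pattern = re.compile(rf"\b{re.escape(suspect)}\b")
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


sample = "Iain had lain awake all night.\nIain rose early.\nThen lain walked out."
for lineno, line in flag_for_review(sample):
    print(lineno, line)
```

Line 1 is a legitimate use of the word and line 3 is the typo; the tool cannot tell them apart, which is exactly why the final call stays with the human proofreader.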

    >And then of course there is always the case of and
    >(the latter with a capital I), which people may want to get right even
    >when not being used on the Internet. But I suppose a spelling check could
    >deal with that one.


    >In fact I think Gregg started this thread with a bad example. The two
    >encodings of a with circumflex (precomposed, and a plus a combining
    >circumflex) are canonically equivalent, and so are merely different
    >encodings of the same data. The cases Gregg really needs to deal with are
    >those where the alternatives are not canonically equivalent but are
    >semantically distinct.
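The distinction between canonically equivalent encodings and real spelling differences can be checked with normalization. A minimal sketch using Python's standard `unicodedata` module: the precomposed U+00E2 and the sequence U+0061 U+0302 compare unequal as raw code points but match after NFD, whereas "Iain" and "lain" stay distinct under normalization. The helper name `canonically_equivalent` is illustrative.

```python
import unicodedata

precomposed = "\u00e2"       # LATIN SMALL LETTER A WITH CIRCUMFLEX
decomposed = "\u0061\u0302"  # LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(precomposed == decomposed)                        # False: different code points
print(canonically_equivalent(precomposed, decomposed))  # True: same abstract text
print(canonically_equivalent("Iain", "lain"))           # False: a real spelling difference
```

Normalization settles the first kind of case mechanically; the second kind, being a difference in the text itself, is the one that still needs a proofreader.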

    I'm still waiting for an actual (or correctly contrived) example.


    This archive was generated by hypermail 2.1.5 : Mon Jul 11 2005 - 18:22:40 CDT