From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jul 11 2005 - 18:21:54 CDT
At 03:26 PM 7/11/2005, Peter Kirk wrote:
>On 11/07/2005 18:57, Asmus Freytag wrote:
>
>>...
>>
>>>Not the most pressing issue in the world, I admit, and maybe not such a
>>>problem for latinate scripts. This came up in the context of
>>>proofreading an encoding of the Quran. Seems like it might be an issue
>>>for any script with complex rendering logic.
>>
>>
>>I've been waiting for you to come up with a hard case. Here's one: two
>>spellings that produce the same visual appearance, where one is right
>>sometimes and the other is right at other times, and only a human reader
>>can decide which is correct by understanding the context.
>I'm not sure about an Arabic script case, but here is one in Latin script
>and English language, where the visual appearance in many fonts is only
>very subtly different, a subtlety which may be entirely lost on a computer
>screen with limited resolution:
>
>The Scottish name "Iain", a fairly common variant of "Ian", spelled with a
>capital I at the start;
>
>and the English word "lain", past participle of "lie", spelled with a
>small L at the start.
Note that this example does not require combining marks, which was Gregg's
starting point. However, it is a case of a *spelling* difference, and
therefore ultimately requires human proofing.
Fonts that clearly distinguish capital I from small l already exist. If I
were proofing a novel about a guy named Iain, I might think of searching for
'lain', as it's a word that might well not be part of the text.
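A search of that kind could be sketched roughly as follows. This is only an illustration: the function name find_suspects and the sample text are made up for the purpose, and the code points are spelled out so that the listing does not itself depend on a font that tells the two letters apart.

```python
import re

SMALL_L = "\u006c"    # LATIN SMALL LETTER L
CAPITAL_I = "\u0049"  # LATIN CAPITAL LETTER I

def find_suspects(text, suspect=SMALL_L + "ain"):
    """Return (line_number, line) pairs where `suspect` occurs as a whole word.

    The match is case-sensitive on purpose: the name with a capital I
    must not be reported, only the look-alike word with a small l.
    """
    pattern = re.compile(r"\b" + re.escape(suspect) + r"\b")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if pattern.search(line)
    ]

# Hypothetical sample text, one sentence per line.
sample = "\n".join([
    CAPITAL_I + "ain walked in.",                  # the name, spelled correctly
    "Then " + SMALL_L + "ain spoke up.",           # probably a typo for the name
    "He had " + SMALL_L + "ain awake all night.",  # a legitimate use of the word
])

hits = find_suspects(sample)
for number, line in hits:
    print(number, line)
```

The search only produces candidates. Both the second and third lines are reported, and it still takes a human reader to decide which of them is an error.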
>And then of course there is always the case of paypal.com and paypaI.com
>(the latter with a capital I), which people may want to get right even
>when not being used on the Internet. But I suppose a spelling check could
>deal with that one.
Correct.
>In fact I think Gregg started this thread with a bad example. The two
>encodings for a with circumflex are canonically equivalent and so
>different encodings of the same data. The cases Gregg really needs to deal
>with are when the alternatives are not canonically equivalent but
>semantically distinct.
I'm still waiting for an actual (or correctly contrived) example.
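For contrast, the canonically equivalent case Peter mentions can be checked mechanically. A minimal sketch using Python's standard unicodedata module; the helper name canonically_equivalent is my own and not part of any library:

```python
import unicodedata

precomposed = "\u00e2"  # LATIN SMALL LETTER A WITH CIRCUMFLEX
decomposed = "a\u0302"  # LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT

def canonically_equivalent(s, t):
    """True if s and t have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)

# The two encodings of a with circumflex are the same data after normalization.
same_letter = canonically_equivalent(precomposed, decomposed)

# Capital I + "ain" versus small l + "ain": these look alike in many fonts,
# but normalization does not (and should not) unify them.
look_alikes = canonically_equivalent("\u0049ain", "\u006cain")

print(same_letter, look_alikes)
```

Normalization settles the first pair without human help. The second pair stays distinct after normalization, which is why it remains a spelling question.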
A./