From: Asmus Freytag (firstname.lastname@example.org)
Date: Mon Jul 11 2005 - 12:57:42 CDT
At 04:54 AM 7/11/2005, Gregg Reynolds wrote:
>Asmus Freytag wrote:
>>I think asking proofreaders to proof the underlying encoding is
>>backwards. If the task is to ensure a preferred encoding, the best
>>approach is to use software, whether a Perl script for plain text files
>>with markup or a macro inside an editor that produces a proprietary format.
>>After all, differences in *encoding* are something that software is easily
>>made aware of, whereas differences in *spelling* still require human proofing.
>True enough, but then you still have to trust the software and answer the
>question "how do I know that I know?".
Write two Perl scripts. With a software-based test, it's cheap to re-run the
test if you are in doubt whether something changed in a document.
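For illustration, here is roughly what such a test might look like, written in Python rather than Perl. It assumes the preferred encoding is Unicode Normalization Form C (NFC); that choice and the function name `non_nfc_lines` are assumptions for this sketch. It only reports which lines would change under NFC, so it costs nothing to re-run.

```python
import unicodedata


def non_nfc_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that are not already in NFC.

    Assumption for this sketch: NFC is the "preferred encoding".
    """
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if unicodedata.normalize("NFC", line) != line
    ]


# Line 1 uses precomposed U+00E9; line 2 uses e + U+0301 (combining acute).
sample = "caf\u00e9\ncafe\u0301\n"
print(non_nfc_lines(sample))  # [2]
```

The same check catches Arabic mark-order variants: BEH followed by SHADDA then FATHA is not in canonical order, so that line would be reported as well.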
> Most people in the world are not capable of writing a perl script.
Fewer people are able to design a font with special handling for combining marks.
>And suppose you have a document being passed around and proofread by
>people using a variety of software. How can you be sure the encoding
>doesn't get munged somewhere along the line? The simplest and most
>reliable way (IMO) is to have a transparent proofreader's font of some kind.
To address that problem, you need to run your software test at the end.
In your scenario of manual proofing, the proofing would also need to be done
at the end if the encoding is really unstable.
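A minimal sketch of such an end-of-pipeline check, again in Python and with hypothetical names, assuming you kept a copy of the approved text. It compares the final copy with the approved one line by line and separates changes that only altered the encoding (canonically equivalent sequences) from changes to the text itself.

```python
import unicodedata
from itertools import zip_longest


def classify_changes(approved: str, final: str) -> list[tuple[int, str]]:
    """Label each line that differs between the approved text and a later copy.

    "encoding only": the lines differ as code points but have the same NFC form.
    "content": the lines still differ after normalization.
    If one version has more lines, the missing lines count as empty strings.
    """
    report = []
    pairs = zip_longest(approved.splitlines(), final.splitlines(), fillvalue="")
    for number, (old, new) in enumerate(pairs, start=1):
        if old == new:
            continue  # untouched line
        if unicodedata.normalize("NFC", old) == unicodedata.normalize("NFC", new):
            report.append((number, "encoding only"))
        else:
            report.append((number, "content"))
    return report


approved = "caf\u00e9\nmat"
final = "cafe\u0301\nmap"  # line 1 decomposed somewhere, line 2 actually edited
print(classify_changes(approved, final))  # [(1, 'encoding only'), (2, 'content')]
```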
>Also, aside from composition/decomposition, there's the question of
>whether all the code elements are chosen from the proper script block.
I think the practical scenarios for that, given ordinary text, can be
captured in a reasonably straightforward set of rules that can be expressed
as regular expressions.
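As a rough illustration for Arabic, rules of that kind might look like the following Python sketch. The two rules (flag Arabic presentation forms, and flag a Latin letter directly adjacent to a character from the Arabic block) are examples chosen for this sketch, not a complete or authoritative rule set, and the names are hypothetical.

```python
import re

# Hypothetical rule set: each pattern matches something that should not
# normally appear in plain, logically encoded Arabic text.
RULES = {
    # Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF)
    "presentation form": re.compile(r"[\uFB50-\uFDFF\uFE70-\uFEFF]"),
    # A Latin letter touching a character from the Arabic block (U+0600..U+06FF)
    "Latin next to Arabic": re.compile(
        r"[\u0600-\u06FF][A-Za-z]|[A-Za-z][\u0600-\u06FF]"
    ),
}


def check_blocks(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, rule name, matched text) for every rule hit, by offset."""
    findings = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.append((match.start(), name, match.group()))
    return sorted(findings)


# U+FEB3 is ARABIC LETTER SEEN INITIAL FORM, a presentation form.
print(check_blocks("\uFEB3\u0644\u0627\u0645"))
```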
Ordinarily, if these are letters, you'd expect a spell checker to weigh in.
For punctuation you may need to provide your own script.
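A punctuation rule of that kind could be as small as the following Python sketch. It assumes that an ASCII comma, semicolon, or question mark immediately following a character from the Arabic block should be the Arabic form (U+060C, U+061B, U+061F). That mapping and the function name are assumptions for illustration; real texts may need more cases or exceptions.

```python
import re

# Assumed mapping from ASCII punctuation to its Arabic counterpart.
ARABIC_EQUIVALENT = {",": "\u060C", ";": "\u061B", "?": "\u061F"}

# ASCII , ; ? only when the preceding character is in U+0600..U+06FF.
ASCII_AFTER_ARABIC = re.compile(r"(?<=[\u0600-\u06FF])[,;?]")


def fix_punctuation(text: str) -> tuple[str, int]:
    """Return the corrected text and the number of replacements made."""
    return ASCII_AFTER_ARABIC.subn(lambda m: ARABIC_EQUIVALENT[m.group()], text)


print(fix_punctuation("\u0646\u0639\u0645, \u0644\u0627?"))
```

Reporting the count rather than silently fixing keeps the script usable as a check: a nonzero count on the final copy means something needs a look.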
>Not the most pressing issue in the world, I admit, and maybe not such a
>problem for latinate scripts. This came up in the context of proofreading
>an encoding of the Quran. Seems like it might be an issue for any script
>with complex rendering logic.
I've been waiting for you to come up with a hard case. Here's one: two
spellings produce the same visual appearance, one is correct in some places
and the other in others, and only a human reader who understands the context
can decide which one is correct.
That's the kind of situation where the task really is a spell-check, not an
encoding check, and I explicitly excluded that case from my recommendation.