RE: Cost of no OCR for extended Latin

From: Don Osborn ([email protected])
Date: Sat Oct 27 2007 - 10:42:00 CDT

Next message: Philippe Verdy: "thorn vs. y or th, eth and other similar letters/signs (was: Level of Unicode support required for various languages)"

Previous message: [email protected]: "Re: Level of Unicode support required for various languages"
In reply to: [email protected]: "Re: Cost of no OCR for extended Latin"

Thanks Lorna and David for this information. I was not familiar with ABBYY
FineReader. I have been impressed with use of OmniPage OCR for some kinds of
text (in English mainly; occasionally French) that I expected problems with,
due to the quality of the originals and the fact I had to scan photocopies.
Matching or beating that performance in FineReader on at least some
extended Latin would be impressive.

One hopes to see the supported extended Latin character repertoire expand to
cover languages with multiple diacritics (Yoruba and Igbo in Africa and
Vietnamese in Asia, among others, are absent from the list).

Part of the reason for the question is a discussion about ways to promote or
develop a project on digitization of African language materials. Ideally one
should OCR right the first time, and if the technology permits that for
extended Latin orthographies, that's one less problem to overcome.

All the best.

Don

From: [email protected] [mailto:[email protected]]
Sent: Thursday, October 25, 2007 11:09 AM
To: Don Osborn; [email protected]
Subject: Re: Cost of no OCR for extended Latin

> David Starner wrote on 10/25/2007 05:41:19 AM:

> > On 10/25/07, Don Osborn <[email protected]> wrote:
> > Is anyone aware of an OCR system that recognizes extended Latin characters
> > from say Extended A&B, IPA, and Extended Additional ranges? That is for any
> > language (orthography) including these characters?
>
> ABBYY offers most of Extended A and some of Extended B and Additional.
> The list of supported languages is
> <http://www.abbyy.com/finereader8/?param=44927>, which should map to
> the list of supported characters. It would be hard to impossible to
> create and test an OCR without a substantial corpus of material using
> a character; I suspect many languages are on ABBYY's list only because
> the orthography is a subset of those supported for other reasons.

Quoting two different colleagues of mine: "I recommend FineReader
(www.finereader.com) from Abbyy Software. While OmniPage is good, FineReader
is better--the best OCR software at an affordable price...FineReader can
handle special characters better than other OCR programs."

and

"I heartily recommend FineReader. It can be 'trained' to recognize
speciality characters, and it is surprisingly accurate - about 99% - which
means that 1% of the document will require manual corrections."

Lorna

This archive was generated by hypermail 2.1.5 : Sat Oct 27 2007 - 10:44:15 CDT