Unicode enabled OCR software

From: Kent_Spielmann@sil.org
Date: Tue Jan 31 2006 - 12:43:56 CST

Next message: Peter Constable: "RE: two teaspoons of computational Hebrew history"

Previous message: Jukka K. Korpela: "Re: Unicode, colours and (hiero)glyphs"
Next in thread: Mark E. Shoulson: "Re: Unicode enabled OCR software"
Reply: Mark E. Shoulson: "Re: Unicode enabled OCR software"
Reply: David Starner: "Re: Unicode enabled OCR software"

Does anyone know of an OCR software solution that permits mapping to the full
Unicode character set as output from the character recognition process?
This needs to include mapping to base character + combining character
sequences.

All of the software we have looked at (Fine Reader, OmniPage, and Text Bridge)
can map only to those Unicode characters that are also defined in a subset of
the ANSI code pages.

We are trying to convert documents in minority languages as well as
linguistic documentation, and we need access to a larger set of lesser-used
characters.

We find the situation curious, since the reader we are using (Abbyy Fine
Reader) does output Unicode; it simply limits the selection of output code
points to characters already defined in the ANSI code pages. Allowing users
to create custom mappings to "non-ANSI" Unicode code points would not seem to
be difficult.
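One interim workaround is to have the OCR engine emit spare ANSI characters as
placeholders for the unsupported glyphs and remap them after recognition. The
following is a minimal Python sketch under that assumption: the placeholder
choices, the table CUSTOM_MAP, and the function remap_ocr_output are
hypothetical, not part of any OCR product, and the placeholders must be
characters that never occur legitimately in the source documents. NFC
normalization composes a sequence that has a precomposed form (a + U+0301
becomes U+00E1) and leaves one that does not (e + U+0331) as base character
plus combining mark.

```python
import unicodedata

# Hypothetical user-defined table. Each key is an ANSI (Windows-1252) placeholder
# the OCR engine is assumed to have been trained to emit; each value is the
# intended Unicode text, possibly a base character plus combining mark(s).
CUSTOM_MAP = {
    "\u00a4": "\u0254",   # CURRENCY SIGN -> LATIN SMALL LETTER OPEN O
    "\u00a7": "\u025b",   # SECTION SIGN  -> LATIN SMALL LETTER OPEN E
    "\u00b6": "e\u0331",  # PILCROW -> e + COMBINING MACRON BELOW (no precomposed form)
    "\u00ac": "a\u0301",  # NOT SIGN -> a + COMBINING ACUTE ACCENT (composes to U+00E1)
}


def remap_ocr_output(text: str, table: dict[str, str]) -> str:
    """Replace placeholder characters with their Unicode targets, then apply NFC."""
    translated = text.translate(str.maketrans(table))
    return unicodedata.normalize("NFC", translated)


if __name__ == "__main__":
    raw = "t\u00a4\u00b6 c\u00ac"  # OCR output containing placeholders
    print(remap_ocr_output(raw, CUSTOM_MAP))
```

Because the remapping happens outside the OCR engine, spell checking inside
the engine still sees only ANSI characters; this sketch does nothing to
address that limitation.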

We speculate the reason may be one or more of the following:
   - The OCR developers may feel that, if they allow output to other code
     points, they also need to provide recognition templates for them.
   - The OCR software relies on spell checkers to improve output accuracy,
     and apparently most spell-check dictionaries do not allow non-ANSI
     characters (this is true for the Office 2003 spell checker).
   - There is not enough commercial motivation for providing this capability.

Kent Spielmann

International Linguistics Department
7500 W. Camp Wisdom Road,
Dallas, TX 75236 USA
Tel: + 1 972 708 7570

This archive was generated by hypermail 2.1.5 : Tue Jan 31 2006 - 12:52:04 CST