From: Neil Harris (firstname.lastname@example.org)
Date: Fri Feb 18 2011 - 07:20:26 CST
On 18/02/11 11:35, Andrew West wrote:
> On 18 February 2011 08:23, Chris Weber<email@example.com> wrote:
>> I would normally use Babelmap instead of browsing the collation maps, but those are helpful, thank you Peter. That's right Jukka, when I saw Detexify at http://detexify.kirelabs.org/classify.html I thought how useful something like it could be for visually finding Unicode characters, identifying confusables, and maybe other uses.
> What you want is a pan-Unicode OCR / handwriting recognition tool,
> which would be the most awesome thing ever if it worked reasonably
> well. It is the sort of thing that you should put in as a feature
> request for BabelMap. It can't be that hard to add a simple drawing
> pad, and as BabelMap can already extract bitmaps for all Unicode
> characters that are mapped to a font, all it needs is for the software
> to iterate through all 109,242 graphic characters looking for matches
> for the user input glyph (about 20 minutes at 10ms a character, which
> may be a bit of a problem) ... unfortunately I have no idea how to do
> the last bit.
Fortunately, there are fast algorithms for searching within
high-dimensional feature spaces which can work many orders of magnitude
faster than brute-force linear-time search for this kind of problem.
There's a huge literature on applying these algorithms to character
recognition.
Provided you only want to find a shortlist of a few dozen potential
matches, which can then be chosen from by eye, it shouldn't be too
difficult to code something useful, as demonstrated by how well detexify
works.
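
As a rough illustration of the shortlist idea, here is a minimal sketch of
a k-d tree, one of the simpler space-partitioning structures, used to pull
the 24 nearest candidates without comparing the query against every stored
glyph. This is not how detexify or BabelMap work internally. The names
build_kdtree, shortlist and brute_force are hypothetical, and the 8-number
feature vectors are random stand-ins for whatever features would really be
extracted from glyph bitmaps. Plain k-d trees also lose their advantage as
the number of dimensions grows, so a real tool would probably need an
approximate nearest-neighbour method instead.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of (feature_vector, label) pairs.

    Each node is a tuple (point, split_axis, left_subtree, right_subtree).
    """
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def shortlist(tree, query, k):
    """Return the labels of the k stored points nearest to query, closest first."""
    heap = []  # max-heap on distance, stored as (-distance, label)

    def visit(node):
        if node is None:
            return
        (vec, label), axis, left, right = node
        d = sq_dist(vec, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, label))
        diff = query[axis] - vec[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Only descend into the far side if the splitting plane is closer
        # than the worst candidate currently on the shortlist.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(tree)
    return [label for _, label in sorted(heap, reverse=True)]


def brute_force(points, query, k):
    """Linear scan over every point, for comparison with the tree search."""
    ranked = sorted((sq_dist(vec, query), label) for vec, label in points)
    return [label for _, label in ranked[:k]]


# Stand-in data: 2000 code points, each with a random 8-dimensional feature vector.
rng = random.Random(0)
glyphs = [([rng.random() for _ in range(8)], cp)
          for cp in range(0x4E00, 0x4E00 + 2000)]
tree = build_kdtree(glyphs)

drawn = [rng.random() for _ in range(8)]  # features of the user's drawing
candidates = shortlist(tree, drawn, 24)
print([hex(cp) for cp in candidates[:5]])
```

The tree is built once up front, and each query then skips any subtree
whose splitting plane lies further away than the worst candidate already
on the shortlist. The brute_force function is only there to check that
both approaches return the same candidates.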
Accurate OCR is a completely different matter.